Python JIT container type













This is a follow-on from this review, in which I am attempting to improve the performance of my REST client.



I have created a container type (as suggested in my previous post) that lazily instantiates objects from a JSON array (actually a Python list of dicts).



The principle behind the class is to store the raw data in instance.__dict__ as instance._attribute. When normal attribute lookup (__getattribute__) fails, __getattr__ is called, which replaces instance._attribute with instance.attribute and returns the corresponding item.
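To make the swap concrete, here is a minimal sketch of the same mechanism on a single hypothetical class, `LazyRecord`. It is not the code below: the `convert` step is an assumed stand-in for `create_attribute`, and it simply calls `list()` on the raw value. The key point is that `__getattr__` only runs when ordinary lookup fails, so after the first access the cached public name is found directly and the conversion never runs again.

```python
class LazyRecord:
    """Minimal sketch: raw data is stored as '_name'; the first access to
    'name' converts it, caches the result as 'name' and drops '_name'."""

    def __init__(self, **raw):
        for name, value in raw.items():
            object.__setattr__(self, '_' + name, value)

    def __getattr__(self, name):
        # Only called when normal lookup (instance __dict__, then class) fails.
        raw = object.__getattribute__(self, '_' + name)  # AttributeError if absent
        result = self.convert(raw)
        object.__setattr__(self, name, result)   # cache under the public name
        object.__delattr__(self, '_' + name)     # discard the raw copy
        return result

    @staticmethod
    def convert(raw):
        # Hypothetical conversion standing in for create_attribute.
        return list(raw)


rec = LazyRecord(a=range(3))
print(vars(rec))  # {'_a': range(0, 3)}
print(rec.a)      # [0, 1, 2]
print(vars(rec))  # {'a': [0, 1, 2]}
```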



The __init__ method creates the _attribute entries (_0, _1, ...) by enumerating the supplied *items.



I'm able to simulate a sequence container by overriding __getitem__, which turns the index into a getattr call. (It also works with slices.)
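Slice support can lean on the fact that indexing a `range` with a slice object yields exactly the indices the slice selects, with `None`, negative and out-of-range bounds already normalized. A small sketch, using a hypothetical helper name `slice_indices`; `slice.indices()` gives the same result in `(start, stop, step)` form:

```python
def slice_indices(length, key):
    """Return the concrete indices that slice `key` selects from `length` items."""
    # range objects support slicing natively and clamp/normalize the bounds.
    return list(range(length)[key])


print(slice_indices(5, slice(1, None)))         # [1, 2, 3, 4]
print(slice_indices(5, slice(None, None, -2)))  # [4, 2, 0]
print(slice_indices(5, slice(-2, 100)))         # [3, 4]
# Equivalent spelling with slice.indices():
print(list(range(*slice(None, None, -2).indices(5))))  # [4, 2, 0]
```

Each selected index can then be fed through the same single-item path (here, the `getattr` call).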



I have purposely left off __reversed__ because I believe reversed() will fall back to __len__ and __getitem__, effectively indexing with reversed(range(len(instance))) to produce the items in reverse order.
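That fallback is easy to check with a toy class (the name `Seq` and its `requested` log are purely illustrative). It defines only `__len__` and `__getitem__`, and it records which indices `reversed()` asks for:

```python
class Seq:
    """Defines only __len__ and __getitem__: no __reversed__, no __iter__."""

    def __init__(self, *items):
        self._data = items
        self.requested = []  # records every index reversed() asks for

    def __len__(self):
        return len(self._data)

    def __getitem__(self, index):
        self.requested.append(index)
        return self._data[index]


s = Seq('a', 'b', 'c')
print(list(reversed(s)))  # ['c', 'b', 'a']
print(s.requested)        # [2, 1, 0]
```

Because only non-negative indices are requested, this fallback also works with a `__getitem__` that maps an index straight to `getattr(self, str(index))`.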



I have also left off __bool__, as __len__ is defined.
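For reference, when a class defines `__len__` but not `__bool__`, truth testing uses `len(obj) != 0`. This is what lets a check such as `if matches:` in `get_instruments` work. A tiny illustration with a throwaway `Sized` class:

```python
class Sized:
    """Defines __len__ only; bool() falls back to len(obj) != 0."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


print(bool(Sized(0)), bool(Sized(3)))  # False True
```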



The methods get_id, get_instrument and get_instruments are specific to my application's domain.



One caveat is that a helper function, create_attribute, must be defined; it is the function that 'expands' the raw data into instances.



EDIT



I forgot to mention that the class is meant to be immutable.
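The pattern the class uses for this is to override `__setattr__`/`__delattr__` so that ordinary assignment is refused, while internal code writes through `object.__setattr__`. A minimal sketch on a hypothetical `Frozen` class follows. One deliberate difference from the code: it raises `AttributeError` rather than `NotImplementedError`. That is the convention frozen dataclasses follow, since `dataclasses.FrozenInstanceError` subclasses `AttributeError`, and it is offered only as one option.

```python
class Frozen:
    """Blocks ordinary assignment/deletion; internals bypass the guard."""

    def __init__(self, value):
        object.__setattr__(self, 'value', value)  # skips the override below

    def __setattr__(self, key, value):
        raise AttributeError(f'{type(self).__name__} is immutable')

    def __delattr__(self, key):
        raise AttributeError(f'{type(self).__name__} is immutable')


f = Frozen(1)
try:
    f.value = 2
except AttributeError as exc:
    print(exc)  # Frozen is immutable
print(f.value)  # 1
```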



The code:



import pandas as pd

# create_attribute(typ, data) is a helper defined elsewhere in the package;
# it 'expands' raw JSON data into an instance of typ.


class Array(object):
    """Mixin to denote objects that are sent from OANDA in an array.
    Also used to correctly serialize objects.
    """

    def __init_subclass__(cls, **kwargs):
        # Denotes the type the Array contains
        cls._contains = kwargs.pop('contains')
        # True: get_instrument/s() returns an Array of items. False: returns a single item
        cls._one_to_many = kwargs.pop('one_to_many', True)

    def __init__(self, *items):
        # Store each raw item as '_0', '_1', ... in the instance __dict__
        for index, item in enumerate(items):
            object.__setattr__(self, f'_{index}', item)

    def __getattr__(self, item):
        # Only reached when normal lookup fails, i.e. the item is still raw:
        # build it, cache it under its public name and drop the raw copy.
        result = create_attribute(self._contains, self.__getattribute__('_' + item))
        object.__setattr__(self, item, result)
        object.__delattr__(self, '_' + item)
        return result

    def __len__(self):
        return len(self.__dict__)

    def __iter__(self):
        def iterator():
            for index in range(len(self)):
                try:
                    yield getattr(self, str(index))
                except AttributeError:
                    # End the generator with a plain return: raising
                    # StopIteration inside a generator is a RuntimeError
                    # since Python 3.7 (PEP 479).
                    return

        return iterator()

    def __add__(self, other):
        return self.__class__(*self.__dict__.values(), *other)

    __radd__ = __add__

    def __getitem__(self, item):
        if isinstance(item, slice):
            return self.__class__(*[self[index] for index in range(len(self))[item]])
        return getattr(self, str(item))

    def __delattr__(self, item):
        raise NotImplementedError

    def __setattr__(self, key, value):
        raise NotImplementedError

    def get_id(self, id_, default=None):
        try:
            for value in self:
                if value.id == id_:
                    return value
        except AttributeError:
            pass
        return default

    def get_instruments(self, instrument, default=None):
        # ArrayPosition can only have a One to One relationship between an instrument
        # and a Position. Though ArrayTrades and others can have a Many to One relationship
        try:
            matches = self.__class__(*[value for value in self if value.instrument == instrument])
            if matches:
                return matches
        except AttributeError:
            pass
        return default

    def get_instrument(self, instrument, default=None):
        try:
            for value in self:
                try:
                    if value.instrument == instrument:
                        return value
                except AttributeError:
                    if value.name == instrument:
                        return value
        except AttributeError:
            pass
        return default

    def dataframe(self, json=False, datetime_format=None):
        """Create a pandas.DataFrame"""
        return pd.DataFrame(obj.data(json=json, datetime_format=datetime_format) for obj in self)


Console example:

>>> class LazyLists(Array, contains=list):
...     pass
...
>>> # must define create_attribute
>>> def create_attribute(typ, data):
...     return typ(data)
...
>>> lazy_lists = LazyLists(*[range(10) for _ in range(2)])
>>> lazy_lists
<LazyLists object at 0x000002202BE335F8>
>>> len(lazy_lists)
2
>>> lazy_lists.__dict__
{'_0': range(0, 10), '_1': range(0, 10)}
>>> lazy_lists[1]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> lazy_lists.__dict__
{'_0': range(0, 10), '1': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}
>>> for i in lazy_lists: print(i)
...
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> lazy_lists.__dict__
{'1': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], '0': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]}


I wrote a benchmark to assess whether this was worth the effort.



Before previous post: [benchmark screenshot not reproduced]

Before Lazy Array class: [benchmark screenshot not reproduced]

After Lazy Array class: [benchmark screenshot not reproduced]



The entire implementation can be found here. I am interested in what you think about the Array class. How would you have done it better?















asked Jan 8 at 11:36 by James Schinner (edited Jan 8 at 12:36)




















          1 Answer

















          Accepted answer










          The implementation of your Array class still bugs me, so I cloned your repo, renamed Array to OldArray, and created class NewArray:



class NewArray(object):
    """Mixin to denote objects that are sent from OANDA in an array.
    Also used to correctly serialize objects.
    """

    def __contains__(self, item):
        """Return True if item in this array, False otherwise.

        Note: this traverses all or part of the array, instantiating the
        objects. Using `x in array` may, therefore, have a serious impact on
        performance.

        """
        for value in self:
            if value == item:
                return True
        return False

    def __init__(self, *items):
        """Initialize a new array.

        The *items passed in are assumed to be JSON data. If an item is
        accessed, it is passed to `create_attribute` with the appropriate
        class type.

        Initially, objects are stored in self._items. When accessed, the
        objects are reified and stored in self.items. This is transparently
        handled by self.__getitem__(self, key).

        """
        # Debug trace (disabled):
        # print(f"NewArray<{self._contains}> with len {len(items)}")
        self._items = items
        self.items = []

    def __init_subclass__(cls, **kwargs):
        """Record the type *contained in* the subclass-array.

        A subclass like:

            class array_holding_foo(Array, contains=Foo):
                pass

        will have all its inner objects instantiated using class Foo.

        """
        cls._contains = kwargs.pop('contains')

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        """Iterate over items in array. Use integer indexing so that
        __getitem__ can handle reifying all the objects.

        """
        for index in range(len(self)):
            yield self[index]

    def __getitem__(self, key):
        # Debug trace (disabled):
        # print(f"getitem[{key}] called on NewArray<{self._contains}>")
        if isinstance(key, slice):
            # slice.indices() normalizes None, negative and out-of-range
            # bounds against the full raw length (self._items), not the
            # partially filled cache (self.items).
            start, stop, step = key.indices(len(self._items))

            # Note: this reifies the items before putting them in the
            # new object.
            return self.__class__(*[self[index]
                                    for index in range(start, stop, step)])

        length = len(self._items)
        if key < 0:
            key += length

        if not (0 <= key < length):
            raise IndexError('Array index out of range')

        # Grow the cache of reified objects only as far as needed.
        if key >= len(self.items):
            self.items += [None] * (key - len(self.items) + 1)

        if self.items[key] is None:
            json = self._items[key]
            self.items[key] = create_attribute(self._contains, json)

        return self.items[key]


          I know that this code does less work when an array is created: it creates and sets two attributes in __init__ but doesn't loop at all. I had an older version that set the .items list to [None] * len(items), which made for less work in the __getitem__ method, but it still "looped" N times, so I tried squeezing that out!
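For comparison, the pre-sized variant mentioned above might look roughly like the sketch below. The class name `PresizedArray` is hypothetical. `create_attribute` is a stand-in that just calls `typ(data)`, and the element type is fixed to `list` for the demo. Pre-sizing costs one O(n) allocation at creation, but ordinary list indexing then handles negative indices and raises `IndexError` without any manual bounds arithmetic.

```python
def create_attribute(typ, data):
    # Hypothetical stand-in for the package's helper.
    return typ(data)


class PresizedArray:
    """Sketch of the pre-sized variant: the reified cache is allocated up front."""

    _contains = list  # assumption: element type fixed for this demo

    def __init__(self, *items):
        self._items = items
        self.items = [None] * len(items)  # the one O(n) step at creation

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        value = self.items[key]  # negative indices / IndexError come for free
        if value is None:
            value = create_attribute(self._contains, self._items[key])
            self.items[key] = value
        return value


arr = PresizedArray(range(2), range(3))
print(arr[-1], arr.items)  # [0, 1, 2] [None, [0, 1, 2]]
```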



          But your benchmark using this code averaged a few hundredths of a second slower than the benchmark using your old Array implementation.



          I think that means that the limit of performance has been reached. I ran the client_benchmark script 10 times, sorted the times reported, dropped the max and min, and averaged the 8 remaining (for both OldArray and NewArray versions).




          Old array: avg time: 6.518096446990967



          New array: avg time: 6.587247729301453
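The averaging procedure described above (10 runs, sort, drop the fastest and slowest, average the remaining 8) is a simple trimmed mean. A small sketch of it follows. One assumption to flag: it times a Python callable in-process with `time.perf_counter()`, whereas the numbers above came from running the client_benchmark script repeatedly.

```python
import statistics
import time


def trimmed_mean(samples):
    """Mean of the samples after dropping the single smallest and largest value."""
    if len(samples) < 3:
        raise ValueError('need at least 3 samples to trim both ends')
    ordered = sorted(samples)
    return statistics.mean(ordered[1:-1])


def time_runs(func, runs=10):
    """Wall-clock each of `runs` calls to func."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


# 10 runs -> 8 samples averaged, mirroring the procedure above.
print(trimmed_mean(time_runs(lambda: sum(range(100_000)))))
```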




          My takeaway is that your code, as written when you posted this, is close enough to "doing nothing" in performance that tweaking it just produces noise on this benchmark.



          So... you need another benchmark! Possibly several benchmarks.



          You should save this as the "creating arrays" benchmark, and add it to your perftest directory (which you don't have... yet).



          Then maybe create some other benchmarks, reflective of actual use cases, which we can use to hammer out the performance of objects when the arrays are actually accessed, instead of just creating them.
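As a starting point, a benchmark could time a few access patterns separately: create only, create and touch one item, and create and iterate everything. The sketch below is a rough illustration with several stand-ins. `LazySeq` is a toy container, `create_attribute` just calls `typ(data)`, and the `{'id': ..., 'instrument': ...}` payload shape is hypothetical. Swapping in the real Array subclasses and recorded OANDA responses would be the actual work.

```python
import timeit


def create_attribute(typ, data):
    # Hypothetical stand-in: "reifying" here is just copying the dict.
    return typ(data)


class LazySeq:
    """Toy lazy container used only to exercise the scenarios below."""

    def __init__(self, *items):
        self._items = items
        self._cache = {}

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if index not in self._cache:
            # Tuple indexing raises IndexError past the end, ending iteration.
            self._cache[index] = create_attribute(dict, self._items[index])
        return self._cache[index]


RAW = [{'id': str(i), 'instrument': 'EUR_USD'} for i in range(500)]

SCENARIOS = {
    'create only': lambda: LazySeq(*RAW),
    'create + first item': lambda: LazySeq(*RAW)[0],
    'create + iterate all': lambda: [x for x in LazySeq(*RAW)],
}

for name, stmt in SCENARIOS.items():
    # Best of 5 repeats, each executing the scenario 100 times.
    seconds = min(timeit.repeat(stmt, number=100, repeat=5))
    print(f'{name:>22}: {seconds:.4f} s per 100 calls')
```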



          Edit:



          Also, if slicing is actually used, it probably deserves better treatment: there should be a way of copying both the JSON and the reified versions in the initializer.
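One possible shape for that, sketched under assumptions: the initializer accepts an optional, hypothetical `_reified` keyword carrying already-built objects. A slice then copies the matching raw JSON and the cached objects side by side, without instantiating anything new. For simplicity this sketch pre-sizes the cache instead of growing it lazily as NewArray does. `SliceableArray` and the `typ(data)` version of `create_attribute` are illustrative stand-ins.

```python
def create_attribute(typ, data):
    # Hypothetical stand-in for the package's helper.
    return typ(data)


class SliceableArray:
    _contains = list  # assumption: element type fixed for this demo

    def __init__(self, *items, _reified=None):
        self._items = items
        # Reified objects by position; None means "not built yet".
        self.items = list(_reified) if _reified is not None else [None] * len(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            # The same slice applied to both sequences keeps them aligned,
            # so nothing is reified just to build the new array.
            return self.__class__(*self._items[key], _reified=self.items[key])
        if self.items[key] is None:
            self.items[key] = create_attribute(self._contains, self._items[key])
        return self.items[key]


a = SliceableArray(range(1), range(2), range(3))
first = a[0]
b = a[::-1]
print(b.items)  # [None, None, [0]]
```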



























          • That is just brilliant! Your __getitem__ method is something. Thanks for taking the time to run the benchmark! I also realize how lazy I have been with my docstrings compared to your code. This is going straight into master! I will create a perftest directory and more relevant benchmarks.
            – James Schinner
            Jan 9 at 2:04










          • Great, I merged your code. I added a __repr__ and re-implemented the get_id and get_instrument methods to suit. I also implemented slicing as return self.__class__(*[self[index] for index in range(len(self._items))[slice]]), which I found here: stackoverflow.com/questions/13855288/… As a side note, I hadn't thought of equality testing for instances until you added the __contains__ method. I wonder if I can check equality without instantiating all attributes. Hmm...
            – James Schinner
            Jan 9 at 6:47











          • I was wondering if you could do get_id without having to reify the objects. Just check the json dicts.
            – Austin Hastings
            Jan 9 at 7:29










          • I implemented that great idea and updated my tests. Back to 100% coverage. Regarding equality testing, I ended up storing the string f"self.__class__.__name__(**(kwargs))" in Model's __init__, to be used by __eq__, which meant I didn't have to completely reify the instance. However, that got me thinking about the __hash__ method. async_v20 features immutable objects, so as a user I would expect hash(Model(1)) == hash(Model(1)). I used the same string for the hash as for the equality test. I don't know whether this is a good idea. Thanks so much for your work!
            – James Schinner
            Jan 10 at 13:05
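Austin's suggestion of doing get_id against the raw JSON could be sketched roughly as follows. This sketch makes several assumptions. Each raw item is a dict with an `'id'` key. `SimpleNamespace` stands in for the real model class. `JsonArray` is a hypothetical name. Because the raw values are compared without conversion, the caller must pass ids of the same type the JSON uses (for example, strings if the JSON stores strings). Only the matching element is ever instantiated.

```python
from types import SimpleNamespace


class JsonArray:
    """Sketch: raw JSON dicts in self._items, reified objects cached in self.items."""

    def __init__(self, *items):
        self._items = items
        self.items = [None] * len(items)

    def _reify(self, index):
        if self.items[index] is None:
            # SimpleNamespace stands in for the real model class (assumption).
            self.items[index] = SimpleNamespace(**self._items[index])
        return self.items[index]

    def get_id(self, id_, default=None):
        # Compare against the raw dicts; only the match is instantiated.
        for index, raw in enumerate(self._items):
            if raw.get('id') == id_:
                return self._reify(index)
        return default


arr = JsonArray({'id': '1', 'instrument': 'EUR_USD'},
                {'id': '2', 'instrument': 'USD_JPY'})
print(arr.get_id('2').instrument)  # USD_JPY
print(arr.items[0])                # None (never reified)
```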











          Your Answer




          StackExchange.ifUsing("editor", function ()
          return StackExchange.using("mathjaxEditing", function ()
          StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
          StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
          );
          );
          , "mathjax-editing");

          StackExchange.ifUsing("editor", function ()
          StackExchange.using("externalEditor", function ()
          StackExchange.using("snippets", function ()
          StackExchange.snippets.init();
          );
          );
          , "code-snippets");

          StackExchange.ready(function()
          var channelOptions =
          tags: "".split(" "),
          id: "196"
          ;
          initTagRenderer("".split(" "), "".split(" "), channelOptions);

          StackExchange.using("externalEditor", function()
          // Have to fire editor after snippets, if snippets enabled
          if (StackExchange.settings.snippets.snippetsEnabled)
          StackExchange.using("snippets", function()
          createEditor();
          );

          else
          createEditor();

          );

          function createEditor()
          StackExchange.prepareEditor(
          heartbeatType: 'answer',
          convertImagesToLinks: false,
          noModals: false,
          showLowRepImageUploadWarning: true,
          reputationToPostImages: null,
          bindNavPrevention: true,
          postfix: "",
          onDemand: true,
          discardSelector: ".discard-answer"
          ,immediatelyShowMarkdownHelp:true
          );



          );








           

          draft saved


          draft discarded


















          StackExchange.ready(
          function ()
          StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f184573%2fpython-jit-container-type%23new-answer', 'question_page');

          );

          Post as a guest






























          1 Answer
          1






          active

          oldest

          votes








          1 Answer
          1






          active

          oldest

          votes









          active

          oldest

          votes






          active

          oldest

          votes








          up vote
          1
          down vote



          accepted










          The implementation of your Array class still bugs me. So I cloned your repo and renamed Array -> OldArray, then created class NewArray:



# create_attribute(typ, data) is the project's helper that "expands" raw
# JSON data into an instance of typ; it must be defined or imported here.

class NewArray(object):
    """Mixin to denote objects that are sent from OANDA in an array.
    Also used to correctly serialize objects.
    """

    def __contains__(self, item):
        """Return True if item in this array, False otherwise.

        Note: this traverses all or part of the array, instantiating the
        objects. Using `x in array` may, therefore, have a serious impact on
        performance.

        """
        for value in self:
            if value == item:
                return True
        return False

    def __init__(self, *items):
        """Initialize a new array.

        The *items passed in are assumed to be JSON data. If an item is
        accessed, it is passed to `create_attribute` with the appropriate
        class type.

        Initially, objects are stored in self._items. When accessed, the
        objects are reified and stored in self.items. This is transparently
        handled by self.__getitem__(self, key).

        """
        print(f"NewArray<{self._contains}> with len {len(items)}")  # debug trace
        self._items = items  # raw JSON data
        self.items = []      # reified objects, grown on demand by __getitem__

    def __init_subclass__(cls, **kwargs):
        """Record the type *contained in* the subclass-array.

        A subclass like:

            class array_holding_foo(Array, contains=Foo):
                pass

        will have all its inner objects instantiated using class Foo.

        """
        cls._contains = kwargs.pop('contains')
        super().__init_subclass__(**kwargs)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        """Iterate over items in array. Use integer indexing so that
        __getitem__ can handle reifying all the objects.

        """
        for index in range(len(self)):
            yield self[index]

    def __getitem__(self, key):
        print(f"getitem[{key}] called on NewArray<{self._contains}>")  # debug trace
        if isinstance(key, slice):
            # slice.indices() normalizes start/stop/step against the full
            # length (len(self._items), not the partially filled self.items)
            # and handles negative values and negative steps.
            # Note: this reifies the items before putting them in the
            # new object.
            return self.__class__(*[self[index]
                                    for index in range(*key.indices(len(self)))])

        length = len(self._items)
        if key < 0:
            key += length

        if not (0 <= key < length):
            raise IndexError('Array index out of range')

        # Grow the cache just far enough to hold index `key`.
        if key >= len(self.items):
            self.items += [None] * (key - len(self.items) + 1)

        if self.items[key] is None:
            json = self._items[key]
            self.items[key] = create_attribute(self._contains, json)

        return self.items[key]
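To see the laziness in action, here is a condensed, self-contained sketch of the same integer-indexing logic. `Price` and the `create_attribute` stand-in are hypothetical (in the real project, `create_attribute` is the repo's own helper); the stand-in records each call so you can check that creating the array reifies nothing and that a repeated access is served from the cache.

```python
# Hypothetical stand-ins: in the real project, create_attribute is the repo's
# own helper and Price would be one of its model classes.
calls = []


class Price:
    def __init__(self, **data):
        self.data = data


def create_attribute(typ, data):
    """Stand-in helper: build an instance of typ from a JSON dict."""
    calls.append(data)
    return typ(**data)


class LazyArray:
    """Condensed version of the lazy-indexing logic (integer keys only)."""

    def __init_subclass__(cls, **kwargs):
        cls._contains = kwargs.pop('contains')
        super().__init_subclass__(**kwargs)

    def __init__(self, *items):
        self._items = items  # raw JSON dicts
        self.items = []      # reified objects, grown on demand

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        length = len(self._items)
        if key < 0:
            key += length
        if not 0 <= key < length:
            raise IndexError('Array index out of range')
        if key >= len(self.items):
            self.items += [None] * (key - len(self.items) + 1)
        if self.items[key] is None:
            self.items[key] = create_attribute(self._contains, self._items[key])
        return self.items[key]


class ArrayPrice(LazyArray, contains=Price):
    pass


prices = ArrayPrice({'bid': 1.0}, {'bid': 2.0}, {'bid': 3.0})
assert calls == []            # creating the array reifies nothing
last = prices[-1]             # reifies only the last item
assert len(calls) == 1
assert prices[-1] is last     # a second access is served from the cache
assert len(calls) == 1
```

Even without `__iter__`, a `for` loop over this sketch works, because Python falls back to calling `__getitem__` with 0, 1, 2, ... until it raises `IndexError`.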


I know that this code does less work when an array is created: __init__ creates and sets two attributes but doesn't loop at all. I had an older version that set the .items list to [None] * len(items), which made for less work in the __getitem__ method, but building that list still "looped" N times, so I tried squeezing that out!
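For comparison, the older pre-allocating variant might look roughly like this. It is a sketch, not the actual earlier code; `reify` is a hypothetical stand-in for `create_attribute`. The trade-off shows up directly: `[None] * len(items)` costs O(n) at construction, but in exchange `__getitem__` gets negative indexes and `IndexError` for free from the underlying list.

```python
def reify(data):
    """Hypothetical stand-in for create_attribute: turn a JSON dict into an object."""
    return dict(data)


class PreallocatedArray:
    """Integer keys only; slices are not handled in this sketch."""

    def __init__(self, *items):
        self._items = items
        # One slot per item, allocated eagerly: O(n) work at creation time.
        self.items = [None] * len(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        # List indexing already normalizes negative keys and raises
        # IndexError when out of range, so no manual bounds checks are needed.
        if self.items[key] is None:
            self.items[key] = reify(self._items[key])
        return self.items[key]
```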



          But your benchmark using this code averaged a few hundredths of a second slower than the benchmark using your old Array implementation.



I think that means the limit of performance has been reached, at least as far as this benchmark can measure. I ran the client_benchmark script 10 times for each version (OldArray and NewArray), sorted the reported times, dropped the max and min, and averaged the remaining 8.
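The averaging procedure described above (sort, drop the single fastest and slowest run, average the rest) is a simple trimmed mean. A minimal sketch; the function name is illustrative, not part of the project:

```python
def trimmed_mean(times):
    """Mean of the timings after dropping the single smallest and largest value."""
    if len(times) < 3:
        raise ValueError("need at least three timings to trim both ends")
    kept = sorted(times)[1:-1]  # drop min and max
    return sum(kept) / len(kept)
```

With 10 runs, this averages the middle 8, which keeps one unusually slow or fast run from dominating the result.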




Old array (OldArray): avg time: 6.518096446990967 s

New array (NewArray): avg time: 6.587247729301453 s
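Putting the two averages side by side, the gap works out as follows (plain arithmetic on the numbers above):

```latex
\Delta t = 6.587247729301453\,\text{s} - 6.518096446990967\,\text{s} \approx 0.0692\,\text{s}

\frac{\Delta t}{t_{\text{old}}} \approx \frac{0.0692\,\text{s}}{6.5181\,\text{s}} \approx 1.06\,\%
```

That is about seven hundredths of a second, or roughly 1% of the total run time.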




My takeaway is that your code, as written when you posted this, is close enough to "doing nothing" in performance that tweaking it just produces noise on this benchmark.
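One rough way to back up a "this is just noise" judgement is to compare the gap between the two trimmed means with the run-to-run spread of each version. This is a heuristic sketch rather than a proper significance test, and it needs the individual run times, which are not listed here:

```python
import statistics


def gap_and_spread(old_times, new_times):
    """Return (gap, spread) for two lists of benchmark timings.

    gap: trimmed mean of new_times minus trimmed mean of old_times.
    spread: the larger of the two sample standard deviations.

    Each list is trimmed by dropping its min and max, as in the procedure
    above, so each needs at least four timings (two must survive the trim).
    """
    def trim(times):
        return sorted(times)[1:-1]

    old_kept, new_kept = trim(old_times), trim(new_times)
    gap = statistics.mean(new_kept) - statistics.mean(old_kept)
    spread = max(statistics.stdev(old_kept), statistics.stdev(new_kept))
    return gap, spread
```

If `abs(gap)` is not clearly larger than `spread`, the two versions are hard to tell apart on that benchmark.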



Sooooo... you need another benchmark! Possibly several.



          You should save this as the "creating arrays" benchmark, and add it to your perftest directory (which you don't have... yet).



          Then maybe create some other benchmarks, reflective of actual use cases, which we can use to hammer out the performance of objects when the arrays are actually accessed, instead of just creating them.
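As a starting point for access-oriented benchmarks, something along these lines would time creation alone against creation plus reading one item or every item. Everything here is hypothetical: `LazyArray` is a condensed stand-in for the project's Array, `reify` stands in for `create_attribute`, and the payload is synthetic JSON-like data rather than a real OANDA response.

```python
import timeit


def reify(data):
    """Hypothetical stand-in for create_attribute."""
    return dict(data)


class LazyArray:
    """Condensed lazy array: raw JSON in _items, reified objects cached in items."""

    def __init__(self, *items):
        self._items = items
        self.items = [None] * len(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, index):
        if self.items[index] is None:
            self.items[index] = reify(self._items[index])
        return self.items[index]


# Synthetic payload, shaped loosely like a JSON array of objects.
payload = [{'id': str(n), 'price': n * 0.5} for n in range(1000)]

scenarios = {
    'create only': lambda: LazyArray(*payload),
    'create + first item': lambda: LazyArray(*payload)[0],
    # Iteration falls back to __getitem__ until it raises IndexError.
    'create + full scan': lambda: list(LazyArray(*payload)),
}


def run(number=200):
    """Time each scenario; returns {name: total seconds for `number` runs}."""
    return {name: timeit.timeit(func, number=number)
            for name, func in scenarios.items()}


if __name__ == '__main__':
    for name, seconds in run().items():
        print(f'{name:<20} {seconds:.4f} s')
```

Keeping each scenario as a named entry makes it easy to add cases that mirror real usage, such as `get_id` lookups or slicing.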



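Edit:

Also, if slicing is actually used, it probably deserves better treatment. Right now the slice reifies every selected item before building the new array. There should be a way of copying both versions, the raw JSON (_items) and any already-reified objects (items), into the new array's initializer.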
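One possible shape for that, sketched under the assumption that the constructor can accept an optional pre-filled cache: pass the raw JSON for the selected indices together with whatever objects have already been reified, so slicing never reifies anything by itself. The `_cache` keyword and `reify` helper are hypothetical names for illustration.

```python
reify_calls = []


def reify(data):
    """Hypothetical stand-in for create_attribute; records each call."""
    reify_calls.append(data)
    return dict(data)


class SliceableArray:
    def __init__(self, *items, _cache=None):
        self._items = items  # raw JSON, one entry per item
        # Reuse objects already reified by a parent array, if any.
        self.items = list(_cache) if _cache is not None else [None] * len(items)

    def __len__(self):
        return len(self._items)

    def __getitem__(self, key):
        if isinstance(key, slice):
            indices = range(len(self._items))[key]
            # Copy the raw JSON and the cached objects side by side;
            # nothing is reified here.
            return type(self)(*(self._items[i] for i in indices),
                              _cache=[self.items[i] for i in indices])
        if self.items[key] is None:
            self.items[key] = reify(self._items[key])
        return self.items[key]
```

Objects reified before the slice are shared with it, and the rest stay lazy in both arrays.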



























• That is just brilliant! Your __getitem__ method is something. Thanks for taking the time to run the benchmark! I also realize how lazy I have been with my docstrings compared to your code. This is going straight into master! I will create a perftest directory and more relevant benchmarks.
            – James Schinner
            Jan 9 at 2:04










• Great, I merged your code. I added a __repr__ and re-implemented the get_id and get_instrument methods to suit. I also implemented slicing as return self.__class__(*[self[index] for index in range(len(self._items))[slice]]), which I found here: stackoverflow.com/questions/13855288/… As a side note, I hadn't thought of equality testing for instances until you added the __contains__ method. I wonder if I can check equality without instantiating all attributes. Hmm...
            – James Schinner
            Jan 9 at 6:47











          • I was wondering if you could do get_id without having to reify the objects. Just check the json dicts.
            – Austin Hastings
            Jan 9 at 7:29










• I implemented that great idea and updated my tests. Back to 100% coverage. Regarding equality testing, I ended up storing the string f"self.__class__.__name__(**(kwargs))" in Model's __init__ to be used by __eq__, which meant I didn't have to completely reify the instance. However, that then got me thinking about the __hash__ method. async_v20 features immutable objects, so as a user I would expect hash(Model(1)) == hash(Model(1)) to be True. I used the same string from the equality test for the hash. I'm not sure whether this is a good idea. Thanks so much for your work!
            – James Schinner
            Jan 10 at 13:05
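The range-slicing trick from the comments works because slicing a `range` returns another `range` whose start, stop and step are already normalized for that length; it gives the same indices as `range(*key.indices(length))`. A small check (the helper names are illustrative):

```python
def slice_indices(length, key):
    """Indices selected by slice `key` on a sequence of `length` items."""
    return list(range(length)[key])


def slice_indices_alt(length, key):
    """Same result via slice.indices(), which normalizes (start, stop, step)."""
    return list(range(*key.indices(length)))


print(slice_indices(10, slice(None, None, -3)))  # [9, 6, 3, 0]
print(slice_indices(5, slice(-2, None)))         # [3, 4]
print(slice_indices(5, slice(10, 20)))           # []
```

Either form handles negative starts, negative steps and out-of-range bounds without hand-written special cases.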

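A sketch of the "just check the JSON dicts" idea for get_id: scan the raw dicts and reify only the match. It assumes each raw dict stores its identifier under an `'id'` key and uses a hypothetical `reify` in place of `create_attribute`; the project's actual get_id may differ.

```python
reified = []


def reify(data):
    """Hypothetical stand-in for create_attribute; records each call."""
    reified.append(data)
    return dict(data)


class ArrayWithIds:
    def __init__(self, *items):
        self._items = items
        self.items = [None] * len(items)

    def __getitem__(self, index):
        if self.items[index] is None:
            self.items[index] = reify(self._items[index])
        return self.items[index]

    def get_id(self, id_, default=None):
        """Return the item whose raw JSON 'id' equals id_, reifying only that item."""
        for index, data in enumerate(self._items):
            if data.get('id') == id_:
                return self[index]
        return default
```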

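On the __hash__ question: Python requires that objects which compare equal also hash equal, so deriving both __eq__ and __hash__ from the same precomputed key is consistent, provided the object really is immutable. One thing to watch if the key is built from the repr of the kwargs dict: dicts keep insertion order, so Model(a=1, b=2) and Model(b=2, a=1) would produce different strings. A sketch that sorts the fields first (this Model is a hypothetical stand-in, and it assumes all field values are hashable):

```python
class Model:
    """Hypothetical immutable model whose equality and hash share one key."""

    def __init__(self, **fields):
        # Sort by field name so keyword order does not affect the key.
        key = (type(self).__name__, tuple(sorted(fields.items())))
        object.__setattr__(self, '_key', key)

    def __setattr__(self, name, value):
        raise AttributeError(f'{type(self).__name__} is immutable')

    def __eq__(self, other):
        if not isinstance(other, Model):
            return NotImplemented
        return self._key == other._key

    def __hash__(self):
        # Equal objects have equal keys, hence equal hashes.
        return hash(self._key)
```

A tuple key avoids relying on repr formatting, at the cost of requiring hashable values; a string key tolerates unhashable values but is only as stable as the reprs it is built from.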













answered Jan 8 at 23:27 by Austin Hastings


