Python - basis of a parser library, followup

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
6
down vote

favorite












This is a followup to this question: link. As advised there, I've rewritten the State class, which also simplified some of the other code.



I'm writing a parser library. I would like to know if the basis of it is sound and whether it can be improved in any way. The entire code can be seen in this repo: link, in the frozen branch review-11-02-2018 (file core.py). I'll post relevant portions below.



The library is written around two things: a State class, and a notion of an effect. State objects represent current state of a parser chain, and keep track of the input that's left to parse and a portion of input parsed by the last parser. A parser is just a callable that takes a State object and returns a new one (State objects are immutable). Output of one parser is passed to the next parser in a chain. A parser may also fail by throwing a ParsingFailure exception. Any parser in the chain may register an effect - a callable that takes an arbitrary value as the first argument and a State object as the second. If the chain succeeds, all effects registered during the parsing run are applied sequentially to a seed (with the seed or return value of the previous effect being the first argument, and the state of the chain at the moment of effect registration being the second), and the return value of the last effect becomes the output of entire chain, along with the final state. Is the concept sane? It works, but is it a reasonable way to do this?



State class is just a named tuple with some extra methods and is defined as follows:



class State(namedtuple("State", "string effect left_start left_end parsed_start parsed_end")):
"""
State objects represent current state of a parser chain (or an individual
parser).

State objects provide two views over the input string: 'left', which spans
a substring between 'left_start' and 'left_end' and represents unparsed
input left from the previous parser, and 'parsed', which spans a substring
between 'parsed_start' and 'parsed_end' and represents a portion of input
the previous parser has parsed. Windows may overlap, and usually
'parsed_end' and 'left_start' are the same, but not always.

A State object is immutable and has following fields:
* string (str): the input the parser chain is supposed to parse.
* effect ((value, state) -> value): if the chain is successful, this will
be called in sequence with other effects from the chain to form the
chain's output value.
* left_start, left_end (int): see above about the 'left' window.
* parsed_start, parser_end (int): see above about the 'parsed' window.

State objects are just named tuples, so they support a very convenient
'_replace' method. !Note!: to avoid duplicating effects accidentally,
'_replace' treats lack of 'effect' in its arguments as 'effect=None'. So if
you want to copy an effect from another parser, you have to do it
explicitly.

State objects' constructor takes the following arguments:
1. string - the input.
2. effect=None - the effect, transformation to be performed on success of
the last parser.
3. start=0 - will be translated into 'left_start'
4. end=None - will be translated into 'left_end'. If set to None,
'left_end' will be set to the length of the input.
State objects created via this constructor have both 'parsed_start' and
'parsed_end' set to 'left_start'.

State objects have several properties:
* left - returns a slice of input that's left to parse.
* left_len - returns the length of the above slice without computing the
slice itself.
* parsed - returns a slice of input that's been parsed.
* parsed_len - returns the length of the above slice, again without
computing the slice.

Finally, State objects have following public methods:
* consume(how_many) - move 'how_many' characters from the left window into
the parsed window. Raise ValueError if more input was consumed than left.
* split(at) - split the State in two (and return them). The first keeps
the input up to, but not including, 'at' as its 'left' window, the second
gets the rest. Both have their 'parsed' windows reset to an empty string.
The first gets 'effect' of the original, the second gets None.
"""

__slots__ =

def __new__(cls, string, effect=None, start=0, end=None):
if end is None:
end = len(string)
assert 0 <= start <= end <= len(string)
return super().__new__(cls, string, effect, start, end, start, start)

def _replace(self, **kwargs):
if "effect" not in kwargs:
return super()._replace(effect=None, **kwargs)
return super()._replace(**kwargs)

def consume(self, how_many):
"""
Return a new State object with 'how_many' characters consumed and moved
to the 'parsed' window.

Raise ValueError if 'how_many' is negative or if consuming more
characters than left in the 'left' window.
"""
if how_many < 0:
raise ValueError("Negative number of consumed characters")
left_start = self.left_start + how_many
parsed_start = self.left_start
parsed_end = parsed_start + how_many
if left_start > self.left_end:
raise ValueError("Consumed more characters than fits in the 'left' window")
return self._replace(left_start=left_start, parsed_start=parsed_start,
parsed_end=parsed_end)

def split(self, at):
"""
Split the State in two. The first one keeps a portion of input up to
'at'th character (exclusive), the second one gets the rest. Both have
'parsed' window reset to an empty string. First one gets the effect of
the original, the second one gets None.
"""
split_point = self.left_start + at
first = self._replace(effect=self.effect,
left_end=split_point,
parsed_start=self.left_start,
parsed_end=self.left_start)
second = self._replace(effect=None,
left_start=split_point,
parsed_start=split_point,
parsed_end=split_point)
return first, second

@property
def left(self):
"""
Return the portion of input the last parser hasn't consumed.
"""
return self.string[self.left_start:self.left_end]

@property
def left_len(self):
"""
Return the length of the portion of input the last parser hasn't
consumed.
"""
return self.left_end - self.left_start

@property
def parsed(self):
"""
Return the string parsed by the last parser.
"""
return self.string[self.parsed_start:self.parsed_end]

@property
def parsed_len(self):
"""
Return the length of the string parsed by the last parser.
"""
return self.parsed_end - self.parsed_start


Hopefully docstrings are descriptive enough. If any further clarification is needed, please tell me, I'll edit it in.



Another important thing is parse function, which the user is supposed to call with a parser instead of calling parsers directly. Here it is:



def parse(seed, state_or_string, parser, verbose=False):
"""
Run a given parser on a given state object or a string, then apply combined
chain or parser's effects to 'seed' and return a tuple
(seed after effects, final state).

On failure, return None unless 'verbose' is truthy, in which case return
the ParsingFailure exception that has terminated the parsing process.
"""
if isinstance(state_or_string, str):
state = State(state_or_string)
else:
state = state_or_string
try:
after = parser(state)
if after.effect is not None:
return after.effect(seed, after), after
return seed, after
except ParsingFailure as failure:
if verbose:
return failure
return None
except ParsingEnd as end:
if end.state.effect is not None:
return end.state.effect(seed, end.state), end.state
return seed, end.state


Another important thing is chain parser generator, which performs the chaining logic described above, but I don't want to post it here, because a) the question is bloated already, b) it also deals with lookahead, which seems to me a matter worth asking a separate question.



If you've read this far, thank you! Any suggestions on improving the library?







share|improve this question

























    up vote
    6
    down vote

    favorite












    This is a followup to this question: link. As advised there, I've rewritten the State class, which also simplified some of the other code.



    I'm writing a parser library. I would like to know if the basis of it is sound and whether it can be improved in any way. The entire code can be seen in this repo: link, in the frozen branch review-11-02-2018 (file core.py). I'll post relevant portions below.



    The library is written around two things: a State class, and a notion of an effect. State objects represent current state of a parser chain, and keep track of the input that's left to parse and a portion of input parsed by the last parser. A parser is just a callable that takes a State object and returns a new one (State objects are immutable). Output of one parser is passed to the next parser in a chain. A parser may also fail by throwing a ParsingFailure exception. Any parser in the chain may register an effect - a callable that takes an arbitrary value as the first argument and a State object as the second. If the chain succeeds, all effects registered during the parsing run are applied sequentially to a seed (with the seed or return value of the previous effect being the first argument, and the state of the chain at the moment of effect registration being the second), and the return value of the last effect becomes the output of entire chain, along with the final state. Is the concept sane? It works, but is it a reasonable way to do this?



    State class is just a named tuple with some extra methods and is defined as follows:



    class State(namedtuple("State", "string effect left_start left_end parsed_start parsed_end")):
    """
    State objects represent current state of a parser chain (or an individual
    parser).

    State objects provide two views over the input string: 'left', which spans
    a substring between 'left_start' and 'left_end' and represents unparsed
    input left from the previous parser, and 'parsed', which spans a substring
    between 'parsed_start' and 'parsed_end' and represents a portion of input
    the previous parser has parsed. Windows may overlap, and usually
    'parsed_end' and 'left_start' are the same, but not always.

    A State object is immutable and has following fields:
    * string (str): the input the parser chain is supposed to parse.
    * effect ((value, state) -> value): if the chain is successful, this will
    be called in sequence with other effects from the chain to form the
    chain's output value.
    * left_start, left_end (int): see above about the 'left' window.
    * parsed_start, parser_end (int): see above about the 'parsed' window.

    State objects are just named tuples, so they support a very convenient
    '_replace' method. !Note!: to avoid duplicating effects accidentally,
    '_replace' treats lack of 'effect' in its arguments as 'effect=None'. So if
    you want to copy an effect from another parser, you have to do it
    explicitly.

    State objects' constructor takes the following arguments:
    1. string - the input.
    2. effect=None - the effect, transformation to be performed on success of
    the last parser.
    3. start=0 - will be translated into 'left_start'
    4. end=None - will be translated into 'left_end'. If set to None,
    'left_end' will be set to the length of the input.
    State objects created via this constructor have both 'parsed_start' and
    'parsed_end' set to 'left_start'.

    State objects have several properties:
    * left - returns a slice of input that's left to parse.
    * left_len - returns the length of the above slice without computing the
    slice itself.
    * parsed - returns a slice of input that's been parsed.
    * parsed_len - returns the length of the above slice, again without
    computing the slice.

    Finally, State objects have following public methods:
    * consume(how_many) - move 'how_many' characters from the left window into
    the parsed window. Raise ValueError if more input was consumed than left.
    * split(at) - split the State in two (and return them). The first keeps
    the input up to, but not including, 'at' as its 'left' window, the second
    gets the rest. Both have their 'parsed' windows reset to an empty string.
    The first gets 'effect' of the original, the second gets None.
    """

    __slots__ =

    def __new__(cls, string, effect=None, start=0, end=None):
    if end is None:
    end = len(string)
    assert 0 <= start <= end <= len(string)
    return super().__new__(cls, string, effect, start, end, start, start)

    def _replace(self, **kwargs):
    if "effect" not in kwargs:
    return super()._replace(effect=None, **kwargs)
    return super()._replace(**kwargs)

    def consume(self, how_many):
    """
    Return a new State object with 'how_many' characters consumed and moved
    to the 'parsed' window.

    Raise ValueError if 'how_many' is negative or if consuming more
    characters than left in the 'left' window.
    """
    if how_many < 0:
    raise ValueError("Negative number of consumed characters")
    left_start = self.left_start + how_many
    parsed_start = self.left_start
    parsed_end = parsed_start + how_many
    if left_start > self.left_end:
    raise ValueError("Consumed more characters than fits in the 'left' window")
    return self._replace(left_start=left_start, parsed_start=parsed_start,
    parsed_end=parsed_end)

    def split(self, at):
    """
    Split the State in two. The first one keeps a portion of input up to
    'at'th character (exclusive), the second one gets the rest. Both have
    'parsed' window reset to an empty string. First one gets the effect of
    the original, the second one gets None.
    """
    split_point = self.left_start + at
    first = self._replace(effect=self.effect,
    left_end=split_point,
    parsed_start=self.left_start,
    parsed_end=self.left_start)
    second = self._replace(effect=None,
    left_start=split_point,
    parsed_start=split_point,
    parsed_end=split_point)
    return first, second

    @property
    def left(self):
    """
    Return the portion of input the last parser hasn't consumed.
    """
    return self.string[self.left_start:self.left_end]

    @property
    def left_len(self):
    """
    Return the length of the portion of input the last parser hasn't
    consumed.
    """
    return self.left_end - self.left_start

    @property
    def parsed(self):
    """
    Return the string parsed by the last parser.
    """
    return self.string[self.parsed_start:self.parsed_end]

    @property
    def parsed_len(self):
    """
    Return the length of the string parsed by the last parser.
    """
    return self.parsed_end - self.parsed_start


    Hopefully docstrings are descriptive enough. If any further clarification is needed, please tell me, I'll edit it in.



    Another important thing is parse function, which the user is supposed to call with a parser instead of calling parsers directly. Here it is:



    def parse(seed, state_or_string, parser, verbose=False):
    """
    Run a given parser on a given state object or a string, then apply combined
    chain or parser's effects to 'seed' and return a tuple
    (seed after effects, final state).

    On failure, return None unless 'verbose' is truthy, in which case return
    the ParsingFailure exception that has terminated the parsing process.
    """
    if isinstance(state_or_string, str):
    state = State(state_or_string)
    else:
    state = state_or_string
    try:
    after = parser(state)
    if after.effect is not None:
    return after.effect(seed, after), after
    return seed, after
    except ParsingFailure as failure:
    if verbose:
    return failure
    return None
    except ParsingEnd as end:
    if end.state.effect is not None:
    return end.state.effect(seed, end.state), end.state
    return seed, end.state


    Another important thing is chain parser generator, which performs the chaining logic described above, but I don't want to post it here, because a) the question is bloated already, b) it also deals with lookahead, which seems to me a matter worth asking a separate question.



    If you've read this far, thank you! Any suggestions on improving the library?







    share|improve this question





















      up vote
      6
      down vote

      favorite









      up vote
      6
      down vote

      favorite











      This is a followup to this question: link. As advised there, I've rewritten the State class, which also simplified some of the other code.



      I'm writing a parser library. I would like to know if the basis of it is sound and whether it can be improved in any way. The entire code can be seen in this repo: link, in the frozen branch review-11-02-2018 (file core.py). I'll post relevant portions below.



      The library is written around two things: a State class, and a notion of an effect. State objects represent current state of a parser chain, and keep track of the input that's left to parse and a portion of input parsed by the last parser. A parser is just a callable that takes a State object and returns a new one (State objects are immutable). Output of one parser is passed to the next parser in a chain. A parser may also fail by throwing a ParsingFailure exception. Any parser in the chain may register an effect - a callable that takes an arbitrary value as the first argument and a State object as the second. If the chain succeeds, all effects registered during the parsing run are applied sequentially to a seed (with the seed or return value of the previous effect being the first argument, and the state of the chain at the moment of effect registration being the second), and the return value of the last effect becomes the output of entire chain, along with the final state. Is the concept sane? It works, but is it a reasonable way to do this?



      State class is just a named tuple with some extra methods and is defined as follows:



      class State(namedtuple("State", "string effect left_start left_end parsed_start parsed_end")):
      """
      State objects represent current state of a parser chain (or an individual
      parser).

      State objects provide two views over the input string: 'left', which spans
      a substring between 'left_start' and 'left_end' and represents unparsed
      input left from the previous parser, and 'parsed', which spans a substring
      between 'parsed_start' and 'parsed_end' and represents a portion of input
      the previous parser has parsed. Windows may overlap, and usually
      'parsed_end' and 'left_start' are the same, but not always.

      A State object is immutable and has following fields:
      * string (str): the input the parser chain is supposed to parse.
      * effect ((value, state) -> value): if the chain is successful, this will
      be called in sequence with other effects from the chain to form the
      chain's output value.
      * left_start, left_end (int): see above about the 'left' window.
      * parsed_start, parser_end (int): see above about the 'parsed' window.

      State objects are just named tuples, so they support a very convenient
      '_replace' method. !Note!: to avoid duplicating effects accidentally,
      '_replace' treats lack of 'effect' in its arguments as 'effect=None'. So if
      you want to copy an effect from another parser, you have to do it
      explicitly.

      State objects' constructor takes the following arguments:
      1. string - the input.
      2. effect=None - the effect, transformation to be performed on success of
      the last parser.
      3. start=0 - will be translated into 'left_start'
      4. end=None - will be translated into 'left_end'. If set to None,
      'left_end' will be set to the length of the input.
      State objects created via this constructor have both 'parsed_start' and
      'parsed_end' set to 'left_start'.

      State objects have several properties:
      * left - returns a slice of input that's left to parse.
      * left_len - returns the length of the above slice without computing the
      slice itself.
      * parsed - returns a slice of input that's been parsed.
      * parsed_len - returns the length of the above slice, again without
      computing the slice.

      Finally, State objects have following public methods:
      * consume(how_many) - move 'how_many' characters from the left window into
      the parsed window. Raise ValueError if more input was consumed than left.
      * split(at) - split the State in two (and return them). The first keeps
      the input up to, but not including, 'at' as its 'left' window, the second
      gets the rest. Both have their 'parsed' windows reset to an empty string.
      The first gets 'effect' of the original, the second gets None.
      """

      __slots__ =

      def __new__(cls, string, effect=None, start=0, end=None):
      if end is None:
      end = len(string)
      assert 0 <= start <= end <= len(string)
      return super().__new__(cls, string, effect, start, end, start, start)

      def _replace(self, **kwargs):
      if "effect" not in kwargs:
      return super()._replace(effect=None, **kwargs)
      return super()._replace(**kwargs)

      def consume(self, how_many):
      """
      Return a new State object with 'how_many' characters consumed and moved
      to the 'parsed' window.

      Raise ValueError if 'how_many' is negative or if consuming more
      characters than left in the 'left' window.
      """
      if how_many < 0:
      raise ValueError("Negative number of consumed characters")
      left_start = self.left_start + how_many
      parsed_start = self.left_start
      parsed_end = parsed_start + how_many
      if left_start > self.left_end:
      raise ValueError("Consumed more characters than fits in the 'left' window")
      return self._replace(left_start=left_start, parsed_start=parsed_start,
      parsed_end=parsed_end)

      def split(self, at):
      """
      Split the State in two. The first one keeps a portion of input up to
      'at'th character (exclusive), the second one gets the rest. Both have
      'parsed' window reset to an empty string. First one gets the effect of
      the original, the second one gets None.
      """
      split_point = self.left_start + at
      first = self._replace(effect=self.effect,
      left_end=split_point,
      parsed_start=self.left_start,
      parsed_end=self.left_start)
      second = self._replace(effect=None,
      left_start=split_point,
      parsed_start=split_point,
      parsed_end=split_point)
      return first, second

      @property
      def left(self):
      """
      Return the portion of input the last parser hasn't consumed.
      """
      return self.string[self.left_start:self.left_end]

      @property
      def left_len(self):
      """
      Return the length of the portion of input the last parser hasn't
      consumed.
      """
      return self.left_end - self.left_start

      @property
      def parsed(self):
      """
      Return the string parsed by the last parser.
      """
      return self.string[self.parsed_start:self.parsed_end]

      @property
      def parsed_len(self):
      """
      Return the length of the string parsed by the last parser.
      """
      return self.parsed_end - self.parsed_start


      Hopefully docstrings are descriptive enough. If any further clarification is needed, please tell me, I'll edit it in.



      Another important thing is parse function, which the user is supposed to call with a parser instead of calling parsers directly. Here it is:



      def parse(seed, state_or_string, parser, verbose=False):
      """
      Run a given parser on a given state object or a string, then apply combined
      chain or parser's effects to 'seed' and return a tuple
      (seed after effects, final state).

      On failure, return None unless 'verbose' is truthy, in which case return
      the ParsingFailure exception that has terminated the parsing process.
      """
      if isinstance(state_or_string, str):
      state = State(state_or_string)
      else:
      state = state_or_string
      try:
      after = parser(state)
      if after.effect is not None:
      return after.effect(seed, after), after
      return seed, after
      except ParsingFailure as failure:
      if verbose:
      return failure
      return None
      except ParsingEnd as end:
      if end.state.effect is not None:
      return end.state.effect(seed, end.state), end.state
      return seed, end.state


      Another important thing is chain parser generator, which performs the chaining logic described above, but I don't want to post it here, because a) the question is bloated already, b) it also deals with lookahead, which seems to me a matter worth asking a separate question.



      If you've read this far, thank you! Any suggestions on improving the library?







      share|improve this question











      This is a followup to this question: link. As advised there, I've rewritten the State class, which also simplified some of the other code.



      I'm writing a parser library. I would like to know if the basis of it is sound and whether it can be improved in any way. The entire code can be seen in this repo: link, in the frozen branch review-11-02-2018 (file core.py). I'll post relevant portions below.



      The library is written around two things: a State class, and a notion of an effect. State objects represent current state of a parser chain, and keep track of the input that's left to parse and a portion of input parsed by the last parser. A parser is just a callable that takes a State object and returns a new one (State objects are immutable). Output of one parser is passed to the next parser in a chain. A parser may also fail by throwing a ParsingFailure exception. Any parser in the chain may register an effect - a callable that takes an arbitrary value as the first argument and a State object as the second. If the chain succeeds, all effects registered during the parsing run are applied sequentially to a seed (with the seed or return value of the previous effect being the first argument, and the state of the chain at the moment of effect registration being the second), and the return value of the last effect becomes the output of entire chain, along with the final state. Is the concept sane? It works, but is it a reasonable way to do this?



      State class is just a named tuple with some extra methods and is defined as follows:



      class State(namedtuple("State", "string effect left_start left_end parsed_start parsed_end")):
      """
      State objects represent current state of a parser chain (or an individual
      parser).

      State objects provide two views over the input string: 'left', which spans
      a substring between 'left_start' and 'left_end' and represents unparsed
      input left from the previous parser, and 'parsed', which spans a substring
      between 'parsed_start' and 'parsed_end' and represents a portion of input
      the previous parser has parsed. Windows may overlap, and usually
      'parsed_end' and 'left_start' are the same, but not always.

      A State object is immutable and has following fields:
      * string (str): the input the parser chain is supposed to parse.
      * effect ((value, state) -> value): if the chain is successful, this will
      be called in sequence with other effects from the chain to form the
      chain's output value.
      * left_start, left_end (int): see above about the 'left' window.
      * parsed_start, parser_end (int): see above about the 'parsed' window.

      State objects are just named tuples, so they support a very convenient
      '_replace' method. !Note!: to avoid duplicating effects accidentally,
      '_replace' treats lack of 'effect' in its arguments as 'effect=None'. So if
      you want to copy an effect from another parser, you have to do it
      explicitly.

      State objects' constructor takes the following arguments:
      1. string - the input.
      2. effect=None - the effect, transformation to be performed on success of
      the last parser.
      3. start=0 - will be translated into 'left_start'
      4. end=None - will be translated into 'left_end'. If set to None,
      'left_end' will be set to the length of the input.
      State objects created via this constructor have both 'parsed_start' and
      'parsed_end' set to 'left_start'.

      State objects have several properties:
      * left - returns a slice of input that's left to parse.
      * left_len - returns the length of the above slice without computing the
      slice itself.
      * parsed - returns a slice of input that's been parsed.
      * parsed_len - returns the length of the above slice, again without
      computing the slice.

      Finally, State objects have following public methods:
      * consume(how_many) - move 'how_many' characters from the left window into
      the parsed window. Raise ValueError if more input was consumed than left.
      * split(at) - split the State in two (and return them). The first keeps
      the input up to, but not including, 'at' as its 'left' window, the second
      gets the rest. Both have their 'parsed' windows reset to an empty string.
      The first gets 'effect' of the original, the second gets None.
      """

      __slots__ =

      def __new__(cls, string, effect=None, start=0, end=None):
      if end is None:
      end = len(string)
      assert 0 <= start <= end <= len(string)
      return super().__new__(cls, string, effect, start, end, start, start)

      def _replace(self, **kwargs):
      if "effect" not in kwargs:
      return super()._replace(effect=None, **kwargs)
      return super()._replace(**kwargs)

      def consume(self, how_many):
      """
      Return a new State object with 'how_many' characters consumed and moved
      to the 'parsed' window.

      Raise ValueError if 'how_many' is negative or if consuming more
      characters than left in the 'left' window.
      """
      if how_many < 0:
      raise ValueError("Negative number of consumed characters")
      left_start = self.left_start + how_many
      parsed_start = self.left_start
      parsed_end = parsed_start + how_many
      if left_start > self.left_end:
      raise ValueError("Consumed more characters than fits in the 'left' window")
      return self._replace(left_start=left_start, parsed_start=parsed_start,
      parsed_end=parsed_end)

      def split(self, at):
      """
      Split the State in two. The first one keeps a portion of input up to
      'at'th character (exclusive), the second one gets the rest. Both have
      'parsed' window reset to an empty string. First one gets the effect of
      the original, the second one gets None.
      """
      split_point = self.left_start + at
      first = self._replace(effect=self.effect,
      left_end=split_point,
      parsed_start=self.left_start,
      parsed_end=self.left_start)
      second = self._replace(effect=None,
      left_start=split_point,
      parsed_start=split_point,
      parsed_end=split_point)
      return first, second

      @property
      def left(self):
      """
      Return the portion of input the last parser hasn't consumed.
      """
      return self.string[self.left_start:self.left_end]

      @property
      def left_len(self):
      """
      Return the length of the portion of input the last parser hasn't
      consumed.
      """
      return self.left_end - self.left_start

      @property
      def parsed(self):
      """
      Return the string parsed by the last parser.
      """
      return self.string[self.parsed_start:self.parsed_end]

      @property
      def parsed_len(self):
      """
      Return the length of the string parsed by the last parser.
      """
      return self.parsed_end - self.parsed_start


      Hopefully docstrings are descriptive enough. If any further clarification is needed, please tell me, I'll edit it in.



      Another important thing is parse function, which the user is supposed to call with a parser instead of calling parsers directly. Here it is:



      def parse(seed, state_or_string, parser, verbose=False):
      """
      Run a given parser on a given state object or a string, then apply combined
      chain or parser's effects to 'seed' and return a tuple
      (seed after effects, final state).

      On failure, return None unless 'verbose' is truthy, in which case return
      the ParsingFailure exception that has terminated the parsing process.
      """
      if isinstance(state_or_string, str):
      state = State(state_or_string)
      else:
      state = state_or_string
      try:
      after = parser(state)
      if after.effect is not None:
      return after.effect(seed, after), after
      return seed, after
      except ParsingFailure as failure:
      if verbose:
      return failure
      return None
      except ParsingEnd as end:
      if end.state.effect is not None:
      return end.state.effect(seed, end.state), end.state
      return seed, end.state


      Another important thing is chain parser generator, which performs the chaining logic described above, but I don't want to post it here, because a) the question is bloated already, b) it also deals with lookahead, which seems to me a matter worth asking a separate question.



      If you've read this far, thank you! Any suggestions on improving the library?









      share|improve this question










      share|improve this question




      share|improve this question









      asked Feb 11 at 15:14









      Michail

      2648




      2648




















          1 Answer
          1






          active

          oldest

          votes

















          up vote
          1
          down vote



          accepted










          Following up on the previous comments they made about the parser, I will focus on code legibility



          You may consider moving some paragraphs of the class docstring to the relevant part of the code, like the paragraph about A State object is immutable and has following fields may suit better on the __new__ method, documenting as well the parameter



          Also, don't forget to document the parameters of methods/functions, will help a lot knowing what your parser is doing when invoking them



          Parameters like at might be better named as index which is universally used for it.



          Parameters like how_many may be better named as characters_count or consumed_characters_count or similar, simply by reading it gives better insight on what the parameter refers to



          if verbose:
          return failure
          return None


          This can be translated to ternary operator



          return failure if verbose else None


          For the following small methods, I would consider better naming as well, like this one



          def left(self):
          """
          Return the portion of input the last parser hasn't consumed.
          """


          Why not calling it portion_not_consumed or something on that way?



          Just imagine you don't know anything about a library, and you find a method called left_len. Unless you read the docstring, is hard to tell what is doing



          You can as well consider putting some spacing between the lines of a same function, to improve a bit readability between the sections of it



          Keep the good work!






          share|improve this answer





















            Your Answer




            StackExchange.ifUsing("editor", function ()
            return StackExchange.using("mathjaxEditing", function ()
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
            );
            );
            , "mathjax-editing");

            StackExchange.ifUsing("editor", function ()
            StackExchange.using("externalEditor", function ()
            StackExchange.using("snippets", function ()
            StackExchange.snippets.init();
            );
            );
            , "code-snippets");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "196"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            convertImagesToLinks: false,
            noModals: false,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );








             

            draft saved


            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f187315%2fpython-basis-of-a-parser-library-followup%23new-answer', 'question_page');

            );

            Post as a guest






























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            1
            down vote



            accepted










            Following up on the previous comments they made about the parser, I will focus on code legibility



            You may consider moving some paragraphs of the class docstring to the relevant part of the code, like the paragraph about A State object is immutable and has following fields may suit better on the __new__ method, documenting as well the parameter



            Also, don't forget to document the parameters of methods/functions, will help a lot knowing what your parser is doing when invoking them



            Parameters like at might be better named as index which is universally used for it.



            Parameters like how_many may be better named as characters_count or consumed_characters_count or similar, simply by reading it gives better insight on what the parameter refers to



            if verbose:
            return failure
            return None


            This can be translated to ternary operator



            return failure if verbose else None


            For the following small methods, I would consider better naming as well, like this one



            def left(self):
            """
            Return the portion of input the last parser hasn't consumed.
            """


            Why not calling it portion_not_consumed or something on that way?



            Just imagine you don't know anything about a library, and you find a method called left_len. Unless you read the docstring, is hard to tell what is doing



            You can as well consider putting some spacing between the lines of a same function, to improve a bit readability between the sections of it



            Keep the good work!






            share|improve this answer

























              up vote
              1
              down vote



              accepted










              Following up on the previous comments they made about the parser, I will focus on code legibility



              You may consider moving some paragraphs of the class docstring to the relevant part of the code, like the paragraph about A State object is immutable and has following fields may suit better on the __new__ method, documenting as well the parameter



              Also, don't forget to document the parameters of methods/functions, will help a lot knowing what your parser is doing when invoking them



              Parameters like at might be better named as index which is universally used for it.



              Parameters like how_many may be better named as characters_count or consumed_characters_count or similar, simply by reading it gives better insight on what the parameter refers to



              if verbose:
              return failure
              return None


              This can be translated to ternary operator



              return failure if verbose else None


              For the following small methods, I would consider better naming as well, like this one



              def left(self):
              """
              Return the portion of input the last parser hasn't consumed.
              """


              Why not calling it portion_not_consumed or something on that way?



              Just imagine you don't know anything about a library, and you find a method called left_len. Unless you read the docstring, is hard to tell what is doing



              You can as well consider putting some spacing between the lines of a same function, to improve a bit readability between the sections of it



              Keep the good work!






              share|improve this answer























                up vote
                1
                down vote



                accepted







                up vote
                1
                down vote



                accepted






                Following up on the previous comments they made about the parser, I will focus on code legibility



                You may consider moving some paragraphs of the class docstring to the relevant part of the code, like the paragraph about A State object is immutable and has following fields may suit better on the __new__ method, documenting as well the parameter



                Also, don't forget to document the parameters of methods/functions, will help a lot knowing what your parser is doing when invoking them



                Parameters like at might be better named as index which is universally used for it.



                Parameters like how_many may be better named as characters_count or consumed_characters_count or similar, simply by reading it gives better insight on what the parameter refers to



                if verbose:
                return failure
                return None


                This can be translated to ternary operator



                return failure if verbose else None


                For the following small methods, I would consider better naming as well, like this one



                def left(self):
                """
                Return the portion of input the last parser hasn't consumed.
                """


                Why not calling it portion_not_consumed or something on that way?



                Just imagine you don't know anything about a library, and you find a method called left_len. Unless you read the docstring, is hard to tell what is doing



                You can as well consider putting some spacing between the lines of a same function, to improve a bit readability between the sections of it



                Keep the good work!






                share|improve this answer













                Following up on the previous comments they made about the parser, I will focus on code legibility



                You may consider moving some paragraphs of the class docstring to the relevant part of the code, like the paragraph about A State object is immutable and has following fields may suit better on the __new__ method, documenting as well the parameter



                Also, don't forget to document the parameters of methods/functions, will help a lot knowing what your parser is doing when invoking them



                Parameters like at might be better named as index which is universally used for it.



                Parameters like how_many may be better named as characters_count or consumed_characters_count or similar, simply by reading it gives better insight on what the parameter refers to



                if verbose:
                return failure
                return None


                This can be translated to ternary operator



                return failure if verbose else None


                For the following small methods, I would consider better naming as well, like this one



                def left(self):
                """
                Return the portion of input the last parser hasn't consumed.
                """


                Why not calling it portion_not_consumed or something on that way?



                Just imagine you don't know anything about a library, and you find a method called left_len. Unless you read the docstring, is hard to tell what is doing



                You can as well consider putting some spacing between the lines of a same function, to improve a bit readability between the sections of it



                Keep the good work!







                share|improve this answer













                share|improve this answer



                share|improve this answer











                answered Feb 17 at 21:11









                A. Romeu

                949313




                949313






















                     

                    draft saved


                    draft discarded


























                     


                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f187315%2fpython-basis-of-a-parser-library-followup%23new-answer', 'question_page');

                    );

                    Post as a guest













































































                    Popular posts from this blog

                    Python Lists

                    Aion

                    JavaScript Array Iteration Methods