Custom parser for google protobuf

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP





.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty margin-bottom:0;







up vote
4
down vote

favorite












I have written a module for parsing a format used by the Defold game engine, it is essentially the google protobuf format, but with a lot of custom things to support how they handle internal things.



I would like to have the whole code reviewed, but seeing that it is fairly big (800 lines) I assume that is a no go? If someone could give it a quick read through that would be great, if not maybe only the actual parsing?



The code takes a document and then makes it into a "tree", with Elements and Attributes, you can then change the values and write it down again.



I have been writing python for years but I am all self taught and have never had something code reviewed, so I imagine I have a lot to learn.



The data that is read looks like this



name: "default"
scale_along_z: 0
embedded_instances
id: "go"
data: "components n"
" id: "script"n"
" component: "/main/main.script"n"
" position n"
" x: 0.0n"
" y: 0.0n"
" z: 0.0n"
" n"
" rotation n"
" x: 0.0n"
" y: 0.0n"
" z: 0.0n"
" w: 1.0n"
" n"
"n"
"components n"
" id: "mockup"n"
" component: "/main/mockup.gui"n"
" position n"
" x: 0.0n"
" y: 0.0n"
" z: 0.0n"
" n"
" rotation n"
" x: 0.0n"
" y: 0.0n"
" z: 0.0n"
" w: 1.0n"
" n"
"n"
""
position
x: 0.0
y: 0.0
z: 0.0

rotation
x: 0.0
y: 0.0
z: 0.0
w: 1.0

scale3
x: 1.0
y: 1.0
z: 1.0




Unfortunately the code for this is huge and maybe too unreadable? The whole code is on github https://github.com/Jerakin/DefTree/blob/master/deftree/init.py



class BaseDefParser: # pragma: no cover
_pattern = ''
_regex = re_compile(_pattern)
file_path = None

def __init__(self, root_element):
self.root = root_element
self._element_chain = [self.root]

def _raise_parse_error(self):
if self.file_path:
raise ParseError("Error when parsing file: ".format(self.file_path))
raise ParseError("Error when parsing supplied document")

def parse(self, source) -> 'DefTree':
"""Loads an external Defold section into this DefTree

:param source: path to the file.
:returns Element: root Element"""
self.file_path = source
document = self._open(self.file_path)
return self._parse(document)

def from_string(self, source) -> 'DefTree':
"""Parses an Defold section from a string constant

:param source: string to parse.
:returns Element: root Element"""

return self._parse(source)

def _parse(self, input_doc):
document = input_doc
last_index = True
while last_index:
last_index = self._tree_builder(document)
if last_index:
document = document[last_index:]
return self.root

def _tree_builder(self, document):
"""Searches the document for a match and builds the tree"""
return False

@staticmethod
def _get_level(child):
element_level = -1

def _count_up(_child, count):
parent = _child.get_parent()
if not parent:
return count

return _count_up(parent, count+1)
return _count_up(child, element_level)

@staticmethod
def _open(_path):
"""Returns the documents data as a string"""

with open(_path, "r") as document:
current_document = document.read()
return current_document

@classmethod
def serialize(cls, element) -> str:
"""Returns a string of the element"""
return ""


class DefParser(BaseDefParser):
_pattern = r'(?:data:)|(?:^|s)(w+):s+(.+(?:s+".*)*)|(w*)W)'
_regex = re_compile(_pattern)

def __init__(self, root_element):
super().__init__(root_element)

def _tree_builder(self, document):
"""Searches the document for a match and builds the tree"""
regex_match = self._regex.search(document)
if not regex_match and len(document) > 25:
self._raise_parse_error()
if regex_match:
element_name = regex_match.group(3)
attribute_name, attribute_value = regex_match.group(1, 2)
element_exit = regex_match.group(4)

if element_name:
if self._element_chain:
last_element = self._element_chain[-1]
else:
self._raise_parse_error() # pragma: no cover
element = last_element.add_element(element_name)
self._element_chain.append(element)
elif attribute_name and attribute_value:
if attribute_name == "data":
attribute_value = bytes(attribute_value, "utf-8").decode("unicode_escape").replace('n"n "',
"n")[1:-1]
last_element = self._element_chain[-1]
element = last_element.add_element("data")
self._element_chain.append(element)
self._parse(attribute_value)
self._element_chain.pop()
else:
if self._element_chain:
last_element = self._element_chain[-1]
else:
self._raise_parse_error() # pragma: no cover
last_element.add_attribute(attribute_name, attribute_value)

elif element_exit:
if self._element_chain:
self._element_chain.pop()
else:
self._raise_parse_error() # pragma: no cover

return regex_match.end()
return False

@classmethod
def serialize(cls, element, internal=False):
"""Returns a string of the element"""
assert_is_element(element)

def construct_string(node):
"""Recursive function that formats the text"""
nonlocal output_string
nonlocal level
for child in node:
element_level = cls._get_level(child)
if is_element(child):
if child.name == "data" and not internal:
value = cls._escape_element(child)
output_string += ": n".format(" " * element_level, child.name, value)
else:
level += 1
output_string += " {n".format(" " * element_level, child.name)
construct_string(child)
elif is_attribute(child):
output_string += ": n".format(" " * element_level, child.name,
child.string)
if level > element_level and not child.name == "data":
level -= 1
output_string += "".format(" " * level, "n")

level = 0
output_string = ""
construct_string(element)
return output_string

@classmethod
def _escape_element(cls, ele):
def yield_attributes(element_parent):
for child in element_parent:
if is_attribute(child):
yield child
else:
yield from yield_attributes(child)
data_elements = dict()
data_elements[cls._get_level(ele)] = [ele]

for x in ele.iter_elements():
if is_element(x) and x.name == "data":
lvl = cls._get_level(x)
if lvl not in data_elements:
data_elements[lvl] =
data_elements[lvl].append(x)

while data_elements:
for x in data_elements[max(data_elements)]:
for a in yield_attributes(x):
if isinstance(a, DefTreeString) and a.string.startswith('"') and a.string.endswith('"'):
a._value = a.string.replace('"', '\"')
_root = DefTree().get_root()
attr = _root.add_attribute("data", "")
parent = x.get_parent()
index = parent.index(x)
parent.remove(x)
parent.insert(index, attr)
x._parent = None
text = cls.serialize(x, True)
_root.set_attribute("data", '""'.format(text.replace('"', '\"').replace('n', "\n")))
del data_elements[max(data_elements)]

return '""'.format(cls.serialize(ele).replace("n", '\n"n "'))






share|improve this question

























    up vote
    4
    down vote

    favorite












    I have written a module for parsing a format used by the Defold game engine, it is essentially the google protobuf format, but with a lot of custom things to support how they handle internal things.



    I would like to have the whole code reviewed, but seeing that it is fairly big (800 lines) I assume that is a no go? If someone could give it a quick read through that would be great, if not maybe only the actual parsing?



    The code takes a document and then makes it into a "tree", with Elements and Attributes, you can then change the values and write it down again.



    I have been writing python for years but I am all self taught and have never had something code reviewed, so I imagine I have a lot to learn.



    The data that is read looks like this



    name: "default"
    scale_along_z: 0
    embedded_instances
    id: "go"
    data: "components n"
    " id: "script"n"
    " component: "/main/main.script"n"
    " position n"
    " x: 0.0n"
    " y: 0.0n"
    " z: 0.0n"
    " n"
    " rotation n"
    " x: 0.0n"
    " y: 0.0n"
    " z: 0.0n"
    " w: 1.0n"
    " n"
    "n"
    "components n"
    " id: "mockup"n"
    " component: "/main/mockup.gui"n"
    " position n"
    " x: 0.0n"
    " y: 0.0n"
    " z: 0.0n"
    " n"
    " rotation n"
    " x: 0.0n"
    " y: 0.0n"
    " z: 0.0n"
    " w: 1.0n"
    " n"
    "n"
    ""
    position
    x: 0.0
    y: 0.0
    z: 0.0

    rotation
    x: 0.0
    y: 0.0
    z: 0.0
    w: 1.0

    scale3
    x: 1.0
    y: 1.0
    z: 1.0




    Unfortunately the code for this is huge and maybe too unreadable? The whole code is on github https://github.com/Jerakin/DefTree/blob/master/deftree/init.py



    class BaseDefParser: # pragma: no cover
    _pattern = ''
    _regex = re_compile(_pattern)
    file_path = None

    def __init__(self, root_element):
    self.root = root_element
    self._element_chain = [self.root]

    def _raise_parse_error(self):
    if self.file_path:
    raise ParseError("Error when parsing file: ".format(self.file_path))
    raise ParseError("Error when parsing supplied document")

    def parse(self, source) -> 'DefTree':
    """Loads an external Defold section into this DefTree

    :param source: path to the file.
    :returns Element: root Element"""
    self.file_path = source
    document = self._open(self.file_path)
    return self._parse(document)

    def from_string(self, source) -> 'DefTree':
    """Parses an Defold section from a string constant

    :param source: string to parse.
    :returns Element: root Element"""

    return self._parse(source)

    def _parse(self, input_doc):
    document = input_doc
    last_index = True
    while last_index:
    last_index = self._tree_builder(document)
    if last_index:
    document = document[last_index:]
    return self.root

    def _tree_builder(self, document):
    """Searches the document for a match and builds the tree"""
    return False

    @staticmethod
    def _get_level(child):
    element_level = -1

    def _count_up(_child, count):
    parent = _child.get_parent()
    if not parent:
    return count

    return _count_up(parent, count+1)
    return _count_up(child, element_level)

    @staticmethod
    def _open(_path):
    """Returns the documents data as a string"""

    with open(_path, "r") as document:
    current_document = document.read()
    return current_document

    @classmethod
    def serialize(cls, element) -> str:
    """Returns a string of the element"""
    return ""


    class DefParser(BaseDefParser):
    _pattern = r'(?:data:)|(?:^|s)(w+):s+(.+(?:s+".*)*)|(w*)W)'
    _regex = re_compile(_pattern)

    def __init__(self, root_element):
    super().__init__(root_element)

    def _tree_builder(self, document):
    """Searches the document for a match and builds the tree"""
    regex_match = self._regex.search(document)
    if not regex_match and len(document) > 25:
    self._raise_parse_error()
    if regex_match:
    element_name = regex_match.group(3)
    attribute_name, attribute_value = regex_match.group(1, 2)
    element_exit = regex_match.group(4)

    if element_name:
    if self._element_chain:
    last_element = self._element_chain[-1]
    else:
    self._raise_parse_error() # pragma: no cover
    element = last_element.add_element(element_name)
    self._element_chain.append(element)
    elif attribute_name and attribute_value:
    if attribute_name == "data":
    attribute_value = bytes(attribute_value, "utf-8").decode("unicode_escape").replace('n"n "',
    "n")[1:-1]
    last_element = self._element_chain[-1]
    element = last_element.add_element("data")
    self._element_chain.append(element)
    self._parse(attribute_value)
    self._element_chain.pop()
    else:
    if self._element_chain:
    last_element = self._element_chain[-1]
    else:
    self._raise_parse_error() # pragma: no cover
    last_element.add_attribute(attribute_name, attribute_value)

    elif element_exit:
    if self._element_chain:
    self._element_chain.pop()
    else:
    self._raise_parse_error() # pragma: no cover

    return regex_match.end()
    return False

    @classmethod
    def serialize(cls, element, internal=False):
    """Returns a string of the element"""
    assert_is_element(element)

    def construct_string(node):
    """Recursive function that formats the text"""
    nonlocal output_string
    nonlocal level
    for child in node:
    element_level = cls._get_level(child)
    if is_element(child):
    if child.name == "data" and not internal:
    value = cls._escape_element(child)
    output_string += ": n".format(" " * element_level, child.name, value)
    else:
    level += 1
    output_string += " {n".format(" " * element_level, child.name)
    construct_string(child)
    elif is_attribute(child):
    output_string += ": n".format(" " * element_level, child.name,
    child.string)
    if level > element_level and not child.name == "data":
    level -= 1
    output_string += "".format(" " * level, "n")

    level = 0
    output_string = ""
    construct_string(element)
    return output_string

    @classmethod
    def _escape_element(cls, ele):
    def yield_attributes(element_parent):
    for child in element_parent:
    if is_attribute(child):
    yield child
    else:
    yield from yield_attributes(child)
    data_elements = dict()
    data_elements[cls._get_level(ele)] = [ele]

    for x in ele.iter_elements():
    if is_element(x) and x.name == "data":
    lvl = cls._get_level(x)
    if lvl not in data_elements:
    data_elements[lvl] =
    data_elements[lvl].append(x)

    while data_elements:
    for x in data_elements[max(data_elements)]:
    for a in yield_attributes(x):
    if isinstance(a, DefTreeString) and a.string.startswith('"') and a.string.endswith('"'):
    a._value = a.string.replace('"', '\"')
    _root = DefTree().get_root()
    attr = _root.add_attribute("data", "")
    parent = x.get_parent()
    index = parent.index(x)
    parent.remove(x)
    parent.insert(index, attr)
    x._parent = None
    text = cls.serialize(x, True)
    _root.set_attribute("data", '""'.format(text.replace('"', '\"').replace('n', "\n")))
    del data_elements[max(data_elements)]

    return '""'.format(cls.serialize(ele).replace("n", '\n"n "'))






    share|improve this question





















      up vote
      4
      down vote

      favorite









      up vote
      4
      down vote

      favorite











      I have written a module for parsing a format used by the Defold game engine, it is essentially the google protobuf format, but with a lot of custom things to support how they handle internal things.



      I would like to have the whole code reviewed, but seeing that it is fairly big (800 lines) I assume that is a no go? If someone could give it a quick read through that would be great, if not maybe only the actual parsing?



      The code takes a document and then makes it into a "tree", with Elements and Attributes, you can then change the values and write it down again.



      I have been writing python for years but I am all self taught and have never had something code reviewed, so I imagine I have a lot to learn.



      The data that is read looks like this



      name: "default"
      scale_along_z: 0
      embedded_instances
      id: "go"
      data: "components n"
      " id: "script"n"
      " component: "/main/main.script"n"
      " position n"
      " x: 0.0n"
      " y: 0.0n"
      " z: 0.0n"
      " n"
      " rotation n"
      " x: 0.0n"
      " y: 0.0n"
      " z: 0.0n"
      " w: 1.0n"
      " n"
      "n"
      "components n"
      " id: "mockup"n"
      " component: "/main/mockup.gui"n"
      " position n"
      " x: 0.0n"
      " y: 0.0n"
      " z: 0.0n"
      " n"
      " rotation n"
      " x: 0.0n"
      " y: 0.0n"
      " z: 0.0n"
      " w: 1.0n"
      " n"
      "n"
      ""
      position
      x: 0.0
      y: 0.0
      z: 0.0

      rotation
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0

      scale3
      x: 1.0
      y: 1.0
      z: 1.0




      Unfortunately the code for this is huge and maybe too unreadable? The whole code is on github https://github.com/Jerakin/DefTree/blob/master/deftree/init.py



      class BaseDefParser: # pragma: no cover
      _pattern = ''
      _regex = re_compile(_pattern)
      file_path = None

      def __init__(self, root_element):
      self.root = root_element
      self._element_chain = [self.root]

      def _raise_parse_error(self):
      if self.file_path:
      raise ParseError("Error when parsing file: ".format(self.file_path))
      raise ParseError("Error when parsing supplied document")

      def parse(self, source) -> 'DefTree':
      """Loads an external Defold section into this DefTree

      :param source: path to the file.
      :returns Element: root Element"""
      self.file_path = source
      document = self._open(self.file_path)
      return self._parse(document)

      def from_string(self, source) -> 'DefTree':
      """Parses an Defold section from a string constant

      :param source: string to parse.
      :returns Element: root Element"""

      return self._parse(source)

      def _parse(self, input_doc):
      document = input_doc
      last_index = True
      while last_index:
      last_index = self._tree_builder(document)
      if last_index:
      document = document[last_index:]
      return self.root

      def _tree_builder(self, document):
      """Searches the document for a match and builds the tree"""
      return False

      @staticmethod
      def _get_level(child):
      element_level = -1

      def _count_up(_child, count):
      parent = _child.get_parent()
      if not parent:
      return count

      return _count_up(parent, count+1)
      return _count_up(child, element_level)

      @staticmethod
      def _open(_path):
      """Returns the documents data as a string"""

      with open(_path, "r") as document:
      current_document = document.read()
      return current_document

      @classmethod
      def serialize(cls, element) -> str:
      """Returns a string of the element"""
      return ""


      class DefParser(BaseDefParser):
      _pattern = r'(?:data:)|(?:^|s)(w+):s+(.+(?:s+".*)*)|(w*)W)'
      _regex = re_compile(_pattern)

      def __init__(self, root_element):
      super().__init__(root_element)

      def _tree_builder(self, document):
      """Searches the document for a match and builds the tree"""
      regex_match = self._regex.search(document)
      if not regex_match and len(document) > 25:
      self._raise_parse_error()
      if regex_match:
      element_name = regex_match.group(3)
      attribute_name, attribute_value = regex_match.group(1, 2)
      element_exit = regex_match.group(4)

      if element_name:
      if self._element_chain:
      last_element = self._element_chain[-1]
      else:
      self._raise_parse_error() # pragma: no cover
      element = last_element.add_element(element_name)
      self._element_chain.append(element)
      elif attribute_name and attribute_value:
      if attribute_name == "data":
      attribute_value = bytes(attribute_value, "utf-8").decode("unicode_escape").replace('n"n "',
      "n")[1:-1]
      last_element = self._element_chain[-1]
      element = last_element.add_element("data")
      self._element_chain.append(element)
      self._parse(attribute_value)
      self._element_chain.pop()
      else:
      if self._element_chain:
      last_element = self._element_chain[-1]
      else:
      self._raise_parse_error() # pragma: no cover
      last_element.add_attribute(attribute_name, attribute_value)

      elif element_exit:
      if self._element_chain:
      self._element_chain.pop()
      else:
      self._raise_parse_error() # pragma: no cover

      return regex_match.end()
      return False

      @classmethod
      def serialize(cls, element, internal=False):
      """Returns a string of the element"""
      assert_is_element(element)

      def construct_string(node):
      """Recursive function that formats the text"""
      nonlocal output_string
      nonlocal level
      for child in node:
      element_level = cls._get_level(child)
      if is_element(child):
      if child.name == "data" and not internal:
      value = cls._escape_element(child)
      output_string += ": n".format(" " * element_level, child.name, value)
      else:
      level += 1
      output_string += " {n".format(" " * element_level, child.name)
      construct_string(child)
      elif is_attribute(child):
      output_string += ": n".format(" " * element_level, child.name,
      child.string)
      if level > element_level and not child.name == "data":
      level -= 1
      output_string += "".format(" " * level, "n")

      level = 0
      output_string = ""
      construct_string(element)
      return output_string

      @classmethod
      def _escape_element(cls, ele):
      def yield_attributes(element_parent):
      for child in element_parent:
      if is_attribute(child):
      yield child
      else:
      yield from yield_attributes(child)
      data_elements = dict()
      data_elements[cls._get_level(ele)] = [ele]

      for x in ele.iter_elements():
      if is_element(x) and x.name == "data":
      lvl = cls._get_level(x)
      if lvl not in data_elements:
      data_elements[lvl] =
      data_elements[lvl].append(x)

      while data_elements:
      for x in data_elements[max(data_elements)]:
      for a in yield_attributes(x):
      if isinstance(a, DefTreeString) and a.string.startswith('"') and a.string.endswith('"'):
      a._value = a.string.replace('"', '\"')
      _root = DefTree().get_root()
      attr = _root.add_attribute("data", "")
      parent = x.get_parent()
      index = parent.index(x)
      parent.remove(x)
      parent.insert(index, attr)
      x._parent = None
      text = cls.serialize(x, True)
      _root.set_attribute("data", '""'.format(text.replace('"', '\"').replace('n', "\n")))
      del data_elements[max(data_elements)]

      return '""'.format(cls.serialize(ele).replace("n", '\n"n "'))






      share|improve this question











      I have written a module for parsing a format used by the Defold game engine, it is essentially the google protobuf format, but with a lot of custom things to support how they handle internal things.



      I would like to have the whole code reviewed, but seeing that it is fairly big (800 lines) I assume that is a no go? If someone could give it a quick read through that would be great, if not maybe only the actual parsing?



      The code takes a document and then makes it into a "tree", with Elements and Attributes, you can then change the values and write it down again.



      I have been writing python for years but I am all self taught and have never had something code reviewed, so I imagine I have a lot to learn.



      The data that is read looks like this



      name: "default"
      scale_along_z: 0
      embedded_instances
      id: "go"
      data: "components n"
      " id: "script"n"
      " component: "/main/main.script"n"
      " position n"
      " x: 0.0n"
      " y: 0.0n"
      " z: 0.0n"
      " n"
      " rotation n"
      " x: 0.0n"
      " y: 0.0n"
      " z: 0.0n"
      " w: 1.0n"
      " n"
      "n"
      "components n"
      " id: "mockup"n"
      " component: "/main/mockup.gui"n"
      " position n"
      " x: 0.0n"
      " y: 0.0n"
      " z: 0.0n"
      " n"
      " rotation n"
      " x: 0.0n"
      " y: 0.0n"
      " z: 0.0n"
      " w: 1.0n"
      " n"
      "n"
      ""
      position
      x: 0.0
      y: 0.0
      z: 0.0

      rotation
      x: 0.0
      y: 0.0
      z: 0.0
      w: 1.0

      scale3
      x: 1.0
      y: 1.0
      z: 1.0




      Unfortunately the code for this is huge and maybe too unreadable? The whole code is on github https://github.com/Jerakin/DefTree/blob/master/deftree/init.py



      class BaseDefParser: # pragma: no cover
      _pattern = ''
      _regex = re_compile(_pattern)
      file_path = None

      def __init__(self, root_element):
      self.root = root_element
      self._element_chain = [self.root]

      def _raise_parse_error(self):
      if self.file_path:
      raise ParseError("Error when parsing file: ".format(self.file_path))
      raise ParseError("Error when parsing supplied document")

      def parse(self, source) -> 'DefTree':
      """Loads an external Defold section into this DefTree

      :param source: path to the file.
      :returns Element: root Element"""
      self.file_path = source
      document = self._open(self.file_path)
      return self._parse(document)

      def from_string(self, source) -> 'DefTree':
      """Parses an Defold section from a string constant

      :param source: string to parse.
      :returns Element: root Element"""

      return self._parse(source)

      def _parse(self, input_doc):
      document = input_doc
      last_index = True
      while last_index:
      last_index = self._tree_builder(document)
      if last_index:
      document = document[last_index:]
      return self.root

      def _tree_builder(self, document):
      """Searches the document for a match and builds the tree"""
      return False

      @staticmethod
      def _get_level(child):
      element_level = -1

      def _count_up(_child, count):
      parent = _child.get_parent()
      if not parent:
      return count

      return _count_up(parent, count+1)
      return _count_up(child, element_level)

      @staticmethod
      def _open(_path):
      """Returns the documents data as a string"""

      with open(_path, "r") as document:
      current_document = document.read()
      return current_document

      @classmethod
      def serialize(cls, element) -> str:
      """Returns a string of the element"""
      return ""


      class DefParser(BaseDefParser):
      _pattern = r'(?:data:)|(?:^|s)(w+):s+(.+(?:s+".*)*)|(w*)W)'
      _regex = re_compile(_pattern)

      def __init__(self, root_element):
      super().__init__(root_element)

      def _tree_builder(self, document):
      """Searches the document for a match and builds the tree"""
      regex_match = self._regex.search(document)
      if not regex_match and len(document) > 25:
      self._raise_parse_error()
      if regex_match:
      element_name = regex_match.group(3)
      attribute_name, attribute_value = regex_match.group(1, 2)
      element_exit = regex_match.group(4)

      if element_name:
      if self._element_chain:
      last_element = self._element_chain[-1]
      else:
      self._raise_parse_error() # pragma: no cover
      element = last_element.add_element(element_name)
      self._element_chain.append(element)
      elif attribute_name and attribute_value:
      if attribute_name == "data":
      attribute_value = bytes(attribute_value, "utf-8").decode("unicode_escape").replace('n"n "',
      "n")[1:-1]
      last_element = self._element_chain[-1]
      element = last_element.add_element("data")
      self._element_chain.append(element)
      self._parse(attribute_value)
      self._element_chain.pop()
      else:
      if self._element_chain:
      last_element = self._element_chain[-1]
      else:
      self._raise_parse_error() # pragma: no cover
      last_element.add_attribute(attribute_name, attribute_value)

      elif element_exit:
      if self._element_chain:
      self._element_chain.pop()
      else:
      self._raise_parse_error() # pragma: no cover

      return regex_match.end()
      return False

      @classmethod
      def serialize(cls, element, internal=False):
      """Returns a string of the element"""
      assert_is_element(element)

      def construct_string(node):
      """Recursive function that formats the text"""
      nonlocal output_string
      nonlocal level
      for child in node:
      element_level = cls._get_level(child)
      if is_element(child):
      if child.name == "data" and not internal:
      value = cls._escape_element(child)
      output_string += ": n".format(" " * element_level, child.name, value)
      else:
      level += 1
      output_string += " {n".format(" " * element_level, child.name)
      construct_string(child)
      elif is_attribute(child):
      output_string += ": n".format(" " * element_level, child.name,
      child.string)
      if level > element_level and not child.name == "data":
      level -= 1
      output_string += "".format(" " * level, "n")

      level = 0
      output_string = ""
      construct_string(element)
      return output_string

      @classmethod
      def _escape_element(cls, ele):
      def yield_attributes(element_parent):
      for child in element_parent:
      if is_attribute(child):
      yield child
      else:
      yield from yield_attributes(child)
      data_elements = dict()
      data_elements[cls._get_level(ele)] = [ele]

      for x in ele.iter_elements():
      if is_element(x) and x.name == "data":
      lvl = cls._get_level(x)
      if lvl not in data_elements:
      data_elements[lvl] =
      data_elements[lvl].append(x)

      while data_elements:
      for x in data_elements[max(data_elements)]:
      for a in yield_attributes(x):
      if isinstance(a, DefTreeString) and a.string.startswith('"') and a.string.endswith('"'):
      a._value = a.string.replace('"', '\"')
      _root = DefTree().get_root()
      attr = _root.add_attribute("data", "")
      parent = x.get_parent()
      index = parent.index(x)
      parent.remove(x)
      parent.insert(index, attr)
      x._parent = None
      text = cls.serialize(x, True)
      _root.set_attribute("data", '""'.format(text.replace('"', '\"').replace('n', "\n")))
      del data_elements[max(data_elements)]

      return '""'.format(cls.serialize(ele).replace("n", '\n"n "'))








      share|improve this question










      share|improve this question




      share|improve this question









      asked Apr 19 at 9:19









      Jerakin

      235




      235




















          1 Answer
          1






          active

          oldest

          votes

















          up vote
          1
          down vote



          accepted










          I definitely wouldn't say that's unreadable ! But there's some room for improvement, mostly style I think. Here's what I came up with, in no particular order:



          Leading underscores in names



          Leading underscores are supposed to be used to indicate that the object is "private", although that has no real meaning in Python: it's just a convention that says "beware if you use this, it might change its name or its implementation and is not part of the public API".



          -> You should remove them at least for method argument; they don't add any meaning here and that does not help readability.



          source



          Recursion with nonlocal variables



          In construct_string, you use variables from outside the function's scope for recursion purposes. I would advise against this, it's not very Pythonic, and it's generally better to avoid side effects that make the code harder to understand.



          -> Just pass them as arguments to the recursive calls.



          Class/instance attribute confusion



          In BaseDefParser, you declare file_path as a class attribute and manipulate it like an instance attribute in your methods. That can cause a lot of confusion especially with mutable objects as changes will be reflected upon all instances of the class.



          -> Declare instance attributes in __init__.



          _raise_parse_error



          Having factorized this is a good idea, but you lost something on the way: you don't pass any context to this method that could help understand where the error is (except the document, but that's too broad).



          -> Add context to your exceptions, you (or your future users and contributors) will thank yourself later for it !



          Complexity



          Your code is sparsely commented and not really easy to understand, you could sprinkle some comments there and there.



          You could also split some of the more complex sections into smaller methods.




          I'll update if I think of something else but that should be a good starting point. Anyway your code is pretty good, keep up if you're self taught !






          share|improve this answer























            Your Answer




            StackExchange.ifUsing("editor", function ()
            return StackExchange.using("mathjaxEditing", function ()
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["\$", "\$"]]);
            );
            );
            , "mathjax-editing");

            StackExchange.ifUsing("editor", function ()
            StackExchange.using("externalEditor", function ()
            StackExchange.using("snippets", function ()
            StackExchange.snippets.init();
            );
            );
            , "code-snippets");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "196"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            convertImagesToLinks: false,
            noModals: false,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );








             

            draft saved


            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f192449%2fcustom-parser-for-google-protobuf%23new-answer', 'question_page');

            );

            Post as a guest






























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            1
            down vote



            accepted










            I definitely wouldn't say that's unreadable ! But there's some room for improvement, mostly style I think. Here's what I came up with, in no particular order:



            Leading underscores in names



            Leading underscores are supposed to be used to indicate that the object is "private", although that has no real meaning in Python: it's just a convention that says "beware if you use this, it might change its name or its implementation and is not part of the public API".



            -> You should remove them at least for method argument; they don't add any meaning here and that does not help readability.



            source



            Recursion with nonlocal variables



            In construct_string, you use variables from outside the function's scope for recursion purposes. I would advise against this, it's not very Pythonic, and it's generally better to avoid side effects that make the code harder to understand.



            -> Just pass them as arguments to the recursive calls.



            Class/instance attribute confusion



            In BaseDefParser, you declare file_path as a class attribute and manipulate it like an instance attribute in your methods. That can cause a lot of confusion especially with mutable objects as changes will be reflected upon all instances of the class.



            -> Declare instance attributes in __init__.



            _raise_parse_error



            Having factorized this is a good idea, but you lost something on the way: you don't pass any context to this method that could help understand where the error is (except the document, but that's too broad).



            -> Add context to your exceptions, you (or your future users and contributors) will thank yourself later for it !



            Complexity



            Your code is sparsely commented and not really easy to understand, you could sprinkle some comments there and there.



            You could also split some of the more complex sections into smaller methods.




            I'll update if I think of something else but that should be a good starting point. Anyway your code is pretty good, keep up if you're self taught !






            share|improve this answer



























              up vote
              1
              down vote



              accepted










              I definitely wouldn't say that's unreadable ! But there's some room for improvement, mostly style I think. Here's what I came up with, in no particular order:



              Leading underscores in names



              Leading underscores are supposed to be used to indicate that the object is "private", although that has no real meaning in Python: it's just a convention that says "beware if you use this, it might change its name or its implementation and is not part of the public API".



              -> You should remove them at least for method argument; they don't add any meaning here and that does not help readability.



              source



              Recursion with nonlocal variables



              In construct_string, you use variables from outside the function's scope for recursion purposes. I would advise against this, it's not very Pythonic, and it's generally better to avoid side effects that make the code harder to understand.



              -> Just pass them as arguments to the recursive calls.



              Class/instance attribute confusion



              In BaseDefParser, you declare file_path as a class attribute and manipulate it like an instance attribute in your methods. That can cause a lot of confusion especially with mutable objects as changes will be reflected upon all instances of the class.



              -> Declare instance attributes in __init__.



              _raise_parse_error



              Having factorized this is a good idea, but you lost something on the way: you don't pass any context to this method that could help understand where the error is (except the document, but that's too broad).



              -> Add context to your exceptions, you (or your future users and contributors) will thank yourself later for it !



              Complexity



              Your code is sparsely commented and not really easy to understand, you could sprinkle some comments there and there.



              You could also split some of the more complex sections into smaller methods.




              I'll update if I think of something else but that should be a good starting point. Anyway your code is pretty good, keep up if you're self taught !






              share|improve this answer

























                up vote
                1
                down vote



                accepted







                up vote
                1
                down vote



                accepted






                I definitely wouldn't say that's unreadable ! But there's some room for improvement, mostly style I think. Here's what I came up with, in no particular order:



                Leading underscores in names



                Leading underscores are supposed to be used to indicate that the object is "private", although that has no real meaning in Python: it's just a convention that says "beware if you use this, it might change its name or its implementation and is not part of the public API".



                -> You should remove them at least for method argument; they don't add any meaning here and that does not help readability.



                source



                Recursion with nonlocal variables



                In construct_string, you use variables from outside the function's scope for recursion purposes. I would advise against this, it's not very Pythonic, and it's generally better to avoid side effects that make the code harder to understand.



                -> Just pass them as arguments to the recursive calls.



                Class/instance attribute confusion



                In BaseDefParser, you declare file_path as a class attribute and manipulate it like an instance attribute in your methods. That can cause a lot of confusion especially with mutable objects as changes will be reflected upon all instances of the class.



                -> Declare instance attributes in __init__.



                _raise_parse_error



                Having factorized this is a good idea, but you lost something on the way: you don't pass any context to this method that could help understand where the error is (except the document, but that's too broad).



                -> Add context to your exceptions, you (or your future users and contributors) will thank yourself later for it !



                Complexity



                Your code is sparsely commented and not really easy to understand, you could sprinkle some comments there and there.



                You could also split some of the more complex sections into smaller methods.




                I'll update if I think of something else but that should be a good starting point. Anyway your code is pretty good, keep up if you're self taught !






                share|improve this answer















                I definitely wouldn't say that's unreadable ! But there's some room for improvement, mostly style I think. Here's what I came up with, in no particular order:



                Leading underscores in names



                Leading underscores are supposed to be used to indicate that the object is "private", although that has no real meaning in Python: it's just a convention that says "beware if you use this, it might change its name or its implementation and is not part of the public API".



                -> You should remove them at least for method argument; they don't add any meaning here and that does not help readability.



                source



                Recursion with nonlocal variables



                In construct_string, you use variables from outside the function's scope for recursion purposes. I would advise against this, it's not very Pythonic, and it's generally better to avoid side effects that make the code harder to understand.



                -> Just pass them as arguments to the recursive calls.



                Class/instance attribute confusion



                In BaseDefParser, you declare file_path as a class attribute and manipulate it like an instance attribute in your methods. That can cause a lot of confusion especially with mutable objects as changes will be reflected upon all instances of the class.



                -> Declare instance attributes in __init__.



                _raise_parse_error



                Having factorized this is a good idea, but you lost something on the way: you don't pass any context to this method that could help understand where the error is (except the document, but that's too broad).



                -> Add context to your exceptions, you (or your future users and contributors) will thank yourself later for it !



                Complexity



                Your code is sparsely commented and not really easy to understand, you could sprinkle some comments there and there.



                You could also split some of the more complex sections into smaller methods.




                I'll update if I think of something else but that should be a good starting point. Anyway your code is pretty good, keep up if you're self taught !







                share|improve this answer















                share|improve this answer



                share|improve this answer








                edited Apr 19 at 15:36


























                answered Apr 19 at 15:23









                etene

                2916




                2916






















                     

                    draft saved


                    draft discarded


























                     


                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fcodereview.stackexchange.com%2fquestions%2f192449%2fcustom-parser-for-google-protobuf%23new-answer', 'question_page');

                    );

                    Post as a guest













































































                    Popular posts from this blog

                    Python Lists

                    Aion

                    JavaScript Array Iteration Methods