Finding data on XML using Python's LXML

Using Python's lxml, I need to read an XML file and print the name and email text from each "basic" and "expert" tag. I've written a script that works, but I don't think it is the best way of doing this. Is there a better (simpler) way of getting the data from the XML without having to make two iterations over it?



Python so far:



from lxml import etree

myXML = "data.xml"
tree = etree.parse(myXML)
root = tree.getroot()
for node in root:
    if node.tag == "basic" or node.tag == "expert":
        # user = [name position, name text, email position, email text]
        user = [None] * 4
        for i, child in enumerate(node):
            if child.tag == "name":
                user[0] = i
                user[1] = child.text
            if child.tag == "email":
                user[2] = i
                user[3] = child.text
        print(user)
        if user[3].startswith('_'):
            # do some other things with data if email begins with _ ...
            pass


Will print:



[0, 'f.bar', 1, 'foobar@me.com']
[0, 'm.bob', 3, 'm.bob@email.com']
[0, 'm.bab', 3, 'm.bab@email.com']


XML sample:





<?xml version="1.0"?>
<users>
    <id>11111</id>
    <checked>True</checked>
    <version>A12</version>
    <basic>
        <name>f.bar</name>
        <email>foobar@me.com</email>
        <forename>Foo</forename>
        <surname>Bar</surname>
    </basic>
    <expert>
        <name>m.bob</name>
        <forename>Mak</forename>
        <surname>Bob</surname>
        <email>m.bob@email.com</email>
    </expert>
    <expert>
        <name>m.bab</name>
        <forename>Mak</forename>
        <surname>Bab</surname>
        <email>m.bab@email.com</email>
    </expert>
    <guru>
        <name>e.guru</name>
        <forename>Nick</forename>
        <email>nick@email.com</email>
        <surname>Gru</surname>
    </guru>
</users>






asked Jan 17 at 21:06 by Ñhosko
edited Jan 18 at 3:18 by Jamal♦




















          1 Answer
Currently, you are overlooking one of the advantages of using lxml: its fully compliant W3C XPath 1.0 (and even XSLT 1.0) language modules.



Right now, your code essentially follows the API of Python's built-in etree, without any xpath() calls that can select nodes dynamically by name.
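To make the contrast concrete, here is a minimal sketch on a trimmed inline copy of the question's sample. The variable names and the inline XML are illustrative assumptions, not part of the original post. It compares testing tags in a Python loop with letting an XPath expression pick the same nodes by name:

```python
from lxml import etree

# Trimmed, inline stand-in for data.xml (assumption: same structure as the question's sample).
xml = b"""<users>
  <id>11111</id>
  <basic><name>f.bar</name><email>foobar@me.com</email></basic>
  <expert><name>m.bob</name><forename>Mak</forename><email>m.bob@email.com</email></expert>
  <guru><name>e.guru</name><email>nick@email.com</email></guru>
</users>"""

root = etree.fromstring(xml)

# Iteration style: visit every child of the root and test its tag in Python.
by_loop = [node.tag for node in root if node.tag in ("basic", "expert")]

# XPath style: the expression itself selects children named basic or expert.
by_xpath = [node.tag for node in root.xpath("*[self::basic or self::expert]")]

print(by_loop)   # ['basic', 'expert']
print(by_xpath)  # ['basic', 'expert']
```

The self:: axis test used here is just one way to spell the name check; a name() comparison in the predicate should select the same elements for a document without namespaces.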



The code below iterates through all <basic> and <expert> tags and retrieves their child <name> and <email> in one loop or list comprehension. To retrieve each child's position, we count its preceding siblings with count(preceding-sibling::*).



from lxml import etree

myXML = "data.xml"
tree = etree.parse(myXML)

user = []

# FOR LOOP
for i in tree.xpath("//*[name()='basic' or name()='expert']"):
    user.append([i.xpath("count(name/preceding-sibling::*)"),
                 i.find("name").text,
                 i.xpath("count(email/preceding-sibling::*)"),
                 i.find("email").text])
print(user)
# [[0.0, 'f.bar', 1.0, 'foobar@me.com'],
#  [0.0, 'm.bob', 3.0, 'm.bob@email.com'],
#  [0.0, 'm.bab', 3.0, 'm.bab@email.com']]


# LIST COMPREHENSION
user = [[i.xpath("count(name/preceding-sibling::*)"),
         i.find("name").text,
         i.xpath("count(email/preceding-sibling::*)"),
         i.find("email").text]
        for i in tree.xpath("//*[name()='basic' or name()='expert']")]

print(user)
# [[0.0, 'f.bar', 1.0, 'foobar@me.com'],
#  [0.0, 'm.bob', 3.0, 'm.bob@email.com'],
#  [0.0, 'm.bab', 3.0, 'm.bab@email.com']]
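XPath's count() returns a number that lxml hands back as a float, which is why the positions above print as 0.0 and 3.0. If integer positions like those in the question's output are preferred, one possible variation is to ask the parent element for the child's index. This is a sketch under assumptions, not the answer's method: the helper name user_row and the inline XML are hypothetical, and every selected element is assumed to have both a <name> and an <email> child.

```python
from lxml import etree

# Inline stand-in for data.xml (assumption: same layout as the question's sample).
xml = b"""<users>
  <basic>
    <name>f.bar</name>
    <email>foobar@me.com</email>
    <forename>Foo</forename>
  </basic>
  <expert>
    <name>m.bob</name>
    <forename>Mak</forename>
    <surname>Bob</surname>
    <email>m.bob@email.com</email>
  </expert>
</users>"""

root = etree.fromstring(xml)


def user_row(node):
    """Return [name position, name text, email position, email text]."""
    name = node.find("name")
    email = node.find("email")
    # node.index(child) gives the child's integer position within its parent.
    return [node.index(name), name.text, node.index(email), email.text]


users = [user_row(n) for n in root.xpath("*[self::basic or self::expert]")]
print(users)
# [[0, 'f.bar', 1, 'foobar@me.com'], [0, 'm.bob', 3, 'm.bob@email.com']]
```

Wrapping the count() results in int() would be another way to get the same integers while staying purely in XPath.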





• But how do I also get the position of the searched child? Look at the original code: user[0] = i and user[2] = i. As the XML format is not the same for basic and expert, I need this information.
  – Ñhosko
  Jan 29 at 11:48

• Understood. See the edit, which still uses an XPath solution with count(.../preceding-sibling::*).
  – Parfait
  Jan 29 at 15:40










answered Jan 24 at 22:21 by Parfait
edited Jan 29 at 15:39










