Finding data in XML using Python's lxml

Using Python's lxml, I need to read an XML file and print, for each "basic" and "expert" element, the text of its name and email children. I've written a script that works, but I don't think it is the best way of doing this. Is there a better (simpler) way of getting the data out of the XML without having to make two nested iterations over it?

Python so far:

from lxml import etree

myXML = "data.xml"
tree = etree.parse(myXML)
root = tree.getroot()
for node in root:
    if node.tag == "basic" or node.tag == "expert":
        # user = [name position, name text, email position, email text]
        user = [None] * 4
        for i, child in enumerate(node):
            if child.tag == "name":
                user[0] = i
                user[1] = child.text
            if child.tag == "email":
                user[2] = i
                user[3] = child.text
        print(user)
        if user[3].startswith('_'):
            # do some other things with data if email begins with _ ...
            pass

Will print:

[0, 'f.bar', 1, 'foobar@me.com']
[0, 'm.bob', 3, 'm.bob@email.com']
[0, 'm.bab', 3, 'm.bab@email.com']

XML sample:

<?xml version="1.0"?>
<users>
    <id>11111</id>
    <checked>True</checked>
    <version>A12</version>
    <basic>
        <name>f.bar</name>
        <email>foobar@me.com</email>
        <forename>Foo</forename>
        <surname>Bar</surname>
    </basic>
    <expert>
        <name>m.bob</name>
        <forename>Mak</forename>
        <surname>Bob</surname>
        <email>m.bob@email.com</email>
    </expert>
    <expert>
        <name>m.bab</name>
        <forename>Mak</forename>
        <surname>Bab</surname>
        <email>m.bab@email.com</email>
    </expert>
    <guru>
        <name>e.guru</name>
        <forename>Nick</forename>
        <email>nick@email.com</email>
        <surname>Gru</surname>
    </guru>
</users>

edited Jan 18 at 3:18 by Jamal♦
asked Jan 17 at 21:06 by Ñhosko

1 Answer

Currently, you are overlooking one of the advantages of using lxml: its fully compliant support for W3C XPath 1.0 (and even XSLT 1.0).

Right now, your code really follows the API of Python's built-in ElementTree, without any xpath() calls that can select nodes dynamically by name.

The code below iterates through all <basic> and <expert> elements and retrieves the text of their child <name> and <email>, all in one loop or list comprehension. To retrieve each child's position, we count its preceding siblings with count(preceding-sibling::*).

from lxml import etree

myXML = "data.xml"
tree = etree.parse(myXML)

user = []

# FOR LOOP
for i in tree.xpath("//*[name()='basic' or name()='expert']"):
    user.append([i.xpath("count(name/preceding-sibling::*)"),
                 i.find("name").text,
                 i.xpath("count(email/preceding-sibling::*)"),
                 i.find("email").text])
print(user)
# [[0.0, 'f.bar', 1.0, 'foobar@me.com'],
#  [0.0, 'm.bob', 3.0, 'm.bob@email.com'],
#  [0.0, 'm.bab', 3.0, 'm.bab@email.com']]


# LIST COMPREHENSION
user = [[i.xpath("count(name/preceding-sibling::*)"),
         i.find("name").text,
         i.xpath("count(email/preceding-sibling::*)"),
         i.find("email").text]
        for i in tree.xpath("//*[name()='basic' or name()='expert']")]

print(user)
# [[0.0, 'f.bar', 1.0, 'foobar@me.com'],
#  [0.0, 'm.bob', 3.0, 'm.bob@email.com'],
#  [0.0, 'm.bab', 3.0, 'm.bab@email.com']]
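The XPath approach can also be tried without a data.xml file on disk. The following is only a minimal sketch under a few assumptions: the XML is a shortened inline copy of the sample with matching closing tags, SAMPLE and extract_users are hypothetical names chosen for illustration, and the positions are converted with int() because count() comes back from lxml as a float.

```python
from lxml import etree

# Assumption: a shortened, well-formed inline copy of the sample,
# so the sketch does not depend on data.xml.
SAMPLE = b"""<users>
  <id>11111</id>
  <basic>
    <name>f.bar</name>
    <email>foobar@me.com</email>
    <forename>Foo</forename>
  </basic>
  <expert>
    <name>m.bob</name>
    <forename>Mak</forename>
    <surname>Bob</surname>
    <email>m.bob@email.com</email>
  </expert>
  <guru>
    <name>e.guru</name>
    <email>nick@email.com</email>
  </guru>
</users>"""


def extract_users(root):
    """Collect [name position, name, email position, email] per user."""
    users = []
    # "basic | expert" selects the matching children of root in document order.
    for node in root.xpath("basic | expert"):
        users.append([
            # count() returns a float, so convert it to an int position.
            int(node.xpath("count(name/preceding-sibling::*)")),
            node.findtext("name"),
            int(node.xpath("count(email/preceding-sibling::*)")),
            node.findtext("email"),
        ])
    return users


users = extract_users(etree.fromstring(SAMPLE))
print(users)
# [[0, 'f.bar', 1, 'foobar@me.com'], [0, 'm.bob', 3, 'm.bob@email.com']]
```

Using findtext() instead of find(...).text means a missing <email> yields None rather than raising an AttributeError, so any later check such as startswith('_') should guard against None first.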

edited Jan 29 at 15:39
answered Jan 24 at 22:21 by Parfait

But how do I also get the position of the searched child? Look at the original code: user[0] = i and user[2] = i. As the XML format is not the same for basic and expert, I need this information.
– Ñhosko, Jan 29 at 11:48

Understood. See the edit, which still uses an XPath solution with count(.../preceding-sibling::*).
– Parfait, Jan 29 at 15:40
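On the position question raised above, a plain-Python alternative may also be worth considering. This is a sketch rather than a definitive solution: it keeps the inner pass over each element's children but folds it into one dictionary comprehension, and it falls back to the standard library's ElementTree when lxml is not installed. The names SAMPLE, WANTED and extract_users are hypothetical, and the inline XML is a shortened copy of the sample with matching closing tags.

```python
try:
    from lxml import etree
except ImportError:
    # Assumption: only API shared with the standard library is used below.
    import xml.etree.ElementTree as etree

SAMPLE = b"""<users>
  <id>11111</id>
  <basic>
    <name>f.bar</name>
    <email>foobar@me.com</email>
    <forename>Foo</forename>
  </basic>
  <expert>
    <name>m.bob</name>
    <forename>Mak</forename>
    <surname>Bob</surname>
    <email>m.bob@email.com</email>
  </expert>
  <guru>
    <name>e.guru</name>
    <email>nick@email.com</email>
  </guru>
</users>"""

WANTED = ("name", "email")


def extract_users(root, groups=("basic", "expert")):
    """Return one {tag: (position, text)} dict per matching element."""
    users = []
    for node in root:
        if node.tag not in groups:
            continue
        # One pass over the children records position and text together.
        users.append({child.tag: (i, child.text)
                      for i, child in enumerate(node)
                      if child.tag in WANTED})
    return users


users = extract_users(etree.fromstring(SAMPLE))
for user in users:
    print(user)
```

Because the result is keyed by tag name, a lookup such as user.get("email") returns None for an element without an email instead of failing, and the position is already an int.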
