Finding data on XML using Python's LXML
Using Python's lxml, I must read an XML file and print the name and email text from each "basic" and "expert" tag. I've written a script that works, but I don't think it is the best way of doing this. Is there a better (simpler) way to get the data out of the XML without having to make two iterations over it?
Python so far:
    from lxml import etree

    myXML = "data.xml"
    tree = etree.parse(myXML)
    root = tree.getroot()

    for node in root:
        if node.tag == "basic" or node.tag == "expert":
            # user = [name position, name text, email position, email text]
            user = [None] * 4
            for i, child in enumerate(node):
                if child.tag == "name":
                    user[0] = i
                    user[1] = child.text
                if child.tag == "email":
                    user[2] = i
                    user[3] = child.text
            print(user)
            if user[3].startswith('_'):
                # do some other things with data if email begins with _ ...
                pass
Will print:
[0, 'f.bar', 1, 'foobar@me.com']
[0, 'm.bob', 3, 'm.bob@email.com']
[0, 'm.bab', 3, 'm.bab@email.com']
XML sample:
    <?xml version="1.0"?>
    <users>
        <id>11111</id>
        <checked>True</checked>
        <version>A12</version>
        <basic>
            <name>f.bar</name>
            <email>foobar@me.com</email>
            <forename>Foo</forename>
            <surname>Bar</surname>
        </basic>
        <expert>
            <name>m.bob</name>
            <forename>Mak</forename>
            <surname>Bob</surname>
            <email>m.bob@email.com</email>
        </expert>
        <expert>
            <name>m.bab</name>
            <forename>Mak</forename>
            <surname>Bab</surname>
            <email>m.bab@email.com</email>
        </expert>
        <guru>
            <name>e.guru</name>
            <forename>Nick</forename>
            <email>nick@email.com</email>
            <surname>Gru</surname>
        </guru>
    </users>
python xml lxml
edited Jan 18 at 3:18
Jamal♦
asked Jan 17 at 21:06
Ãhosko
1 Answer
Currently, you are overlooking one of the advantages of using lxml: its fully compliant W3C XPath 1.0 (and even XSLT 1.0) language modules. Right now, your code really follows the syntax of Python's built-in etree, without any xpath() calls that can select nodes dynamically by node name.

The code below iterates through all <basic> and <expert> tags and retrieves their child <name> and <email> elements, all in one loop or list comprehension. To retrieve each child's position, we count its preceding siblings with count(preceding-sibling::*).
    from lxml import etree

    myXML = "data.xml"
    tree = etree.parse(myXML)

    # FOR LOOP
    user = []
    for i in tree.xpath("//*[name()='basic' or name()='expert']"):
        # count() returns a float: the number of element siblings before the child
        user.append([i.xpath("count(name/preceding-sibling::*)"),
                     i.find("name").text,
                     i.xpath("count(email/preceding-sibling::*)"),
                     i.find("email").text])

    print(user)
    # [[0.0, 'f.bar', 1.0, 'foobar@me.com'],
    #  [0.0, 'm.bob', 3.0, 'm.bob@email.com'],
    #  [0.0, 'm.bab', 3.0, 'm.bab@email.com']]

    # LIST COMPREHENSION
    user = [[i.xpath("count(name/preceding-sibling::*)"),
             i.find("name").text,
             i.xpath("count(email/preceding-sibling::*)"),
             i.find("email").text]
            for i in tree.xpath("//*[name()='basic' or name()='expert']")]

    print(user)
    # [[0.0, 'f.bar', 1.0, 'foobar@me.com'],
    #  [0.0, 'm.bob', 3.0, 'm.bob@email.com'],
    #  [0.0, 'm.bab', 3.0, 'm.bab@email.com']]
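As a possible variation on the XPath approach, the sketch below uses a union expression (basic | expert) relative to the root and casts the count() result to int, so the positions match the integers in the original script. It is only an illustration: the inline XML is a shortened copy of the sample (so it runs without data.xml), and the names XML and users are my own. Note that count() also returns 0 when the child is missing, so this assumes every record has a <name> and an <email>.

```python
from lxml import etree

# Shortened inline copy of the sample, so the sketch runs without data.xml.
XML = b"""<users>
  <id>11111</id>
  <basic>
    <name>f.bar</name>
    <email>foobar@me.com</email>
  </basic>
  <expert>
    <name>m.bob</name>
    <forename>Mak</forename>
    <surname>Bob</surname>
    <email>m.bob@email.com</email>
  </expert>
  <guru>
    <name>e.guru</name>
    <email>nick@email.com</email>
  </guru>
</users>"""

root = etree.fromstring(XML)

users = []
# "basic | expert" is an XPath union evaluated relative to <users>;
# results come back in document order and <guru> is skipped.
for node in root.xpath("basic | expert"):
    users.append([
        # count() yields a float in lxml, so cast it to int
        int(node.xpath("count(name/preceding-sibling::*)")),
        node.findtext("name"),
        int(node.xpath("count(email/preceding-sibling::*)")),
        node.findtext("email"),
    ])

print(users)
# [[0, 'f.bar', 1, 'foobar@me.com'], [0, 'm.bob', 3, 'm.bob@email.com']]
```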
edited Jan 29 at 15:39
answered Jan 24 at 22:21
Parfait
But how can I also get the position of the searched child? Look at the original code: user[0] = i and user[2] = i. As the XML format is not the same for basic and expert, I need this information.
– Ãhosko
Jan 29 at 11:48
Understood. See the edit, which still uses an XPath solution with count(.../preceding-sibling::*).
– Parfait
Jan 29 at 15:40