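<p>Assuming the XML structure is constant, the XPath expression retrieves the elements and attributes in the same order for every product.</p>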
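<pre><code>from lxml import etree
import pandas as pd

df_cols = ["part_number", "manufacturer", "name", "retail", "product"]
rows = []

tree = etree.parse('/home/luis/tmp/tmp.xml')
root = tree.getroot()

# The union (|) returns nodes in document order, so every product yields
# its five values in the same sequence as long as the XML layout is constant.
steps = tree.xpath('//product/attribute::*[name()="name" or name()="part_number" or name()="manufacturer_name"] | //product/URL/product/text() | //product/price/retail/text()')

i = 0
d = dict()
for s in steps:
    # Map the position within the current product to a column name.
    # These indices must match the order in which the values appear in the XML.
    if i == 0:
        d[df_cols[2]] = s
    if i == 1:
        d[df_cols[0]] = s
    if i == 2:
        d[df_cols[1]] = s
    if i == 3:
        d[df_cols[3]] = s
    if i == 4:
        d[df_cols[4]] = s
        # Fifth value: the row is complete, so store it and start a new one.
        rows.append(d)
        i = 0
        d = dict()
        continue
    i += 1

out_df = pd.DataFrame(rows, columns=df_cols)
print(out_df.head())
</code></pre>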
<p>Result:</p>
<pre><code>     part_number              manufacturer                                               name                                             retail  product
0     Champ Golf  19CHPSPWRCH1111111111101                   Champ Golf- Max Pro Spike Wrench  https://click.linksynergy.com/link?id=83wh4zNK...     9.99
1   Stinger Tees  19STGTEEMID3CO1111111101  Stinger Tees- 3" Stinger Pro XL Competition Ca...  https://click.linksynergy.com/link?id=83wh4zNK...     7.99
2     Vegas Golf  19VEGORIGIN1111111111101                          Vegas Golf- Original Game  https://click.linksynergy.com/link?id=83wh4zNK...    14.99
3  Ray Cook Golf  19RAYBALRET1111111111201      Ray Cook Golf- 12' Compact Cup Ball Retriever  https://click.linksynergy.com/link?id=83wh4zNK...    19.99
</code></pre>
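<p>Because the approach above depends on the values arriving in a fixed order, a variant that looks up each value by name may be more robust if the layout ever changes. The following is only a sketch under assumptions: the sample XML (including the <code>merchandiser</code> root and the <code>example.com</code> URLs) is hypothetical and merely shaped after the XPath expression, and it uses the standard library's <code>xml.etree.ElementTree</code> instead of lxml so that it runs without extra dependencies. The helper name <code>product_rows</code> is also made up for illustration.</p>

```python
import xml.etree.ElementTree as ET

# Hypothetical sample shaped after the XPath in the answer; the real feed may differ.
# The second product deliberately lists <price> before <URL>.
SAMPLE_XML = """
<merchandiser>
  <product name="Champ Golf- Max Pro Spike Wrench"
           manufacturer_name="Champ Golf"
           part_number="19CHPSPWRCH1111111111101">
    <URL><product>https://example.com/link?id=1</product></URL>
    <price><retail>9.99</retail></price>
  </product>
  <product name="Vegas Golf- Original Game"
           manufacturer_name="Vegas Golf"
           part_number="19VEGORIGIN1111111111101">
    <price><retail>14.99</retail></price>
    <URL><product>https://example.com/link?id=2</product></URL>
  </product>
</merchandiser>
""".strip()

df_cols = ["part_number", "manufacturer", "name", "retail", "product"]


def product_rows(root):
    """Build one dict per top-level product, reading every value by name."""
    rows = []
    # The [@part_number] predicate skips the nested <URL><product> elements,
    # which share the tag name but carry no attributes.
    for p in root.findall(".//product[@part_number]"):
        rows.append({
            "part_number": p.get("part_number"),
            "manufacturer": p.get("manufacturer_name"),
            "name": p.get("name"),
            "retail": p.findtext("price/retail"),
            "product": p.findtext("URL/product"),
        })
    return rows


rows = product_rows(ET.fromstring(SAMPLE_XML))
for row in rows:
    print(row)
```

<p>The resulting <code>rows</code> list can be passed to <code>pd.DataFrame(rows, columns=df_cols)</code> exactly as before. lxml elements offer the same <code>findall</code>, <code>get</code>, and <code>findtext</code> methods, so the function should also work on the root returned by <code>etree.parse(...).getroot()</code>.</p>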