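I am trying to parse a fairly deeply nested XML file. I have spent several hours looking for a solution without success. I am not sure whether the problem is the namespace or whether I need to search inside the loop.
I am able to extract the top-level attributes, but not the more deeply nested elements. I want to export part_number, manufacturer_name, name, product, and retail to a DataFrame.
Here is a sample of the XML (not all entries are identical; some fields are missing):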
<?xml version="1.0" encoding="UTF-8"?><merchandiser xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="merchandiser.xsd"><header><merchantId>35386</merchantId><merchantName>Rock Bottom Golf</merchantName><createdOn>10/13/2021 14:01:49</createdOn></header>
<product product_id='15' name='Champ Golf- Max Pro Spike Wrench' sku_number='19CHPSPWRCH1111111111101' manufacturer_name='Champ Golf' part_number='19CHPSPWRCH1111111111101'><category><primary>Sporting Goods</primary></category><URL><product>https://click.linksynergy.com/link?id=83wh4zNK2Zo&offerid=301124.15&type=15&murl=http%3A%2F%2Fwww.rockbottomgolf.com%2Faccessories%2Fother%2Fchamp-golf-max-pro-spike-wrench%2F%3Futm_source%3Drakuten%26utm_medium%3Dcse%26utm_term%3D19CHPSPWRCH1111111111101</product><productImage>http://d3d71ba2asa5oz.cloudfront.net/40000065/images/19chpspwrch1111111111101.jpg</productImage></URL><description><short>A convenient and easy to use tool. No more struggling with your spikes. Features: Comfortable contoured soft touch dual density handle Three position ratchet for insertion, removal or lock in place Three bits to fit any spike, all will fit in drills Stand</short><long>A convenient and easy to use tool. No more struggling with your spikes. Features: Comfortable contoured soft touch dual density handle Three position ratchet for insertion, removal or lock in place Three bits to fit any spike, all will fit in drills Stand</long></description><discount currency='USD'><type>amount</type></discount><price currency='USD'><retail>9.99</retail></price><brand>Champ Golf</brand><shipping><availability>in-stock</availability></shipping><upc>00036504884013</upc><pixel>https://ad.linksynergy.com/fs-bin/show?id=83wh4zNK2Zo&bids=301124.15&type=15&subid=0</pixel><modification>U</modification></product>
<product product_id='21' name='Stinger Tees- 3" Stinger Pro XL Competition Camo Mid Pack Poly Bag [125 Count]' sku_number='19STGTEEMID3CO1111111101' manufacturer_name='Stinger Tees' part_number='19STGTEEMID3CO1111111101'><category><primary>Sporting Goods</primary><secondary>Outdoor Recreation~~Golf</secondary></category><URL><product>https://click.linksynergy.com/link?id=83wh4zNK2Zo&offerid=301124.21&type=15&murl=http%3A%2F%2Fwww.rockbottomgolf.com%2Faccessories%2Ftees%2Fstinger-tees-3-stinger-pro-xl-competition-camo-mid-pack-poly-bag-125-count%2F%3Futm_source%3Drakuten%26utm_medium%3Dcse%26utm_term%3D19STGTEEMID3CO1111111101</product><productImage>http://d3d71ba2asa5oz.cloudfront.net/40000065/images/3%20tees%20125%20count.jpg</productImage></URL><description><short>Features: Resealable package Less resistance due to a smaller tee head Built to withstand the strongest swings High-quality 120 Tees</short><long>Features: Resealable package Less resistance due to a smaller tee head Built to withstand the strongest swings High-quality 120 Tees</long></description><discount currency='USD'><type>amount</type></discount><price currency='USD'><retail>7.99</retail></price><brand>Stinger Tees</brand><shipping><availability>in-stock</availability></shipping><upc>00853190005047</upc><pixel>https://ad.linksynergy.com/fs-bin/show?id=83wh4zNK2Zo&bids=301124.21&type=15&subid=0</pixel><modification>U</modification></product>
<product product_id='23' name='Vegas Golf- Original Game' sku_number='19VEGORIGIN1111111111101' manufacturer_name='Vegas Golf' part_number='19VEGORIGIN1111111111101'><category><primary>Sporting Goods</primary><secondary>Outdoor Recreation~~Golf</secondary></category><URL><product>https://click.linksynergy.com/link?id=83wh4zNK2Zo&offerid=301124.23&type=15&murl=http%3A%2F%2Fwww.rockbottomgolf.com%2Faccessories%2Fother%2Fvegas-golf-original-game%2F%3Futm_source%3Drakuten%26utm_medium%3Dcse%26utm_term%3D19VEGORIGIN1111111111101</product><productImage>http://d3d71ba2asa5oz.cloudfront.net/40000065/images/19vegorigin1111111111101.jpg</productImage></URL><description><short>For a limited time only, you'll get 2 bonus chips with your purchase for a total of 10 game chips! Vegas Golf: the ultimate on-the-course gambling game. Vegas Golf consists of real casino style chips, the object is to avoid the negative and obtain the pos</short><long>For a limited time only, you'll get 2 bonus chips with your purchase for a total of 10 game chips! Vegas Golf: the ultimate on-the-course gambling game. Vegas Golf consists of real casino style chips, the object is to avoid the negative and obtain the pos</long></description><discount currency='USD'><type>amount</type></discount><price currency='USD'><retail>14.99</retail></price><brand>Vegas Golf</brand><shipping><availability>in-stock</availability></shipping><upc>00689076007030</upc><pixel>https://ad.linksynergy.com/fs-bin/show?id=83wh4zNK2Zo&bids=301124.23&type=15&subid=0</pixel><modification>U</modification></product>
<product product_id='28' name='Ray Cook Golf- 12' Compact Cup Ball Retriever' sku_number='19RAYBALRET1111111111201' manufacturer_name='Ray Cook Golf' part_number='19RAYBALRET1111111111201'><category><primary>Sporting Goods</primary><secondary>Outdoor Recreation~~Golf</secondary></category><URL><product>https://click.linksynergy.com/link?id=83wh4zNK2Zo&offerid=301124.28&type=15&murl=http%3A%2F%2Fwww.rockbottomgolf.com%2Faccessories%2Fball-retrievers%2Fray-cook-golf-12-compact-cup-ball-retriever%2F%3Futm_source%3Drakuten%26utm_medium%3Dcse%26utm_term%3D19RAYBALRET1111111111201</product><productImage>http://d3d71ba2asa5oz.cloudfront.net/40000065/images/19raybalret12.jpg</productImage></URL><description><short>The Ray Cook Golf Ball Retriever extends up to 12 feet and is the perfect companion for every golf bag. Features: Durable construction Telescoping shaft design makes the retriever easy to carry</short><long>The Ray Cook Golf Ball Retriever extends up to 12 feet and is the perfect companion for every golf bag. Features: Durable construction Telescoping shaft design makes the retriever easy to carry</long></description><discount currency='USD'><type>amount</type></discount><price currency='USD'><retail>19.99</retail></price><brand>Ray Cook Golf</brand><shipping><availability>in-stock</availability></shipping><upc>00840254178410</upc><pixel>https://ad.linksynergy.com/fs-bin/show?id=83wh4zNK2Zo&bids=301124.28&type=15&subid=0</pixel><modification>U</modification></product>
I wrote the Python code below. It extracts part_number, manufacturer_name, and name, but the other two fields remain elusive.
My code:
import pandas as pd
import xml.etree.ElementTree as et

xtree = et.parse(r"file.xml")
xroot = xtree.getroot()

df_cols = ["part_number", "manufacturer", "name", "retail", "product"]
rows = []

for node in xroot:
    part_number = node.attrib.get("part_number")
    manufacturer_name = node.attrib.get("manufacturer_name")
    name = node.attrib.get("name")
    product = node.findall("product") if node is not None else None
    retail = node.findall("retail") if node is not None else None
    rows.append({"part_number": part_number, "manufacturer": manufacturer_name, "name": name, "retail": retail, "product": product,})

out_df = pd.DataFrame(rows, columns = df_cols)
out_df.head()
My current output (retail and product are empty):
part_number manufacturer ... retail product
0 None None ... [] []
1 19CHPSPWRCH1111111111101 Champ Golf ... [] []
2 19STGTEEMID3CO1111111101 Stinger Tees ... [] []
3 19VEGORIGIN1111111111101 Vegas Golf ... [] []
4 19RAYBALRET1111111111201 Ray Cook Golf ... [] []
My desired output (URLs are shortened here for readability, but the product column should hold the full URL):
part_number manufacturer ... retail product
0 None None ... 9.99 https://click.linksynergy.com/link?id=83...
1 19CHPSPWRCH1111111111101 Champ Golf ... 7.99 https://click.linksynergy.com/link?id=83...
2 19STGTEEMID3CO1111111101 Stinger Tees ... 14.99 https://click.linksynergy.com/link?id=83...
3 19VEGORIGIN1111111111101 Vegas Golf ... 19.99 https://click.linksynergy.com/link?id=83...
4 19RAYBALRET1111111111201 Ray Cook Golf ... 6.99 https://click.linksynergy.com/link?id=83...
Any help would be greatly appreciated.
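Assuming the XML structure is constant, XPath expressions retrieve the elements and attributes in the same order, so the per-field results can be lined up row by row.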
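The code that originally accompanied this answer is not preserved here, so the following is only a minimal sketch of the idea using the standard-library ElementTree. It relies on the fact that findall("retail") searches direct children only, which is why the question's code returns empty lists: retail sits under price, and the product URL under URL. The SAMPLE string is a trimmed stand-in for the real feed, its URLs are placeholders, and the zip-based alignment assumes every product carries both fields.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the real feed; the URLs are placeholders, not real links.
SAMPLE = """<merchandiser>
  <header><merchantId>35386</merchantId></header>
  <product name="Champ Golf- Max Pro Spike Wrench" manufacturer_name="Champ Golf" part_number="19CHPSPWRCH1111111111101">
    <URL><product>https://example.com/link?id=1&amp;offerid=15</product></URL>
    <price currency="USD"><retail>9.99</retail></price>
  </product>
  <product name="Vegas Golf- Original Game" manufacturer_name="Vegas Golf" part_number="19VEGORIGIN1111111111101">
    <URL><product>https://example.com/link?id=1&amp;offerid=23</product></URL>
    <price currency="USD"><retail>14.99</retail></price>
  </product>
</merchandiser>"""

root = ET.fromstring(SAMPLE)

# Each path is resolved from the root and returns matches in document order.
products = root.findall("./product")
retails = [e.text for e in root.findall("./product/price/retail")]
urls = [e.text for e in root.findall("./product/URL/product")]

# zip pairs the i-th product with the i-th retail and URL. This is only
# correct when no product is missing either element (the stated assumption).
rows = [
    {
        "part_number": p.get("part_number"),
        "manufacturer": p.get("manufacturer_name"),
        "name": p.get("name"),
        "retail": r,
        "product": u,
    }
    for p, r, u in zip(products, retails, urls)
]
```

Passing rows to pd.DataFrame(rows) should then give one row per product. If some products lack a price or URL, the lists fall out of step, so a per-product lookup is safer in that case.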
Result:
See below.
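Since the question mentions that some fields are missing in some entries, here is a hedged sketch of a per-product loop that tolerates gaps. It uses findtext with a relative path from each product node, which returns None when the element is absent. The SAMPLE data is again a trimmed stand-in with a placeholder URL, and the helper name product_rows is made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the real feed; the second product deliberately has no <price>.
SAMPLE = """<merchandiser>
  <header><merchantId>35386</merchantId></header>
  <product name="Champ Golf- Max Pro Spike Wrench" manufacturer_name="Champ Golf" part_number="19CHPSPWRCH1111111111101">
    <URL><product>https://example.com/link?id=1&amp;offerid=15</product></URL>
    <price currency="USD"><retail>9.99</retail></price>
  </product>
  <product name="Vegas Golf- Original Game" manufacturer_name="Vegas Golf" part_number="19VEGORIGIN1111111111101">
    <URL><product>https://example.com/link?id=1&amp;offerid=23</product></URL>
  </product>
</merchandiser>"""


def product_rows(root):
    """Hypothetical helper: build one dict per top-level <product> element."""
    rows = []
    # findall("product") matches direct children only, so <header> and the
    # nested <URL>/<product> elements are not treated as rows.
    for node in root.findall("product"):
        rows.append({
            "part_number": node.get("part_number"),
            "manufacturer": node.get("manufacturer_name"),
            "name": node.get("name"),
            # findtext returns the element text, or None if the path is absent.
            "retail": node.findtext("price/retail"),
            "product": node.findtext("URL/product"),
        })
    return rows


rows = product_rows(ET.fromstring(SAMPLE))
```

Because only product elements are iterated, the all-None row that the header produces in the question's output does not appear, and a missing price shows up as None in that row instead of shifting values from other products.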
Output