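I'm having trouble converting an XML file to CSV with BeautifulSoup in Azure App Service. When I run it locally, everything works fine. The first difference shows up at the soup line:
Parsing code: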
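import os

import pandas as pd
from bs4 import BeautifulSoup

# INPUT_DIRECTOR and OUTPUT_DIRECTORY are defined elsewhere in the app.
file_path = os.path.join(INPUT_DIRECTOR, "test.xml")
with open(file_path, encoding="utf8") as source:  # closes the file when done
    soup = BeautifulSoup(source.read(), 'xml')  # doesn't work on the global server; the 'xml' feature requires lxml
First = [case.text for case in soup.find_all('FirstAtt')]
Second = [case.text for case in soup.find_all('SecondAtt')]
Third = [case.text for case in soup.find_all('ThirdAtt')]
results = list(zip(First, Second))
columns = ['1', '2']
df = pd.DataFrame(results, columns=columns)
df['ID'] = Third[0]  # single top-level ID repeated on every row
df.to_csv(OUTPUT_DIRECTORY, index=False, encoding='utf-8-sig')  # to_csv needs a file path here, not a folder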
XML:
<?xml version="1.0" encoding="utf-8"?>
<root>
  <ThirdAtt>7290027600007</ThirdAtt>
  <Items Count="279">
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188134985</SecondAtt>
      ...
    </Item>
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188135005</SecondAtt>
      ...
    </Item>
    ...
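The sample declares Count="279" on <Items>, which gives a cheap sanity check: compare that declared count with the number of <Item> elements actually parsed on each machine. A minimal sketch with the standard-library ElementTree, assuming the root > Items > Item layout shown above; check_item_count is a hypothetical helper name. An empty or declaration-only input makes ET.fromstring raise ParseError ("no element found"), which would also point at what the server is really reading.

```python
import xml.etree.ElementTree as ET


def check_item_count(xml_bytes):
    """Return (declared, parsed) <Item> counts for the layout in the question."""
    # Raises ET.ParseError if the input has no root element (e.g. empty or truncated).
    root = ET.fromstring(xml_bytes)
    items_el = root.find('Items')
    if items_el is None:
        raise ValueError("no <Items> element under the root")
    declared = int(items_el.get('Count', '-1'))  # -1 if the attribute is missing
    parsed = len(items_el.findall('Item'))
    return declared, parsed
```

Running it on the same file locally and on Azure (reading the file in binary mode) shows whether both environments see the full document.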
When I run it locally, the soup line reads the XML file correctly, but on the global server running on Azure it cannot read the file, and soup contains only:
soup = <?xml version="1.0" encoding="utf-8"?>
Is there a way to fix this?
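Since BeautifulSoup's 'xml' feature is provided by the third-party lxml package, one hedged first step is to confirm that the same beautifulsoup4 and lxml versions are installed in both environments. A version mismatch is only a possibility to rule out, not a confirmed cause. The sketch uses only the standard library; installed_versions is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Run this locally and on the App Service, then compare the two outputs.
    for name, ver in installed_versions(["beautifulsoup4", "lxml"]).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```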
Update:
Thanks to @balderman, I replaced BeautifulSoup with ElementTree as suggested:
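import xml.etree.ElementTree as ET

root = ET.fromstring(xml)  # xml holds the file contents (read elsewhere)
First, Second = [], []
headers = []
for idx, item in enumerate(root.findall('.//Item')):
    data = []
    if idx == 0:
        # Child tag names are taken from the first <Item> only.
        headers = [x.tag for x in list(item)]
    for h in headers:
        data.append(item.find(h).text)
    First.append(data[0])
    Second.append(data[1])
results = list(zip(First, Second))
...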
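If the positions of the values in data change, is there a way to do the appends with a generic index instead of the hard-coded data[0] and data[1]?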
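Instead of positional indexes into data, one option is to key each item's values by tag name, so data[0] and data[1] become lookups such as row['FirstAtt'] and the order of the children no longer matters. A minimal sketch, assuming each <Item> holds simple text children as in the sample; items_as_dicts is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def items_as_dicts(root):
    """Return one {child_tag: text} dict per <Item>, so fields are looked up by name."""
    # If an <Item> repeats a child tag, the last occurrence wins in the dict.
    return [{child.tag: child.text for child in item} for item in root.findall('.//Item')]


# Usage: the second item lists its children in the opposite order.
root = ET.fromstring(
    "<root><Items>"
    "<Item><FirstAtt>2021-09-05 08:00</FirstAtt><SecondAtt>5411188134985</SecondAtt></Item>"
    "<Item><SecondAtt>5411188135005</SecondAtt><FirstAtt>2021-09-05 08:00</FirstAtt></Item>"
    "</Items></root>"
)
rows = items_as_dicts(root)
First = [row.get('FirstAtt') for row in rows]  # None if an Item lacks the tag
Second = [row.get('SecondAtt') for row in rows]
results = list(zip(First, Second))
```

Using .get() rather than row['FirstAtt'] keeps one malformed item from raising KeyError; swap it for indexing if a missing field should be an error.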
No external library is needed; the core Python ElementTree is enough.
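Along those lines, the whole XML-to-CSV step can be done without BeautifulSoup or pandas. The sketch below is one possible stdlib-only version, assuming the root > ThirdAtt and root > Items > Item layout from the sample; the function names xml_to_csv_text and write_csv are hypothetical, and the column names '1', '2' and 'ID' are copied from the pandas code in the question.

```python
import csv
import io
import xml.etree.ElementTree as ET


def xml_to_csv_text(xml_bytes, fields=('FirstAtt', 'SecondAtt'), columns=('1', '2')):
    """Build CSV text from the question's XML layout using only the standard library."""
    root = ET.fromstring(xml_bytes)
    # Mirrors df['ID'] = Third[0]: the top-level <ThirdAtt> is repeated on every row.
    id_value = root.findtext('ThirdAtt')
    out = io.StringIO()
    writer = csv.writer(out, lineterminator='\n')
    writer.writerow(list(columns) + ['ID'])
    for item in root.findall('.//Item'):
        # findtext looks children up by tag, so their order inside <Item> does not matter;
        # a missing tag yields None, which csv.writer writes as an empty field.
        writer.writerow([item.findtext(tag) for tag in fields] + [id_value])
    return out.getvalue()


def write_csv(xml_path, csv_path):
    """Read the XML as bytes so the parser applies the file's own encoding declaration."""
    with open(xml_path, 'rb') as src:
        text = xml_to_csv_text(src.read())
    # utf-8-sig matches the encoding passed to df.to_csv in the question.
    with open(csv_path, 'w', encoding='utf-8-sig', newline='') as dst:
        dst.write(text)
```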