Python: converting XML to CSV with BeautifulSoup does not work on Azure

Posted 2024-10-01 00:24:29

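I have a problem converting an XML file to CSV with BeautifulSoup in an Azure App Service. When I run the same code locally, everything works fine. The first difference shows up at the soup line:

Parsing code: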

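import os
import pandas as pd
from bs4 import BeautifulSoup

# INPUT_DIRECTOR and OUTPUT_DIRECTORY are defined elsewhere in the application
file_path = os.path.join(INPUT_DIRECTOR, "test.xml")
with open(file_path, encoding="utf8") as source:
    # the 'xml' parser feature requires the lxml package to be installed
    soup = BeautifulSoup(source.read(), 'xml')    # doesn't work on global server
First = [case.text for case in soup.find_all('FirstAtt')]
Second = [case.text for case in soup.find_all('SecondAtt')]
Third = [case.text for case in soup.find_all('ThirdAtt')]
results = list(zip(First, Second))
columns = ['1', '2']
df = pd.DataFrame(results, columns=columns)
df['ID'] = Third[0]    # the single ThirdAtt value is repeated on every row
df.to_csv(OUTPUT_DIRECTORY, index=False, encoding='utf-8-sig')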

XML:

<?xml version="1.0" encoding="utf-8"?>
<root>
  <ThirdAtt>7290027600007</ThirdAtt>
  <Items Count="279">
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188134985</SecondAtt>
      ...
    </Item>
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188135005</SecondAtt>
      ...
    </Item>
    ...

When run on the local IP, the soup line reads the XML file. On the global server running on Azure, the soup line fails to read the file and returns only:

soup = <?xml version="1.0" encoding="utf-8"?>

Is there a way to solve this?
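One way to narrow this down is to check what the server actually reads from the file, independently of BeautifulSoup. The following is only a diagnostic sketch: `summarize_xml` is a hypothetical helper, and `sample` is a stand-in for the text read from test.xml. It parses the text with the standard library and counts the element tags it finds.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def summarize_xml(text):
    """Return (number of characters read, Counter of element tags)."""
    root = ET.fromstring(text)
    # root.iter() walks the root element and all of its descendants.
    tags = Counter(element.tag for element in root.iter())
    return len(text), tags


# Hypothetical stand-in for the text read from test.xml.
sample = """<root>
  <ThirdAtt>7290027600007</ThirdAtt>
  <Items Count="2">
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188134985</SecondAtt>
    </Item>
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188135005</SecondAtt>
    </Item>
  </Items>
</root>"""

size, tags = summarize_xml(sample)
print(size, dict(tags))
```

If the counts on the server match the local ones, the file content is the same in both places and the difference is more likely in the parser setup. If `ET.fromstring` raises a `ParseError` instead, the text read on the server is probably incomplete.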

Update

Thanks to @balderman, I replaced the soup usage as suggested:

root = ET.fromstring(xml)
headers = []
for idx,item in enumerate(root.findall('.//Item')):
    data = []
    if idx == 0:
        headers = [x.tag for x in list(item)]
    for h in headers:
        data.append(item.find(h).text)

    First.append(data[0])
    Second.append(data[1])
    results = list(zip(First, Second))
    ...

Is there a way to append using a generic index, in case the positions in data[i] change?
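One possible approach, sketched here under the assumption that the tag names `FirstAtt` and `SecondAtt` stay the same even if their order changes, is to look values up by tag name rather than by position. The sample XML below is a shortened, hypothetical variant of the data in the question, with the child order swapped in the second item to show the effect.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the second <Item> lists its children in reverse order.
xml_text = """<root>
  <Items>
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188134985</SecondAtt>
    </Item>
    <Item>
      <SecondAtt>5411188135005</SecondAtt>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
    </Item>
  </Items>
</root>"""

root = ET.fromstring(xml_text)
First, Second = [], []
for item in root.findall('.//Item'):
    # Map each child tag to its text, so the order inside <Item> is irrelevant.
    row = {child.tag: child.text for child in item}
    First.append(row.get('FirstAtt'))
    Second.append(row.get('SecondAtt'))

results = list(zip(First, Second))
print(results)
```

`item.findtext('FirstAtt')` is another by-name lookup that avoids building the dictionary. Both return `None` when the tag is missing from an item.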


1 answer
User
#1 · Posted 2024-10-01 00:24:29

No external library is needed. The ElementTree module from the Python standard library is enough.

import xml.etree.ElementTree as ET
xml = '''<root>
  <ThirdAtt>7290027600007</ThirdAtt>
  <Items Count="279">
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188134985</SecondAtt>
    </Item>
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188135005</SecondAtt>
    </Item>
  </Items>
</root>
'''
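root = ET.fromstring(xml)
headers = []
# './/Item' matches every <Item> element anywhere below the root
for idx, item in enumerate(root.findall('.//Item')):
    data = []
    if idx == 0:
        # take the column names from the child tags of the first <Item>
        headers = [x.tag for x in list(item)]
        print(','.join(headers))
    # look each value up by tag name, in header order
    for h in headers:
        data.append(item.find(h).text)
    print(','.join(data))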

Output

FirstAtt,SecondAtt
2021-09-05 08:00,5411188134985
2021-09-05 08:00,5411188135005
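To get from printed rows to the CSV the question asks for, the same loop can feed the standard library `csv` module. This is a minimal sketch, not the answerer's code: it assumes the column names `'1'`, `'2'` and `'ID'` from the question, and it writes to an in-memory `io.StringIO` buffer as a stand-in for a real output file.

```python
import csv
import io
import xml.etree.ElementTree as ET

xml_text = """<root>
  <ThirdAtt>7290027600007</ThirdAtt>
  <Items Count="279">
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188134985</SecondAtt>
    </Item>
    <Item>
      <FirstAtt>2021-09-05 08:00</FirstAtt>
      <SecondAtt>5411188135005</SecondAtt>
    </Item>
  </Items>
</root>"""

root = ET.fromstring(xml_text)
# ThirdAtt appears once under the root and is repeated on every row as the ID.
record_id = root.findtext('ThirdAtt')

# Stand-in for open(path, 'w', newline='', encoding='utf-8-sig').
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')
writer.writerow(['1', '2', 'ID'])
for item in root.findall('.//Item'):
    writer.writerow([
        item.findtext('FirstAtt'),
        item.findtext('SecondAtt'),
        record_id,
    ])

csv_text = buffer.getvalue()
print(csv_text)
```

With a real file, passing `newline=''` to `open` lets the `csv` module control line endings, and `encoding='utf-8-sig'` matches the encoding used in the question's `to_csv` call.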
