Converting an XML-formatted website entirely into a DataFrame

import requests
import xml.etree.ElementTree as ET

headers = {'User-Agent': 'Mozilla/5.0'}
r = requests.get("https://www.ifsqn.com/forum/index.php/rss/forums/4-food-safety-quality-discussion/", headers=headers)
c = r.content
root = ET.parse(r).getroot()
print(root)

1 Answer

User

#1 · Posted 2024-10-01 05:01:55

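The XML you are trying to parse is RSS. Because RSS has a well-defined format, you can use a Python library built for parsing RSS feeds (feedparser, for example):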

import feedparser
import pandas as pd

parsed_rss = feedparser.parse('https://www.ifsqn.com/forum/index.php/rss/forums/4-food-safety-quality-discussion/')

pd.DataFrame(parsed_rss['entries'])
                                                title                                       title_detail  ...                                                 id guidislink
0                      Monitored vs Verifying Records  {'type': 'text/plain', 'language': None, 'base...  ...  https://www.ifsqn.com/forum/index.php/topic/38...      False
1   Is it necessary to follow the new ISO 22000 to...  {'type': 'text/plain', 'language': None, 'base...  ...  https://www.ifsqn.com/forum/index.php/topic/38...      False
2                      usda inspector tagging product  {'type': 'text/plain', 'language': None, 'base...  ...  https://www.ifsqn.com/forum/index.php/topic/38...      False
3                              Chocolate Liquor Discs  {'type': 'text/plain', 'language': None, 'base...  ...  https://www.ifsqn.com/forum/index.php/topic/38...      False
4                              Multi-Pack Beef Sticks  {'type': 'text/plain', 'language': None, 'base...  ...  https://www.ifsqn.com/forum/index.php/topic/38...      False
..                                                ...                                                ...  ...                                                ...        ...
95  HACCP Pan for super critical fluid extraction ...  {'type': 'text/plain', 'language': None, 'base...  ...  https://www.ifsqn.com/forum/index.php/topic/38...      False
96               Illegal Drugs Pictured on Food Label  {'type': 'text/plain', 'language': None, 'base...  ...  https://www.ifsqn.com/forum/index.php/topic/38...      False
97    BRC metal can packaging compliance requirements  {'type': 'text/plain', 'language': None, 'base...  ...  https://www.ifsqn.com/forum/index.php/topic/38...      False
98  Codex Decision tree in ISO 22000:2018 - Clause...  {'type': 'text/plain', 'language': None, 'base...  ...  https://www.ifsqn.com/forum/index.php/topic/38...      False
99           BRC clause 4.3.4 - Battery Charging area  {'type': 'text/plain', 'language': None, 'base...  ...  https://www.ifsqn.com/forum/index.php/topic/38...      False

[100 rows x 10 columns]

Another approach is to parse the XML yourself into some structure that can then be used to construct a DataFrame.
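A minimal sketch of that manual approach, assuming the feed follows the usual RSS 2.0 layout of `<item>` elements under `<channel>`. The helper name `rss_to_dataframe` and the inline sample feed are illustrative and not part of the original answer:

```python
import xml.etree.ElementTree as ET

import pandas as pd


def rss_to_dataframe(xml_bytes):
    """Turn the <item> elements of an RSS document into a DataFrame (hypothetical helper)."""
    root = ET.fromstring(xml_bytes)
    rows = []
    for item in root.iter("item"):
        # One row per <item>; each direct child (title, link, ...) becomes a column.
        # Namespaced children keep their "{namespace}tag" form as the column name.
        rows.append({child.tag: (child.text or "").strip() for child in item})
    return pd.DataFrame(rows)


# Small inline feed so the sketch runs without network access.
sample = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Demo</title>
<item><title>First</title><link>https://example.com/1</link></item>
<item><title>Second</title><link>https://example.com/2</link></item>
</channel></rss>"""

df = rss_to_dataframe(sample)
print(df)
```

With the code from the question, the same helper could be called as `rss_to_dataframe(r.content)`.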

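Edit:

I now see that you passed r (the response object) instead of c (its content) in the following line. Note also that ET.parse expects a file name or file object, so the bytes in c would need ET.fromstring instead: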

root = ET.parse(r).getroot()
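The two ways of getting a root element from in-memory bytes can be sketched as follows. Here `c` is a small stand-in for `r.content` (an assumption made so the example runs offline):

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for r.content from the question (assumed sample data).
c = b"<rss version='2.0'><channel><title>Demo</title></channel></rss>"

# ET.fromstring parses XML held in memory and returns the root element directly.
root = ET.fromstring(c)

# ET.parse expects a file name or file object, so the bytes must be wrapped first.
root_from_file = ET.parse(io.BytesIO(c)).getroot()

print(root.tag, root_from_file.tag)
```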
