I'm using Beautiful Soup to parse a batch of XML files and extract some information from them, like this:
import os
from glob import glob

from bs4 import BeautifulSoup

a_lis = []
for filepath in glob(os.path.join('../data/trainingFiles/', '*.xml')):
    with open(filepath) as f:
        content = f.read()
    results = BeautifulSoup(content, 'lxml')
    # print(results)
    for LabelInteractions in results.find_all("labelinteractions"):
        # print(LabelInteractions)
        for labelinteractions in LabelInteractions.find_all('labelinteraction'):
            print(labelinteractions)
Output:
<labelinteraction precipitant="ritonavir" precipitantcode="N0000007423" type="Unspecified interaction"></labelinteraction>
<labelinteraction precipitant="gc stimulator" precipitantcode="NO MAP" type="Unspecified interaction"></labelinteraction>
....
<labelinteraction precipitant="riociguat" precipitantcode="N0000188995" type="Unspecified interaction"></labelinteraction>
<labelinteraction effect=" 25064002: Headache (finding)" precipitant="alcohol" precipitantcode="N0000007432" type="Pharmacodynamic interaction"></labelinteraction>
How can I convert these XML attributes into a DataFrame? The columns should look like this:
precipitant precipitantcode type effect
You can store each column's values in a list and then create the DataFrame from those lists:
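The list-per-column idea can be sketched as follows. This is a minimal illustration rather than the answerer's original code: the inline `xml` string holds two elements copied from the question's output, the list names are made up for the example, and the built-in `html.parser` is swapped in for the question's `'lxml'` parser only so the snippet runs without an extra dependency.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Two sample elements copied from the question's output (assumed input).
xml = """
<labelinteractions>
<labelinteraction precipitant="ritonavir" precipitantcode="N0000007423" type="Unspecified interaction"></labelinteraction>
<labelinteraction effect=" 25064002: Headache (finding)" precipitant="alcohol" precipitantcode="N0000007432" type="Pharmacodynamic interaction"></labelinteraction>
</labelinteractions>
"""

# html.parser is used here instead of 'lxml' to avoid the extra dependency.
soup = BeautifulSoup(xml, "html.parser")

# One list per column (hypothetical names, chosen for this example).
precipitants, codes, types, effects = [], [], [], []

for tag in soup.find_all("labelinteraction"):
    # Tag.get returns None when the attribute is absent,
    # so every list ends up with one entry per element.
    precipitants.append(tag.get("precipitant"))
    codes.append(tag.get("precipitantcode"))
    types.append(tag.get("type"))
    effects.append(tag.get("effect"))

df = pd.DataFrame({
    "precipitant": precipitants,
    "precipitantcode": codes,
    "type": types,
    "effect": effects,
})
print(df)
```

For these two elements the frame has two rows and four columns, and the first row's `effect` is missing because that element has no such attribute.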
Output:
If you have a list of the columns you want:
you can iterate over them and append each attribute value to the matching list in a dictionary:
When that is done, you can build the DataFrame from the dictionary:
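The three steps above (a list of column names, a dictionary of lists filled by iteration, then the DataFrame) might look roughly like this. It is a sketch under assumptions, not the answerer's original code: the names `columns` and `data` are illustrative, the inline `xml` string reuses two elements from the question's output, and `html.parser` stands in for `'lxml'` so no extra package is needed.

```python
import pandas as pd
from bs4 import BeautifulSoup

# Two sample elements copied from the question's output (assumed input).
xml = """
<labelinteractions>
<labelinteraction precipitant="gc stimulator" precipitantcode="NO MAP" type="Unspecified interaction"></labelinteraction>
<labelinteraction effect=" 25064002: Headache (finding)" precipitant="alcohol" precipitantcode="N0000007432" type="Pharmacodynamic interaction"></labelinteraction>
</labelinteractions>
"""

# Step 1: the list of wanted columns (taken from the question).
columns = ["precipitant", "precipitantcode", "type", "effect"]

# Step 2: a dictionary mapping each column name to an initially empty list.
data = {col: [] for col in columns}

soup = BeautifulSoup(xml, "html.parser")  # 'lxml' in the question
for tag in soup.find_all("labelinteraction"):
    for col in columns:
        # Missing attributes become None, keeping all lists the same length.
        data[col].append(tag.get(col))

# Step 3: build the DataFrame, fixing the column order explicitly.
df = pd.DataFrame(data, columns=columns)
print(df)
```

Driving the loop from `columns` means adding or removing a column only requires changing that one list.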
This is what I get from your sample: