Convert hundreds of XML files into a single CSV file

<?xml version="1.0"?>
<case>
  <number>2</number>
  <age>49</age>
  <sex>F</sex>
  <composition>solid</composition>
  <echogenicity>hyperechogenicity</echogenicity>
  <margins>well defined</margins>
  <calcifications>non</calcifications>
  <tirads>2</tirads>
  <reportbacaf/>
  <reporteco/>
  <mark>
    <image>1</image>
    <svg>[{"points": [{"x": 250, "y": 72}, {"x": 226, "y": 82}, {"x": 216, "y": 90}, {"x": 204, "y": 94}, {"x": 190, "y": 98}, {"x": 181, "y": 103}, {"x": 172, "y": 109}, {"x": 165, "y": 121}, {"x": 161, "y": 131}, {"x": 159, "y": 142}, {"x": 162, "y": 170}, {"x": 164, "y": 185}, {"x": 171, "y": 203}, {"x": 176, "y": 210}, {"x": 185, "y": 214}, {"x": 191, "y": 218}, {"x": 211, "y": 228}, {"x": 212, "y": 230}, {"x": 235, "y": 239}, {"x": 243, "y": 242}, {"x": 255, "y": 244}, {"x": 263, "y": 245}, {"x": 263, "y": 245}, {"x": 285, "y": 244}, {"x": 298, "y": 242}, {"x": 330, "y": 233}, {"x": 352, "y": 217}, {"x": 367, "y": 201}, {"x": 373, "y": 194}, {"x": 379, "y": 173}, {"x": 382, "y": 163}, {"x": 383, "y": 143}, {"x": 383, "y": 136}, {"x": 382, "y": 127}, {"x": 379, "y": 122}, {"x": 374, "y": 117}, {"x": 365, "y": 109}, {"x": 360, "y": 101}, {"x": 358, "y": 95}, {"x": 352, "y": 88}, {"x": 346, "y": 85}, {"x": 333, "y": 81}, {"x": 327, "y": 78}, {"x": 319, "y": 73}, {"x": 314, "y": 72}, {"x": 304, "y": 70}, {"x": 281, "y": 69}, {"x": 258, "y": 71}, {"x": 254, "y": 71}, {"x": 248, "y": 72}], "annotation": {}, "regionType": "freehand"}]</svg>
  </mark>
</case>

1 answer

User

#1 · Posted 2024-06-02 06:36:31

Loop over the XML files. For each one, use an XML parser such as etree or lxml to parse its contents and store them in a dict. Then save the dict to a JSON or CSV file.
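As a minimal sketch of the "parse into a dict" step, here is a version that uses only the standard library's xml.etree.ElementTree instead of BeautifulSoup. The helper name case_to_dict and the shortened sample document are assumptions made for illustration; they are not part of the original answer.

```python
import xml.etree.ElementTree as ET

# Tags to pull out of each <case> element (same list the answer uses).
TAGS = ['number', 'age', 'sex', 'composition', 'echogenicity',
        'margins', 'calcifications', 'tirads']


def case_to_dict(xml_text):
    """Parse one <case> document and return {tag: text or None}."""
    root = ET.fromstring(xml_text)
    # findtext() returns None when the child tag is absent.
    return {tag: root.findtext(tag) for tag in TAGS}


# Shortened, hypothetical sample: some tags are left out on purpose.
sample = """<?xml version="1.0"?>
<case>
  <number>2</number>
  <age>49</age>
  <sex>F</sex>
  <composition>solid</composition>
  <tirads>2</tirads>
  <reportbacaf/>
</case>"""

record = case_to_dict(sample)
print(record)
```

Tags that are missing from a file come back as None, so every file yields the same set of keys, which is what a CSV with fixed columns needs.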

The list of tags to parse

tags = ['number','age','sex','composition','echogenicity', 'margins', 'calcifications', 'tirads']

The XML to parse, saved as "abc.xml"

with open('abc.xml', 'r') as fd:
    doc = fd.read()

Parse it with the lxml and BeautifulSoup modules

from bs4 import BeautifulSoup as BX
soup = BX(doc, 'lxml')
mydata = {}
for tag in tags:
    value = soup.find(tag)
    if value:
        mydata[tag] = value.text
    else:
        mydata[tag] = None

Check the data

print(mydata)
#{'number': '2', 'age': '49', 'sex': 'F', 'composition': 'solid', 'echogenicity': 'hyperechogenicity', 'margins': 'well defined', 'calcifications': 'non', 'tirads': '2'}
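Every value in that dict is a string. If numeric columns are wanted in the CSV, one possible sketch is to convert selected fields before saving. The name to_typed and the choice of which fields count as integers are assumptions for illustration, not something the answer specifies.

```python
# Hypothetical choice of fields to convert; adjust to your data.
INT_FIELDS = ('number', 'age', 'tirads')


def to_typed(record):
    """Return a copy of record with purely numeric INT_FIELDS cast to int."""
    typed = dict(record)
    for key in INT_FIELDS:
        value = typed.get(key)
        # Leave missing values and non-numeric text untouched.
        if value is not None and value.strip().isdigit():
            typed[key] = int(value)
    return typed


mydata = {'number': '2', 'age': '49', 'sex': 'F', 'composition': 'solid',
          'echogenicity': 'hyperechogenicity', 'margins': 'well defined',
          'calcifications': 'non', 'tirads': '2'}
print(to_typed(mydata))
```

Values that are not plain digits are kept as strings, so an unexpected entry does not raise an error.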

The complete code, written as a function

from bs4 import BeautifulSoup as BX
tags = ['number','age','sex','composition','echogenicity', 'margins', 'calcifications', 'tirads']

def parse_xml(xmlfile):
    with open(xmlfile, 'r') as fd:
        doc = fd.read()
    soup = BX(doc, 'lxml')
    mydata = {}
    for tag in tags:
        value = soup.find(tag)
        if value:
            mydata[tag] = value.text
        else:
            mydata[tag] = None
    return mydata

You can use this function to loop over all the XML files and parse each one

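import os
import pandas

# Assume all of your XML files are in this folder
myfiles_path = r"C:/Users/RG/Desktop/test/"

all_data = {}
for file in os.listdir(myfiles_path):
    # Skip anything in the folder that is not an XML file
    if not file.lower().endswith('.xml'):
        continue
    file_path = os.path.join(myfiles_path, file)
    all_data[file] = parse_xml(file_path)

# orient='index' gives one row per file, with the tags as columns
df = pandas.DataFrame.from_dict(all_data, orient='index')
df.to_csv('output.csv', index_label='file')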
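If pandas is not installed, the same loop can be sketched with only the standard library, writing one CSV row per XML file through csv.DictWriter. The function name xml_dir_to_csv and the extra "file" column are assumptions for illustration, and the sketch swaps BeautifulSoup for xml.etree.ElementTree.

```python
import csv
import glob
import os
import xml.etree.ElementTree as ET

TAGS = ['number', 'age', 'sex', 'composition', 'echogenicity',
        'margins', 'calcifications', 'tirads']


def xml_dir_to_csv(xml_dir, csv_path):
    """Write one CSV row per *.xml file in xml_dir; return the row count."""
    paths = sorted(glob.glob(os.path.join(xml_dir, '*.xml')))
    with open(csv_path, 'w', newline='', encoding='utf-8') as out:
        writer = csv.DictWriter(out, fieldnames=['file'] + TAGS)
        writer.writeheader()
        for path in paths:
            root = ET.parse(path).getroot()
            # Missing tags become None, which csv writes as an empty cell.
            row = {tag: root.findtext(tag) for tag in TAGS}
            row['file'] = os.path.basename(path)
            writer.writerow(row)
    return len(paths)
```

A call such as xml_dir_to_csv(myfiles_path, 'output.csv') would then produce a header line followed by one line per file.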
