Converting heavily nested XML to CSV with Python

Posted 2024-10-01 17:36:04


I have the following XML file, which I want to convert to CSV using Python.

<?xml version="1.0" encoding="UTF-8"?><households xmlns:s="http://www.mediametrie.fr/nge/  " xmlns:xalan="http://xml.apache.org/xalan" date="2015-04-06" creation_date="2015-04-08T03:48:34">
    <household id="10003456">
        <destinations/>
        <members>
            <member id="1">
                <member_process result="KO" vacation="undefined">
                    <individual_audience>
                        <individual_audience_tvset id="1">
                            <channel session="5647128" begin="56435" end="76896"/>
                        </individual_audience_tvset>
                    </individual_audience>
                    <alarms>
                        <alarm id="Alarm_id_1" rule_id="Rule_id_1">
                            <parameters>
                                <parameter name="tvset_id" value="1"/>
                                <parameter name="length" value="46384"/>
                                <parameter name="end" value="2017-04-06T20:30:00"/>
                                <parameter name="channel" value="1010128"/>
                            </parameters>
                        </alarm>
                    </alarms>
                </member_process>
            </member>
            <member id="2">
                <member_process result="KO" vacation="undefined">
                    <individual_audience>
                        <individual_audience_tvset id="1">
                            <channel session="5674897" begin="98765" end="76543"/>
                        </individual_audience_tvset>
                    </individual_audience>
                    <alarms>
                        <alarm id="Alarm_id_2" rule_id="Rule_id_2">
                            <parameters>
                                <parameter name="tvset_id" value="1"/>
                                <parameter name="length" value="56745"/>
                                <parameter name="end" value="2017-04-06T20:30:00"/>
                                <parameter name="channel" value="4563256"/>
                            </parameters>
                        </alarm>
                    </alarms>
                </member_process>
            </member>
            <member id="3">
                <member_process result="KO" vacation="undefined">
                    <individual_audience>
                        <individual_audience_tvset id="1">
                            <channel session="1010128" begin="47218" end="93600"/>
                        </individual_audience_tvset>
                    </individual_audience>
                    <alarms>
                        <alarm id="AL_R_INDP_AUDIENCE_TOO_HIGH_LIMIT" rule_id="R_INDP_AUDIENCE_TOO_HIGH_LIMIT">
                            <parameters>
                                <parameter name="tvset_id" value="1"/>
                                <parameter name="length" value="46382"/>
                                <parameter name="end" value="2015-04-06T20:30:00"/>
                                <parameter name="channel" value="1010128"/>
                            </parameters>
                        </alarm>
                    </alarms>
                </member_process>
            </member>
            <member id="4">
                <member_process result="KO" vacation="undefined">
                    <individual_audience>
                        <individual_audience_tvset id="1">
                            <channel session="1010128" begin="47219" end="93600"/>
                        </individual_audience_tvset>
                    </individual_audience>
                    <alarms>
                        <alarm id="AL_R_INDP_AUDIENCE_TOO_HIGH_LIMIT" rule_id="R_INDP_AUDIENCE_TOO_HIGH_LIMIT">
                            <parameters>
                                <parameter name="tvset_id" value="1"/>
                                <parameter name="length" value="46381"/>
                                <parameter name="end" value="2015-04-06T20:30:00"/>
                                <parameter name="channel" value="1010128"/>
                            </parameters>
                        </alarm>
                    </alarms>
                </member_process>
            </member>
            <member id="5">
                <member_process result="KO" vacation="undefined">
                    <individual_audience>
                        <individual_audience_tvset id="1">
                            <channel session="1010128" begin="47220" end="93600"/>
                        </individual_audience_tvset>
                    </individual_audience>
                    <alarms>
                        <alarm id="AL_R_INDP_AUDIENCE_TOO_HIGH_LIMIT" rule_id="R_INDP_AUDIENCE_TOO_HIGH_LIMIT">
                            <parameters>
                                <parameter name="tvset_id" value="1"/>
                                <parameter name="length" value="46380"/>
                                <parameter name="end" value="2015-04-06T20:30:00"/>
                                <parameter name="channel" value="1010128"/>
                            </parameters>
                        </alarm>
                    </alarms>
                </member_process>
            </member>
            <member id="6">
                <member_process result="KO" vacation="undefined">
                    <individual_audience>
                        <individual_audience_tvset id="1">
                            <channel session="1010128" begin="47221" end="93600"/>
                        </individual_audience_tvset>
                    </individual_audience>
                    <alarms>
                        <alarm id="AL_R_INDP_AUDIENCE_TOO_HIGH_LIMIT" rule_id="R_INDP_AUDIENCE_TOO_HIGH_LIMIT">
                            <parameters>
                                <parameter name="tvset_id" value="1"/>
                                <parameter name="length" value="46379"/>
                                <parameter name="end" value="2015-04-06T20:30:00"/>
                                <parameter name="channel" value="1010128"/>
                            </parameters>
                        </alarm>
                    </alarms>
                </member_process>
            </member>
        </members>
        <regular_guests/>
        <occasional_guests/>
        <tvsets>
            <tvset id="1">
                <tvset_process result="OK">
                    <tvset_audience>
                        <channel session="47" begin="46304" end="46384"/>
                        <channel session="1010483" begin="46384" end="46419"/>
                        <channel session="235" begin="46419" end="46424"/>
                        <channel session="1010128" begin="46424" end="93600"/>
                    </tvset_audience>
                    <alarms>
                        <alarm id="AL_T_P_VALID_LAST_HOUR_REBOOT" rule_id="T_P_METER_STOPPING_TIMESTAMPING">
                            <parameters>
                                <parameter name="unique_id" value="4547"/>
                                <parameter name="reboot_date" value="2015-04-06T07:17:44"/>
                                <parameter name="length" value="1.6221180555555557"/>
                            </parameters>
                        </alarm>
                        <alarm id="AL_T_P_VALID_LAST_HOUR_REBOOT" rule_id="T_P_METER_STOPPING_TIMESTAMPING">
                            <parameters>
                                <parameter name="unique_id" value="4566"/>
                                <parameter name="reboot_date" value="2015-04-07T13:17:54"/>
                                <parameter name="length" value="1.2313657407407406"/>
                            </parameters>
                        </alarm>
                        <alarm id="AL_T_P_TECH_ID_RESOL_FALSE_POSITIVE" rule_id="T_P_TECH_ID_RESOL">
                            <parameters>
                                <parameter name="channel_id" value="194"/>
                                <parameter name="unique_id" value="4549"/>
                            </parameters>
                        </alarm>
                    </alarms>
                </tvset_process>
            </tvset>
        </tvsets>
        <household_process result="KO" vacation="no">
            <alarms>
                <alarm id="AL_T_FP_AUDIENCE_WITHOUT_PRESENCE" rule_id="T_FP_AUDIENCE_WITHOUT_PRESENCE">
                    <parameters>
                        <parameter name="tvset_id" value="1"/>
                        <parameter name="length" value="80"/>
                        <parameter name="start" value="2015-04-06T07:21:44"/>
                    </parameters>
                </alarm>
                <alarm id="AL_T_FP_AUDIENCE_WITHOUT_PRESENCE" rule_id="T_FP_AUDIENCE_WITHOUT_PRESENCE">
                    <parameters>
                        <parameter name="tvset_id" value="1"/>
                        <parameter name="length" value="792"/>
                        <parameter name="start" value="2015-04-06T07:23:44"/>
                    </parameters>
                </alarm>
                <alarm id="AL_R_FP_AUDIENCE_TOO_HIGH_LIMIT" rule_id="R_FP_AUDIENCE_TOO_HIGH_LIMIT">
                    <parameters>
                        <parameter name="tvset_id" value="1"/>
                        <parameter name="length" value="47176"/>
                        <parameter name="end" value="2015-04-06T20:30:00"/>
                        <parameter name="channel" value="1010128"/>
                    </parameters>
                </alarm>
                <alarm id="AL_R_FP_AT_LEAST_ONE_MEMBER_OK" rule_id="R_FP_AT_LEAST_ONE_MEMBER_OK">
                    <parameters/>
                </alarm>
            </alarms>
        </household_process>
    </household>
</households>

The output should look like this:

[expected output sample not preserved in the source]

Similarly, member id=2 has the same household id.

Any help would be much appreciated. Thanks in advance!


1 answer
User
#1 · Posted 2024-10-01 17:36:04

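This assumes your XML is in a file called input.xml. BeautifulSoup can be used to parse the XML read from the file. After that, it is just a matter of building a table of all the information you want to extract: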

from bs4 import BeautifulSoup  # the "xml" parser used below also needs lxml installed
import csv

# Columns filled in directly by the loop below. "destinations" is never
# assigned, so DictWriter leaves that column empty.
fields = [
    "household id",
    "destinations",
    "member id"]

# Each entry is [CSV column name, tag to search for, attribute to read].
member_fields = [
    ["result", "member_process", "result"],
    ["vacation", "member_process", "vacation"],
    ["individual_audience_tvset id", "individual_audience_tvset", "id"],
    ["session", "channel", "session"],
    ["begin", "channel", "begin"],
    ["end", "channel", "end"],
    ["alarm id", "alarm", "id"],
    ["rule_id", "alarm", "rule_id"],
    ["name", "parameter", "name"],
    ["value", "parameter", "value"]
    ]

fieldnames = fields + [field for field, _, _ in member_fields]

with open('input.xml') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()

    xml = f_input.read()
    soup = BeautifulSoup(xml, "xml")
    household_id = soup.find('household')['id']

    # One CSV row per <member> element.
    for member in soup.find_all('member'):
        member_id = member['id']
        row = {'household id' : household_id, 'member id' : member_id}

        # find() returns only the first matching descendant, so each row holds
        # the first channel, alarm and parameter found under the member.
        for field, x, y in member_fields:
            row[field] = member.find(x)[y]

        csv_output.writerow(row)

This creates output.csv, containing:

[output listing not preserved in the source]
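If installing BeautifulSoup and lxml is not an option, the same table-driven idea can probably be expressed with the standard library alone. The following is only a sketch, not the answer's original method: the names `MEMBER_FIELDS`, `FIELDNAMES`, `member_rows` and `rows_to_csv` are made up for illustration, the never-filled "destinations" column is dropped, and it assumes the element names carry no default namespace (true for the sample above, which only declares prefixed namespaces). Unlike the code above, a missing element or attribute becomes an empty cell instead of raising an error, and the household id is taken from each member's own enclosing household.

```python
import csv
import io
import xml.etree.ElementTree as ET

# (CSV column, tag to look up under each <member>, attribute to read),
# mirroring the member_fields table from the answer.
MEMBER_FIELDS = [
    ("result", "member_process", "result"),
    ("vacation", "member_process", "vacation"),
    ("individual_audience_tvset id", "individual_audience_tvset", "id"),
    ("session", "channel", "session"),
    ("begin", "channel", "begin"),
    ("end", "channel", "end"),
    ("alarm id", "alarm", "id"),
    ("rule_id", "alarm", "rule_id"),
    ("name", "parameter", "name"),
    ("value", "parameter", "value"),
]
FIELDNAMES = ["household id", "member id"] + [col for col, _, _ in MEMBER_FIELDS]


def member_rows(xml_text):
    """Yield one dict per <member>, tagged with the id of its household."""
    root = ET.fromstring(xml_text)
    for household in root.iter("household"):
        for member in household.iter("member"):
            row = {"household id": household.get("id"),
                   "member id": member.get("id")}
            for column, tag, attribute in MEMBER_FIELDS:
                # ".//tag" matches the first descendant at any depth.
                element = member.find(f".//{tag}")
                # A missing element or attribute becomes an empty cell.
                row[column] = "" if element is None else element.get(attribute, "")
            yield row


def rows_to_csv(rows):
    """Render the rows as CSV text, header line first."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

To use it on the file from the question, something like `rows_to_csv(member_rows(open('input.xml', encoding='utf-8').read()))` should return the CSV text, which can then be written to output.csv. As with the BeautifulSoup version, only the first channel, alarm and parameter of each member end up in the row.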
