ElementTree: getting the contents of specific tags in an XML document

Posted 2024-05-20 12:28:39


I'm trying to extract the contents of specific tags in an XML file.

Sample XML:

<facts>
    <fact>
        <name>crash</name>
        <full_name>Crash</full_name>
        <variables>
            <variable>
                <name>id</name>
                <proper_name>Crash Instance</proper_name>
                <type>INT</type>
                <interpretation>key</interpretation>
            </variable>
            <variable>
                <name>accident_key</name>
                <proper_name>Case Identifier</proper_name>
                <interpretation>string</interpretation>
                <type>CHAR(9)</type>
            </variable>
            <variable>
                <name>accident_year</name>
                <proper_name>Crash Year</proper_name>
                <interpretation>dim</interpretation>
                <type>INT</type>
            </variable>
        </variables>
    </fact>
    <fact>
        <name>vehicle</name>
        <full_name>Vehicle</full_name>
        <variables>
            <variable>
                <name>id</name>
                <proper_name>Vehicle Instance</proper_name>
                <type>INT</type>
            </variable>
            <variable>
                <name>crash_id</name>
                <proper_name>Crash Instance</proper_name>
                <type>INT</type>
            </variable>
        </variables>
    </fact>
</facts>

I want to extract the contents of all the <name> tags inside the <variable> nodes, but only for the crash fact.
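One way to express "only the crash fact" directly is ElementTree's limited XPath support: the predicate [name='crash'] keeps only <fact> elements whose <name> child has exactly that text. This is an alternative technique, not the code from the question. A minimal sketch, assuming the layout of the sample above (embedded as a trimmed string so it runs standalone); the function name names_for_fact is hypothetical:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample document (same layout, fewer child tags).
SAMPLE = """
<facts>
  <fact>
    <name>crash</name>
    <variables>
      <variable><name>id</name></variable>
      <variable><name>accident_key</name></variable>
      <variable><name>accident_year</name></variable>
    </variables>
  </fact>
  <fact>
    <name>vehicle</name>
    <variables>
      <variable><name>id</name></variable>
      <variable><name>crash_id</name></variable>
    </variables>
  </fact>
</facts>
"""

def names_for_fact(root, fact_name):
    # fact[name='...'] keeps <fact> children of root whose <name> child's text
    # equals fact_name; the rest of the path walks down to each variable's <name>.
    path = f"fact[name='{fact_name}']/variables/variable/name"
    return [el.text for el in root.findall(path)]

root = ET.fromstring(SAMPLE)
print(names_for_fact(root, "crash"))    # ['id', 'accident_key', 'accident_year']
print(names_for_fact(root, "vehicle"))  # ['id', 'crash_id']
```

With a file on disk, ET.parse('schema.xml').getroot() gives the same kind of root element. A fact name containing a single quote would break this simple path string.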

Here is my current code:

import xml.etree.ElementTree as ET

def header(filename, fact):
    lst = []
    tree = ET.parse(filename) #read in the XML
    for fact in tree.iter(tag = 'fact'):
        factname = fact.find('name').text
        if factname == fact: #choose the fact to pull from
            for var in fact.iter(tag = 'variable'):
                name = var.find('name').text
                lst.append(name)
    return lst #return a list of all the <name> tags from the Crash fact

newlst = header('schema.xml','crash')

My output newlst should be a list of all the <name> tags in the crash fact, but it always comes back empty.
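A likely cause is name shadowing: in header, the loop `for fact in tree.iter(tag = 'fact')` reuses the parameter name `fact`, so on the first iteration the string argument 'crash' is replaced by an Element, and `factname == fact` then compares a str with an Element, which is always False. A minimal standalone demonstration of that rebinding; the function name first_comparison is hypothetical:

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("<facts><fact><name>crash</name></fact></facts>")

def first_comparison(root, fact):
    # Before the loop, `fact` is the str argument passed in.
    before = type(fact).__name__
    for fact in root.iter("fact"):  # rebinds the parameter to an Element
        factname = fact.find("name").text
        # `fact` is now the <fact> Element, so this str == Element test is False.
        return before, type(fact).__name__, factname == fact

print(first_comparison(root, "crash"))  # ('str', 'Element', False)
```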

Strangely, if I hard-code everything (and drop the function), it returns the correct output:

import xml.etree.ElementTree as ET

lst = []
tree = ET.parse('schema.xml')
for fact in tree.iter(tag = 'fact'):
    factname = fact.find('name').text
    if factname == 'crash':
        for var in fact.iter(tag = 'variable'):
            name = var.find('name').text
            lst.append(name)
print(lst)


Output: ['id', 'accident_key', 'accident_year']
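The hard-coded loop works because 'crash' there is a string literal that nothing rebinds, whereas in header the loop variable shares its name with the fact parameter. Giving the loop variable its own name keeps the parameter a string. A sketch of a corrected header under that reading; fact_el is simply a chosen name, and the sample is passed as an in-memory file instead of schema.xml so the snippet runs standalone:

```python
import io
import xml.etree.ElementTree as ET

def header(filename, fact):
    """Return the <name> text of every <variable> under the <fact> named `fact`."""
    lst = []
    tree = ET.parse(filename)  # accepts a path or a binary file object
    for fact_el in tree.iter(tag="fact"):  # distinct name, so `fact` stays a str
        factname = fact_el.find("name").text
        if factname == fact:  # str compared with str
            for var in fact_el.iter(tag="variable"):
                lst.append(var.find("name").text)
    return lst

# Stand-in for schema.xml: a trimmed copy of the sample document.
SAMPLE = b"""<facts>
  <fact><name>crash</name><variables>
    <variable><name>id</name></variable>
    <variable><name>accident_key</name></variable>
    <variable><name>accident_year</name></variable>
  </variables></fact>
  <fact><name>vehicle</name><variables>
    <variable><name>id</name></variable>
    <variable><name>crash_id</name></variable>
  </variables></fact>
</facts>"""

print(header(io.BytesIO(SAMPLE), "crash"))  # ['id', 'accident_key', 'accident_year']
```

With the real file the call is unchanged: header('schema.xml', 'crash').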
