Extract XML tag data based on the attribute of a different tag

3 answers

User

Answer 1 · edited 2024-10-02 22:30:36

Here are the detailed steps I would take:

import xml.etree.ElementTree as ET

# 1. Parse your xml file
tree = ET.parse('your.xml')

# 2. Get the root
root = tree.getroot()

# 3. Set the namespace and the href substring you are looking for
ns = 'urn:tva:metadata:2004'
matchTag = 'NicamWarningCS'

# 4. retrieve all Genres
genres = root.find('{%s}ProgramDescription' % ns) \
    .find('{%s}ProgramInformationTable' % ns) \
    .find('{%s}ProgramInformation' % ns) \
    .find('{%s}BasicDescription' % ns) \
    .findall('{%s}Genre' % ns)

# 5. Keep only the Name children of the Genres whose href contains matchTag ('NicamWarningCS');
#    genre.get('href', '') avoids a TypeError for a Genre without an href
filteredGenreNames = [genre.find('{%s}Name' % ns) for genre in genres
                      if matchTag in genre.get('href', '')]

# 6. Extract the text of those Name tags (skipping Genres that have no Name)
data = [t.text for t in filteredGenreNames if t is not None]

print(data)
# ['Grof taalgebruik', 'Geweld']
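For reference, here is a self-contained sketch of the same steps that runs without `your.xml`. It assumes a hypothetical sample document (root element `TVAMain`, made-up `href` values) and uses ElementTree's `namespaces` mapping with one path instead of chained `find()` calls, plus `findtext()` to read each `Name`. The names `SAMPLE_XML`, `GENRE_PATH` and `nicam_warning_names` are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document: the nesting follows the path used above,
# but the href values are made up for illustration.
SAMPLE_XML = """\
<TVAMain xmlns="urn:tva:metadata:2004">
  <ProgramDescription>
    <ProgramInformationTable>
      <ProgramInformation>
        <BasicDescription>
          <Genre href="urn:example:ContentCS:3.1"><Name>Drama</Name></Genre>
          <Genre href="urn:example:NicamWarningCS:taal"><Name>Grof taalgebruik</Name></Genre>
          <Genre href="urn:example:NicamWarningCS:geweld"><Name>Geweld</Name></Genre>
        </BasicDescription>
      </ProgramInformation>
    </ProgramInformationTable>
  </ProgramDescription>
</TVAMain>
"""

# A prefix of our choosing mapped to the TVA namespace, so paths can say "tva:Tag"
NS = {"tva": "urn:tva:metadata:2004"}

# Hypothetical constant: path from the root element down to the Genre elements
GENRE_PATH = ("tva:ProgramDescription/tva:ProgramInformationTable/"
              "tva:ProgramInformation/tva:BasicDescription/tva:Genre")


def nicam_warning_names(root, match="NicamWarningCS"):
    """Return the Name text of every Genre whose href contains `match`."""
    names = []
    for genre in root.findall(GENRE_PATH, NS):
        # .get() with a default avoids a TypeError when href is missing
        if match in genre.get("href", ""):
            name = genre.findtext("tva:Name", namespaces=NS)
            if name is not None:
                names.append(name)
    return names


root = ET.fromstring(SAMPLE_XML)
print(nicam_warning_names(root))
```

One design note: `findall()` with a full path simply returns an empty list when an intermediate element is missing, whereas chained `find()` calls raise `AttributeError` on `None`.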

User

Answer 2 · edited 2024-10-02 22:30:36

I couldn't get Eric's answer working right away, but it pointed me toward a different approach that solved my problem.

By building a dict of all the genres, I can filter out all the Nicam warnings and collect them in a list, which I can then use to populate my SQL statement:

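genreitemdict = dict()
# `program` is the element containing the Genre tags (defined elsewhere in my code)
for genreitem in program.iter("{urn:tva:metadata:2004}Genre"):
    for child in genreitem:
        # Map each Genre's href to the text of its child element (e.g. its Name)
        genreitemdict[genreitem.attrib['href']] = child.text
# Filter once, after the dict is complete, keeping values whose href mentions NicamWarningCS
NicamWarningCS = [v for k, v in genreitemdict.items() if 'NicamWarningCS' in k]
print(NicamWarningCS)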

This may not be the best solution, but for now it does the job.
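A minimal runnable sketch of this dict-based approach, followed by feeding the resulting list into SQL. It assumes a hypothetical sample document and a hypothetical in-memory SQLite table named `warnings`; unlike the answer, it reads each Genre's `Name` child explicitly with `findtext()` instead of taking the last child's text. The `?` placeholders with `executemany()` let `sqlite3` handle quoting rather than formatting values into the SQL string.

```python
import sqlite3
import xml.etree.ElementTree as ET

TVA = "{urn:tva:metadata:2004}"

# Hypothetical document; iter() finds Genre tags at any depth, like the answer's `program`
SAMPLE_XML = """\
<ProgramInformation xmlns="urn:tva:metadata:2004">
  <BasicDescription>
    <Genre href="urn:example:ContentCS:3.1"><Name>Drama</Name></Genre>
    <Genre href="urn:example:NicamWarningCS:taal"><Name>Grof taalgebruik</Name></Genre>
    <Genre href="urn:example:NicamWarningCS:geweld"><Name>Geweld</Name></Genre>
  </BasicDescription>
</ProgramInformation>
"""

program = ET.fromstring(SAMPLE_XML)

# Build {href: Name text} for every Genre; Genres sharing an href overwrite each other
genre_by_href = {
    genre.get("href", ""): genre.findtext(TVA + "Name")
    for genre in program.iter(TVA + "Genre")
}

# Keep only the names whose href mentions NicamWarningCS (dicts keep insertion order)
nicam_warnings = [name for href, name in genre_by_href.items()
                  if "NicamWarningCS" in href]

# Hypothetical table; parameter placeholders instead of string formatting
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE warnings (name TEXT)")
conn.executemany("INSERT INTO warnings (name) VALUES (?)",
                 [(name,) for name in nicam_warnings])
conn.commit()

rows = [r[0] for r in conn.execute("SELECT name FROM warnings ORDER BY rowid")]
print(rows)
```

Because the dict is keyed by `href`, two Genres with the same `href` keep only the last `Name`; filtering the Genre elements directly in a list comprehension avoids that if duplicates matter.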

User

Answer 3 · edited 2024-10-02 22:30:36

Just get all the Genre elements and keep only those whose href attribute contains what you are interested in:

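from xml.etree.ElementTree import fromstring

ns = 'urn:tva:metadata:2004'
# `xml` is the document as a string
all_genres = fromstring(xml) \
    .find('{%s}ProgramDescription' % ns) \
    .find('{%s}ProgramInformationTable' % ns) \
    .find('{%s}ProgramInformation' % ns) \
    .find('{%s}BasicDescription' % ns) \
    .findall('{%s}Genre' % ns)
# Keep the Genre elements whose href contains 'NicamWarningCS' (missing href counts as '')
some_genres = [g for g in all_genres if 'NicamWarningCS' in g.get('href', '')]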
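Note that `some_genres` holds `Genre` elements, not their names. The sketch below, built on a hypothetical sample document with illustrative `href` values, reads each `Name` with `findtext()`. It also shows why the substring filter is done in Python: ElementTree's limited XPath subset supports an exact attribute match such as `[@href='...']`, but it has no `contains()` function.

```python
from xml.etree.ElementTree import fromstring

# Prefix chosen by us for the TVA namespace
NS = {"tva": "urn:tva:metadata:2004"}

# Hypothetical input; the href values are illustrative only
xml = """\
<TVAMain xmlns="urn:tva:metadata:2004">
  <ProgramDescription>
    <ProgramInformationTable>
      <ProgramInformation>
        <BasicDescription>
          <Genre href="urn:example:ContentCS:3.1"><Name>Drama</Name></Genre>
          <Genre href="urn:example:NicamWarningCS:taal"><Name>Grof taalgebruik</Name></Genre>
          <Genre href="urn:example:NicamWarningCS:geweld"><Name>Geweld</Name></Genre>
        </BasicDescription>
      </ProgramInformation>
    </ProgramInformationTable>
  </ProgramDescription>
</TVAMain>
"""

root = fromstring(xml)

# ElementTree's XPath subset can test an attribute for one exact value...
exact = root.findall(".//tva:Genre[@href='urn:example:NicamWarningCS:geweld']", NS)

# ...but has no contains(), so the substring filter from the answer runs in Python
all_genres = root.findall(".//tva:Genre", NS)
some_genres = [g for g in all_genres if "NicamWarningCS" in g.get("href", "")]

# some_genres holds Genre elements; read each one's Name child for the label
names = [g.findtext("tva:Name", namespaces=NS) for g in some_genres]
exact_names = [g.findtext("tva:Name", namespaces=NS) for g in exact]

print(exact_names)
print(names)
```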
