How to add XML tags based on a specific condition using Python

Posted 2024-06-14 13:08:02


Sample XML file

<ArticleSet>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. abc@gmail.com</Affiliation>
        <Keywords>-</Keywords>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>-</Affiliation>
        <Keywords>-</Keywords>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. ghi@yahoo.co.in</Affiliation>
        <Keywords>-</Keywords>
    </Article>
</ArticleSet>

Sample code

^{pr2}$

Desired output: the updated XML file

<?xml version="1.0"?>
<ArticleSet>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. abc@gmail.com</Affiliation>
        <Keywords>-</Keywords>
        <Email>abc@gmail.com</Email>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>-</Affiliation>
        <Keywords>-</Keywords>
        <Email>-</Email>
    </Article>
    <Article>
        <ForeName>a</ForeName>
        <LastName>b</LastName>
        <Affiliation>harvard university of science. ghi@yahoo.co.in</Affiliation>
        <Keywords>-</Keywords>
        <Email>ghi@yahoo.co.in</Email>
    </Article>
</ArticleSet>

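I want to extract the email address from the <Affiliation> tag, create a new tag named <Email>, and store the extracted email in it. If <Affiliation> equals -, then <Email>-</Email> should be stored in that article instead.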

Error

Traceback (most recent call last):
  File "C:/Users/Ghost Rider/Documents/Python/addingTagsToXML.py", line 11, in <module>
    etree.write(article,c)
AttributeError: module 'xml.etree.ElementTree' has no attribute 'write'


3 answers

You can try this:

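import re
import xml.etree.ElementTree as ET  # a bare "import xml" does not load the etree submodule

tree = ET.parse('filename.xml')
root = tree.getroot()

for article in root.findall('Article'):
    # Look the tag up by name instead of relying on its position (article[2])
    affiliation = article.find('Affiliation')
    # Search for an email address; '-' or empty text yields no match
    match = re.search(r'[\w.-]+@[\w.-]+', affiliation.text or '')
    child = ET.Element('Email')
    # Fall back to '-' when no email was found, as in the desired output
    child.text = match.group() if match else '-'
    article.append(child)
tree.write('filename.xml')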
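To try the same idea without touching a file, here is a minimal sketch that works on an in-memory string. The function name add_email_tags and the shortened sample document are assumptions made for illustration; they are not part of the question. Note that the simple pattern would also capture a period that directly follows an address.

```python
import re
import xml.etree.ElementTree as ET

# Same simple pattern as in the answer above; not a full email validator
EMAIL_RE = re.compile(r'[\w.-]+@[\w.-]+')

def add_email_tags(xml_text):
    """Return xml_text with an <Email> child appended to every <Article>."""
    root = ET.fromstring(xml_text)
    for article in root.findall('Article'):
        affiliation = article.find('Affiliation')
        # Treat a missing or empty <Affiliation> like one that has no email
        text = affiliation.text if affiliation is not None and affiliation.text else ''
        match = EMAIL_RE.search(text)
        email = ET.SubElement(article, 'Email')
        email.text = match.group() if match else '-'
    return ET.tostring(root, encoding='unicode')

# Hypothetical, shortened version of the sample file
sample = """<ArticleSet>
<Article><Affiliation>harvard university of science. abc@gmail.com</Affiliation></Article>
<Article><Affiliation>-</Affiliation></Article>
</ArticleSet>"""

result = add_email_tags(sample)
emails = [e.text for e in ET.fromstring(result).iter('Email')]
print(emails)
```

Returning a string keeps the parsing logic separate from file handling, so the function can be checked on small inputs before it is pointed at the real file.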

You can use lxml instead of the xml library. This code works fine:

import re
from lxml import etree as et

# Open the original file
tree = et.parse('t.xml')
for affiliation in tree.iter('Affiliation'):
    # Search for an email address; '-' or empty text yields None
    match = re.search(r'[\w.-]+@[\w.-]+', affiliation.text or '')
    # Append <Email> as the last child of the enclosing <Article>
    child = et.SubElement(affiliation.getparent(), 'Email')
    # Use '-' when no email was found, as in the desired output
    child.text = match.group(0) if match else '-'

# Write back to the file
tree.write('t.xml')
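One detail the code above does not handle is layout. A newly appended <Email> inherits no indentation, so it ends up on the same line as the closing </Article> tag. A minimal sketch of one way to re-indent the tree, assuming Python 3.9 or later and the standard library xml.etree.ElementTree (not lxml), is ET.indent. The tiny document here is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-article document, indented like the sample file
root = ET.fromstring(
    '<ArticleSet>\n'
    '    <Article>\n'
    '        <Keywords>-</Keywords>\n'
    '    </Article>\n'
    '</ArticleSet>'
)

# Append the new tag; at this point it carries no indentation of its own
article = root.find('Article')
ET.SubElement(article, 'Email').text = '-'

# Recompute the whitespace between elements (available since Python 3.9)
ET.indent(root, space='    ')
pretty = ET.tostring(root, encoding='unicode')
print(pretty)
```

Re-indenting rewrites whitespace-only text between elements, so it is best suited to data-style documents like this one rather than mixed-content text.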

If you want to use write, you should correct the etree import as follows:

from xml.etree.ElementTree import ElementTree

Also, you should not use etree as an alias for ElementTree, because it will shadow Python's built-in etree module!

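In addition, I think you have misunderstood what the write function does: it only writes the resulting tree to a file. If you want to modify the element tree, you should use append, extend, and so on, on the elements.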
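The distinction can be sketched as follows: elements are modified with append, while write is called on an ElementTree instance rather than on the module. The single-article input and the in-memory buffer are assumptions chosen to keep the example self-contained; with a real file you would pass a filename to write.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical single-article input
root = ET.fromstring('<Article><Affiliation>-</Affiliation></Article>')

# Modify the tree through the element API: build a new element and append it
email = ET.Element('Email')
email.text = '-'
root.append(email)

# The module itself has no write(); this is what the traceback complains about
has_module_write = hasattr(ET, 'write')

# write() is a method of an ElementTree instance that wraps the root element
tree = ET.ElementTree(root)
buffer = io.BytesIO()  # in-memory stand-in for a real file
tree.write(buffer)
output = buffer.getvalue().decode('ascii')
print(output)
```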
