Extracting data from XML

Posted 2024-09-29 21:55:16


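I've looked at quite a few examples and haven't been able to adapt any of them to my needs. I'm trying to extract the maker and model tags from a file, but whichever earlier answer I try, I can't get it to work for me.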

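Edit: the problem itself probably isn't any different from those earlier questions. What is different is my level of understanding of Python. I tried editing the scripts given in several existing Stack Overflow answers and couldn't get any of them to work.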

<camera>
    <maker>Fujifilm</maker>
    <model>GFX 50S</model>
    <mount>Fujifilm G</mount>
    <cropfactor>0.79</cropfactor>
</camera>

2 answers

Take a look at the Python docs for xml.etree.ElementTree:

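import xml.etree.ElementTree as ET

# Trimmed copy of the XML from the question, so the snippet runs on its own
xml_string = '<camera><maker>Fujifilm</maker><model>GFX 50S</model></camera>'

root = ET.fromstring(xml_string)        # root is the <camera> element
maker = root.findtext('maker')          # text of the first <maker> child
model = root.findtext('model')          # text of the first <model> child
print(f'Make: {maker}\nModel: {model}')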
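If the file holds several <camera> entries under one root element, the same findtext call can be applied to each entry in turn. A minimal sketch, assuming a wrapper element named <cameras> (hypothetical; the question only shows a single <camera>) and a made-up second entry:

```python
import xml.etree.ElementTree as ET

# Sample data: the <cameras> wrapper and the second entry are made up for illustration.
xml_string = """
<cameras>
    <camera>
        <maker>Fujifilm</maker>
        <model>GFX 50S</model>
        <mount>Fujifilm G</mount>
        <cropfactor>0.79</cropfactor>
    </camera>
    <camera>
        <maker>thing1</maker>
        <model>thing2</model>
    </camera>
</cameras>
"""

root = ET.fromstring(xml_string)

# iter() walks the whole tree, so this works however deeply <camera> is nested
cameras = [(cam.findtext('maker'), cam.findtext('model')) for cam in root.iter('camera')]

for maker, model in cameras:
    print(f'Make: {maker}\nModel: {model}')
```

Collecting the pairs in a list first makes it easy to reuse them later, for example to write them to a CSV file.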

Try bs4 (BeautifulSoup). The 'lxml' parser used below requires the lxml package to be installed.

from bs4 import BeautifulSoup

page = '''
        <camera>
            <maker>Fujifilm</maker>
            <model>GFX 50S</model>
            <mount>Fujifilm G</mount>
            <cropfactor>0.79</cropfactor>
        </camera>
        '''

soup = BeautifulSoup(page, 'lxml')
make = soup.find('maker')
model = soup.find('model')
print(f'Make: {make.text}\nModel: {model.text}')

For multiple entries, just use find_all() and loop over the results:

from bs4 import BeautifulSoup

page = '''
        <camera>
            <maker>Fujifilm</maker>
            <model>GFX 50S</model>
            <mount>Fujifilm G</mount>
            <cropfactor>0.79</cropfactor>
        </camera>
        <camera>
            <maker>thing1</maker>
            <model>thing2</model>
            <mount>Fujifilm G</mount>
            <cropfactor>0.79</cropfactor>
        </camera>
        <camera>
            <maker>thing3</maker>
            <model>thing4</model>
            <mount>Fujifilm G</mount>
            <cropfactor>0.79</cropfactor>
        </camera>
        <camera>
            <maker>thing5</maker>
            <model>thing6</model>
            <mount>Fujifilm G</mount>
            <cropfactor>0.79</cropfactor>
        </camera>
        '''

soup = BeautifulSoup(page, 'lxml')
make = soup.find_all('maker')
model = soup.find_all('model')
for x, y in zip(make, model):
    print(f'Make: {x.text}\nModel: {y.text}')
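One thing to watch with the zip() approach: if any <camera> lacks a <maker> or <model>, the two lists fall out of step and later pairs get mismatched. Looking up both tags inside each <camera> avoids that. The sketch below uses the standard library rather than bs4, and assumes the fragments are wrapped in a made-up <root> element, since several top-level elements are not well-formed XML on their own. The sample data (including the entry with a missing model) is invented for illustration.

```python
import xml.etree.ElementTree as ET

page = '''
<camera>
    <maker>Fujifilm</maker>
    <model>GFX 50S</model>
</camera>
<camera>
    <maker>thing1</maker>
</camera>
<camera>
    <maker>thing3</maker>
    <model>thing4</model>
</camera>
'''

# Wrap the fragments in a hypothetical <root> so ElementTree sees one document
root = ET.fromstring(f'<root>{page}</root>')

rows = []
for camera in root.findall('camera'):
    # default= supplies a placeholder when the tag is missing from this entry
    maker = camera.findtext('maker', default='?')
    model = camera.findtext('model', default='?')
    rows.append((maker, model))
    print(f'Make: {maker}\nModel: {model}')
```

The same per-entry idea carries over to bs4: loop over soup.find_all('camera') and call find() on each entry instead of zipping two separate lists.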

Reading the data from a file:

from bs4 import BeautifulSoup

with open('path/to/your/file') as file:
    page = file.read()
    soup = BeautifulSoup(page, 'lxml')
    make = soup.find_all('maker')
    model = soup.find_all('model')
    for x, y in zip(make, model):
        print(f'Make: {x.text}\nModel: {y.text}')
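For comparison, the standard library can also read straight from a file with ET.parse(). A minimal sketch, assuming the file is well-formed XML with a single <camera> root; the temporary file exists only so the example runs on its own, and `path` would normally point at your real file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Write a small sample file so the sketch is self-contained
with tempfile.NamedTemporaryFile('w', suffix='.xml', delete=False, encoding='utf-8') as tmp:
    tmp.write('<camera><maker>Fujifilm</maker><model>GFX 50S</model></camera>')
    path = tmp.name

try:
    # parse() reads and parses the file; getroot() returns the top-level element
    root = ET.parse(path).getroot()
    maker = root.findtext('maker')
    model = root.findtext('model')
    print(f'Make: {maker}\nModel: {model}')
finally:
    os.remove(path)  # clean up the sample file
```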

Without importing any modules:

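with open('/PATH/TO/YOUR/FILE') as file:
    for line in file:
        # Work on the whole line rather than line.split(), so values
        # containing spaces are not cut into pieces
        line = line.strip()
        if line.startswith("<maker>"):
            # Drop the opening and closing tags, keeping the text between them
            print(line.replace("<maker>", "").replace("</maker>", ""))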

This only handles the 'maker' tag. It may be worth putting the tag names you care about into a separate list and looping over them.
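One way that generalization might look, still without imports. This is only a sketch: the helper name extract_tags is made up, and it assumes each element sits on its own line with no attributes, as in the question's sample. For anything less regular, a real XML parser is the safer choice.

```python
def extract_tags(lines, tags=("maker", "model")):
    """Yield (tag, text) for lines of the form <tag>text</tag>."""
    for line in lines:
        line = line.strip()
        for tag in tags:
            open_tag, close_tag = f"<{tag}>", f"</{tag}>"
            if line.startswith(open_tag) and line.endswith(close_tag):
                # Slice off the tags, keeping the text between them
                yield tag, line[len(open_tag):-len(close_tag)]

# The XML from the question, split into lines to stand in for a file
sample = """<camera>
   <maker>Fujifilm</maker>
    <model>GFX 50S</model>
    <mount>Fujifilm G</mount>
    <cropfactor>0.79</cropfactor>
</camera>""".splitlines()

results = list(extract_tags(sample))
for tag, text in results:
    print(f"{tag}: {text}")
```

To use it on a real file, pass the open file object instead of the sample list, since iterating over a file also yields one line at a time.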
