Converting scraped data into a dictionary

Posted 2024-10-01 15:34:38


I have an XML file. After running BeautifulSoup's findAll("named-query") on it and printing the result, I get output like this:

<named-query name="sdfsdfsdf">
        <query>
            ---Query here...--
        </query>
</named-query>

<named-query name="xkjlias">
        <query>
          ---Query here...--
        </query>
</named-query>
   .
   .
   .

Is there a way to convert this into a dictionary, JSON, or a CSV-like format, for example:

name="sdfsdfsdf" query=

name="xkjlias" query=

Thanks in advance.


2 answers

Code:

import json

from bs4 import BeautifulSoup


text = """
<named-query name="sdfsdfsdf">
    <query>
         -Query here... 
    </query>
</named-query>

<named-query name="xkjlias">
    <query>
         -Query here2... 
    </query>
</named-query>"""


soup = BeautifulSoup(text, 'html.parser')
queries = {nq.attrs['name']: nq.text.strip() for nq in soup.find_all('named-query')}
queries_json = json.dumps(queries)

print(queries)  # dict
print(queries_json)  # json

Output:

{'sdfsdfsdf': '-Query here...', 'xkjlias': '-Query here2...'}
{"sdfsdfsdf": "-Query here...", "xkjlias": "-Query here2..."}
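
The question also asks about a CSV-like format, which the snippet above does not cover. A minimal sketch using the standard csv module follows. It assumes a name -> query dict shaped like `queries`; the helper name `queries_to_csv` and the `name`/`query` column headers are illustrative choices, not something from the original post.

```python
import csv
import io

# Assumed input: a name -> query dict like the one built from the soup above.
queries = {'sdfsdfsdf': '-Query here...', 'xkjlias': '-Query here2...'}


def queries_to_csv(queries):
    """Return CSV text with a header row and one row per named query."""
    buffer = io.StringIO()
    # lineterminator='\n' avoids the default '\r\n' line endings
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerow(['name', 'query'])  # hypothetical column names
    for name, query in queries.items():
        writer.writerow([name, query])
    return buffer.getvalue()


csv_text = queries_to_csv(queries)
print(csv_text)
```

Using csv.writer rather than joining strings by hand means that queries containing commas, quotes, or line breaks are quoted correctly. To write to a file instead, the same writer can be pointed at a file opened with newline=''.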

Try this:

# collect one dictionary per <named-query> element
# (assumes `soup` is the BeautifulSoup object parsed from the XML)
results = []

# for each 'named-query' tag
for named_query in soup.find_all('named-query'):
    # start a new dict holding the value of the name attribute
    data = {'name': named_query.attrs['name']}
    # traverse its children
    for child in named_query.children:
        # skip '\n' and whitespace-only strings; child.string may be None
        text = (child.string or '').strip()
        if text:
            data['query'] = text
    results.append(data)
print(results)

>>> [{'name': 'sdfsdfsdf', 'query': '-Query here...'}, {'name': 'xkjlias', 'query': '-Query here2...'}]
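
If installing bs4 is not an option, the same name/query pairs can probably be extracted with the standard library's xml.etree.ElementTree instead. This is a swapped-in technique, not what either answer uses. The sketch below assumes the `<named-query>` elements sit under a single root element; the `<root>` wrapper here is hypothetical, since the question only shows fragments of the file.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper: ElementTree needs one root element,
# and the question does not show the real one.
xml_text = """<root>
<named-query name="sdfsdfsdf">
    <query>
         -Query here...
    </query>
</named-query>
<named-query name="xkjlias">
    <query>
         -Query here2...
    </query>
</named-query>
</root>"""

root = ET.fromstring(xml_text)

# One dict per <named-query>: its name attribute plus the stripped
# text of its first <query> child (empty string if there is none).
records = [
    {'name': nq.get('name'), 'query': (nq.findtext('query') or '').strip()}
    for nq in root.iter('named-query')
]
print(records)
```

For a real file, ET.parse(path).getroot() would take the place of ET.fromstring(xml_text).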
