Python 3: importing variables into a dictionary

Published 2024-05-03 12:15:25


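I am trying (without success) to get the output of the print command below into a dictionary, so that I can then export it to CSV.

How do I get parseddata (the printed output below) into a dictionary?

Sample input file: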

<html>
<body>
<p>{ success:true ,results:3,rows:[{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"14-Aug-2015 15:39",SeqNumber:"1001577"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"30-May-2015 14:37",SeqNumber:"129901"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"17-Feb-2015 14:57",SeqNumber:"126171"}]}</p>
</body>
</html>

My code:

import requests
import re
from bs4 import BeautifulSoup
url = requests.get("http://. . .")
soup = BeautifulSoup(url.text, "lxml")
parseddata = soup.string.split(':[', 1)[1].lstrip(']')
print(parseddata)

The output of print(parseddata) is:

{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"14-Aug-2015 15:39",SeqNumber:"1001577"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"30-May-2015 14:37",SeqNumber:"129901"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"17-Feb-2015 14:57",SeqNumber:"126171"}]}

2 answers

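This looks like a key-value mapping, with ISIN as a key and "INE134E01011" as a value. But it is not JSON, because the keys are not quoted, and it is not YAML either, because plain scalar keys (i.e. strings without quotes) must be followed by colon + space.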
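The JSON half of that claim is easy to check with the standard library. The following is only a minimal sketch: `sample` is a shortened, illustrative mapping taken from the question's output, not the full response.

```python
import json

# One mapping from the question's output, keys unquoted as in the original
sample = '{ISIN:"INE134E01011",SeqNumber:"1001577"}'

try:
    json.loads(sample)
    parsed_ok = True
except json.JSONDecodeError:
    # JSON requires property names to be wrapped in double quotes
    parsed_ok = False

# The same data with quoted keys is valid JSON
quoted = json.loads('{"ISIN":"INE134E01011","SeqNumber":"1001577"}')
print(parsed_ok, quoted["ISIN"])
```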

If you split the output string into parts¹:

test_str = (
    '{ISIN:"INE134E01011",Ind:"-",'
    'Audited:"Un-Audited",'
    'Cumulative:"Non-cumulative",'
    'Consolidated:"Non-Consolidated",'
    'FilingDate:"14-Aug-2015 15:39",'
    'SeqNumber:"1001577"},'
    '{ISIN:"INE134E01011",'  # new mapping starts
    'Ind:"-",'
    'Audited:"Un-Audited",'
    'Cumulative:"Non-cumulative",'
    'Consolidated:"Non-Consolidated",'
    'FilingDate:"30-May-2015 14:37",'
    'SeqNumber:"129901"},'
    '{ISIN:"INE134E01011",'    # new mapping starts
    'Ind:"-",'
    'Audited:"Un-Audited",'
    'Cumulative:"Non-cumulative",'
    'Consolidated:"Non-Consolidated",'
    'FilingDate:"17-Feb-2015 14:57",'
    'SeqNumber:"126171"}]}'
)

it is equal to your input:

test_org = '{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"14-Aug-2015 15:39",SeqNumber:"1001577"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"30-May-2015 14:37",SeqNumber:"129901"},{ISIN:"INE134E01011",Ind:"-",Audited:"Un-Audited",Cumulative:"Non-cumulative",Consolidated:"Non-Consolidated",FilingDate:"17-Feb-2015 14:57",SeqNumber:"126171"}]}'
assert test_str == test_org

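This split makes it clear that there are actually 3 mappings, followed by a trailing ]}. The ] indicates a list, which is consistent with the 3 mappings being separated by commas. The matching [ is missing because it was consumed when you split on ':[' (the subsequent lstrip(']') removes nothing, since the string does not start with ]).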
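One way to keep both brackets is to slice the text instead of splitting it. This is a sketch under assumptions: `raw` is a shortened stand-in for the text inside the <p> element, and the slicing assumes that the first '[' and the last ']' in the text belong to the rows list.

```python
# Shortened stand-in for the text inside <p>; the real response has more fields
raw = '{ success:true ,results:2,rows:[{ISIN:"A",SeqNumber:"1"},{ISIN:"A",SeqNumber:"2"}]}'

# Slice from the first '[' to the last ']' so both brackets survive
start = raw.index('[')
end = raw.rindex(']') + 1
rows_text = raw[start:end]
print(rows_text)
```

The result is still not parseable as JSON or YAML as-is, but it is a balanced list, so nothing has to be re-added later.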

You can easily manipulate the string so that YAML can parse it², but the result is a list:

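from ruamel.yaml import YAML

# Re-add the leading '[', put a space after each key's colon, drop the final '}'
test_str = '[' + test_str.replace(':"', ': "').rstrip('}')

# The module-level ruamel.yaml.load() no longer works in recent releases;
# the safe loader returns plain Python lists and dicts
yaml = YAML(typ='safe')
data = yaml.load(test_str)
print(type(data))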
print(type(data))

which prints:

<class 'list'>

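Since the dictionaries in this list have keys in common, you cannot combine them without losing information.

You can either map this list to some key (the colon in your split and the trailing } in your output hint at that), or take a key with unique values (SeqNumber) and promote that value to a key in a dict that replaces the list: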

ddata = {}
for elem in data:
    k = elem.pop('SeqNumber')
    ddata[k] = elem

But if your final goal is a CSV file, I see no reason to go from a list to a dict. If you take the output from the YAML parser, you can do the following:

import csv
with open('output.csv', 'w', newline='') as fp:
    csvwriter = csv.writer(fp)
    csvwriter.writerow(data[0].keys())  # header of common dict keys
    for elem in data:
        csvwriter.writerow(elem.values())  # values

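This gives a CSV file with the following content (SeqNumber is absent because it was popped from each dict above, and the column order depends on dict ordering in your Python version):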

ISIN,Ind,Consolidated,Cumulative,Audited,FilingDate
INE134E01011,-,Non-Consolidated,Non-cumulative,Un-Audited,14-Aug-2015 15:39
INE134E01011,-,Non-Consolidated,Non-cumulative,Un-Audited,30-May-2015 14:37
INE134E01011,-,Non-Consolidated,Non-cumulative,Un-Audited,17-Feb-2015 14:57
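If the column order matters, csv.DictWriter with an explicit list of field names is an option. This is a sketch rather than part of the original answer: the two rows are shortened for illustration, the `fieldnames` order is an arbitrary choice, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Two rows shaped like the parsed data above (fields shortened for illustration)
data = [
    {"ISIN": "INE134E01011", "Audited": "Un-Audited", "SeqNumber": "1001577"},
    {"ISIN": "INE134E01011", "Audited": "Un-Audited", "SeqNumber": "129901"},
]

# Explicit column order, independent of dict ordering
fieldnames = ["SeqNumber", "ISIN", "Audited"]

buf = io.StringIO()  # stands in for open('output.csv', 'w', newline='')
writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(data)
csv_text = buf.getvalue()
print(csv_text)
```

Because each row is written by key rather than by position, the header and the values cannot drift apart even if the dicts are built in different orders.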

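¹ Instead of escaping the newlines with \, I used parentheses to turn the multi-line definition into one string, which makes it easier to add comments on the lines.
² You should not have to re-add the "[", and of course you should not have stripped it in the first place.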

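Aside from the trailing bracket, this is close to valid YAML, not JSON (I made a mistake in my initial answer; JavaScript objects can be declared without quoting property names, but the portable JSON format does not allow that; YAML does allow unquoted keys, although a plain key must be followed by a colon and a space).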

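Use PyYAML to parse the data. The manual split-ing and lstrip are hurting you and make this harder than it needs to be. Just take the text and parse it with yaml (a third-party module that must be installed separately):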

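import re

import requests
import yaml
from bs4 import BeautifulSoup

url = requests.get("http://. . .")
soup = BeautifulSoup(url.text, "lxml")
# YAML needs "key: value" (colon + space) after an unquoted key, so add the
# space after each bare key; this assumes no quoted value contains text
# that looks like `,key:`
text = re.sub(r'([{,]\s*)([A-Za-z_]\w*):', r'\1\2: ', soup.string.strip())
# Use safe_load over load to avoid opening security holes; YAML can do
# a lot of unsafe things if the input isn't trusted, but handling JS
# object literals can be done safely with safe_load
response_object = yaml.safe_load(text)
data_rows = response_object['rows']

for row in data_rows:
    print(row)  # do stuff with each returned row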

You can read more in the PyYAML tutorial.
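If installing a YAML library is not an option, a different technique (not taken from either answer) is to wrap the bare keys in double quotes with a regular expression and hand the result to the standard json module. The sketch below uses a shortened, illustrative `raw` string and assumes that no quoted value contains text resembling `,key:`; real responses may need a more careful approach.

```python
import json
import re

# Shortened stand-in for the text inside <p>
raw = '{ success:true ,results:1,rows:[{ISIN:"INE134E01011",FilingDate:"14-Aug-2015 15:39"}]}'

# Wrap each bare key that follows '{' or ',' in double quotes
as_json = re.sub(r'([{,]\s*)([A-Za-z_]\w*)\s*:', r'\1"\2":', raw)

response = json.loads(as_json)
rows = response["rows"]
print(rows[0]["FilingDate"])
```

The colon inside "15:39" is left alone because it is not preceded by a brace or comma followed by an identifier.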
