Python: load text as a Python object

Posted 2024-10-01 11:30:06


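I want to load text like this: https://sites.google.com/site/iminside1/paste
I would like to build a Python dictionary from it, but any object would do. I tried pickle, json and eval, without success. Can you help me with this?
Thanks!
Results: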

a = open("the_file", "r").read()

json.loads(a)
ValueError: Expecting property name: line 1 column 1 (char 1)

pickle.loads(a)
KeyError: '{'

eval(a)
File "<string>", line 19
from: {code: 'DME', airport: "Домодедово", city: 'Москва', country: 'Россия', terminal: ''},
    ^
SyntaxError: invalid syntax

3 Answers

Lifted almost directly from the pyparsing examples page:

# read text from web page
import urllib
page = urllib.urlopen("https://sites.google.com/site/iminside1/paste")
html = page.read()
page.close()

start = html.index("<pre>")+len("<pre>")+3 #skip over 3-byte header
end = html.index("</pre>")
text = html[start:end]
print text

# parse dict-like syntax    
from pyparsing import (Suppress, Regex, quotedString, Word, alphas, 
alphanums, oneOf, Forward, Optional, dictOf, delimitedList, Group, removeQuotes)

LBRACK,RBRACK,LBRACE,RBRACE,COLON,COMMA = map(Suppress,"[]{}:,")
integer = Regex(r"[+-]?\d+").setParseAction(lambda t:int(t[0]))
real = Regex(r"[+-]?\d+\.\d*").setParseAction(lambda t:float(t[0]))
string_ = Word(alphas,alphanums+"_") | quotedString.setParseAction(removeQuotes)
bool_ = oneOf("true false").setParseAction(lambda t: t[0]=="true")
item = Forward()

key = string_
dict_ = LBRACE - Optional(dictOf(key+COLON, item+Optional(COMMA))) + RBRACE
list_ = LBRACK - Optional(delimitedList(item)) + RBRACK
item << (real | integer | string_ | bool_ | Group(list_ | dict_ ))

result = item.parseString(text,parseAll=True)[0]
print result.data[0].dump()
print result.data[0].segments[0].dump(indent="  ")
print result.data[0].segments[0].flights[0].dump(indent="  -  ")
print result.data[0].segments[0].flights[0].flightLegs[0].dump(indent="  -  -  ")
for seg in result.data[6].segments:
    for flt in seg.flights:
        fltleg = flt.flightLegs[0]
        print "%(airline)s %(airlineCode)s %(flightNo)s" % fltleg,
        print "%s -> %s" % (fltleg["from"].code, fltleg["to"].code)

Prints: (the output was not preserved in this copy)

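EDIT: fixed the grouping and expanded the output dump to show how to access individual key fields of the results, either by index (within a list) or as an attribute (within a dict).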

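If you really have to load this data (see my comment), your best option is a regular expression that adds the missing quotes. Something like r"([a-zA-Z_][a-zA-Z_0-9]*)\s*\:" to find the things to quote, with r"\'\1\'\:" as the replacement (this is off the top of my head, I have to test it first).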

EDIT: after some trouble with backreferences in Python 3.1, I finally got it working with these:

>>> import re
>>> pattern = r"([a-zA-Z_][a-zA-Z_0-9]*)\s*\:"
>>> test = '{"foo": {bar: 1}}'
>>> repl = lambda match: '"{}":'.format(match.group(1))
>>> eval(re.sub(pattern, repl, test))
{'foo': {'bar': 1}}
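
Once the keys are double-quoted, the text may already be valid JSON, in which case json.loads can take the place of eval. The following is a minimal sketch under that assumption, not something the answer above proposes; quote_keys is a hypothetical helper name, and the sample string is made up. It will not work if the data also uses single-quoted strings, and it will rewrite any `word:` that happens to appear inside a string value.

```python
import json
import re

# Same key pattern as in the answer above: a bare identifier followed by a colon.
KEY = re.compile(r"([a-zA-Z_][a-zA-Z_0-9]*)\s*:")

def quote_keys(text):
    """Wrap bare keys in double quotes (hypothetical helper for illustration)."""
    return KEY.sub(lambda m: '"{}":'.format(m.group(1)), text)

# Made-up sample; already-quoted keys such as "foo" are left alone because
# the identifier is followed by a quote character, not by a colon.
test = '{"foo": {bar: 1, ok: true}}'
parsed = json.loads(quote_keys(test))
print(parsed)
```

A side benefit of json.loads is that true and false are converted to Python booleans without any extra replacement step.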

So far, with delnan's help, I can load it into a dict with eval:

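import re
pattern = r"\b(?P<word>\w+):"
with open("the_file", "r") as f:
    # raw replacement string, so \g<word> reaches re.sub unescaped
    x = re.sub(pattern, r'"\g<word>":', f.read())
y = x.replace("true", '"true"')  # note: true becomes the string "true", not a bool
d = eval(y)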

I am still looking for a more efficient and simpler solution. For some reason I don't like using eval.
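
One way to avoid eval is ast.literal_eval, which only accepts Python literals and never executes code. The sketch below is an illustration under stated assumptions, not a tested solution for the full paste: loads_relaxed, TOKEN, LITERALS and SAMPLE are hypothetical names, and SAMPLE is a made-up fragment in the style of the line quoted in the question. It quotes bare words, maps true/false/null to Python literals, and leaves already-quoted strings untouched. Numbers in exponent notation such as 1e5 are not handled.

```python
import ast
import re

# Made-up fragment in the style of the data shown in the question.
SAMPLE = """{data: [{from: {code: 'DME', city: "Москва", terminal: ''},
                     direct: true, price: 1200}]}"""

# Group 1: a complete quoted string (single or double quotes).
# Group 2: a bare word (unquoted key, or true/false/null).
TOKEN = re.compile(r"""("(?:\\.|[^"\\])*"|'(?:\\.|[^'\\])*')|([A-Za-z_]\w*)""")
LITERALS = {"true": "True", "false": "False", "null": "None"}

def _fix(match):
    if match.group(1) is not None:
        return match.group(1)            # quoted strings pass through unchanged
    word = match.group(2)
    return LITERALS.get(word, repr(word))  # literal keyword, else quote the word

def loads_relaxed(text):
    """Parse dict-like text with unquoted keys without calling eval."""
    return ast.literal_eval(TOKEN.sub(_fix, text))

result = loads_relaxed(SAMPLE)
print(result["data"][0]["from"]["code"])
```

Matching quoted strings first is what keeps colons and the word "true" inside string values from being rewritten, which the plain str.replace approach cannot guarantee.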
