I am extracting JSON data from a large data file in order to convert its contents to CSV, but I get an error:
Traceback (most recent call last):
  File "python/gamesTXTtoCSV.py", line 99, in <module>
    writer.writerow(foo)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 15: ordinal not in range(128)
After some digging, I found that the string "\u2013" appears in the JSON data file.
Example (see the value field):
"states":[
    {
        "display":null,
        "name":"choiceText",
        "type":"string",
        "value":"Show me around \u2013 as long as your friends don't chase me away again!"
    },
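I have tried various ways of replacing the string in my script to get rid of the offending sequence.
For example (where i['value'] is the offending field):
i['value'].replace("\\u2013", "--")
or
i['value'].replace("\\", "") #this one is the last resort
or even
i['value'].encode("utf8")
But nothing works; I keep getting the error. Any idea what is going on?
Here is the part of the code that writes the CSV, in case more context is needed: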
################## filling out the csv ################
openfile = open(inFile)
f = open(outFile, 'wt')
writer = csv.writer(f)
writer.writerow(all_cols)
for row in openfile.readlines():
    line = json.loads(row)
    stateCSVrow = []
    states = line['states']
    contexts = line['context']
    contextCSVrow = []
    k = 0
    for state in state_names:
        for i in states:
            if i['name'] == state:
                i['value'].replace("\u2019", "'")  ####THE SECTION GIVING ISSUE
                i['value'].replace("\u2013", "--")
                stateCSVrow.append(i['value'])
        if len(stateCSVrow) == k:
            stateCSVrow.append('NA')
        k += 1
    c = 0
    for context in context_names:
        for i in contexts:
            if i['name'] == context:
                contextCSVrow.append(i['value'])
        if len(contextCSVrow) == c:
            contextCSVrow.append('NA')
        c += 1
    first = []
    first.extend([
        line['key'],
        line['timestamp'],
        line['actor']['actorType'],
        line['user']['username'],
        line['version'],
        line['action']['name'],
        line['action']['actionType']
    ])
    foo = first + stateCSVrow + contextCSVrow
    writer.writerow(foo)
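You are trying to replace the repr of the Unicode escape sequence; don't do that. Once json.loads has parsed the line, the value contains the actual character U+2013 (an en dash), not a backslash followed by "u2013", so replacing "\\u2013" matches nothing. Note also that replace and encode return new strings, so their results have to be assigned to take effect.
(IPython with Python 3.5 on Arch Linux)
It works the same way in Python 2.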
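As a minimal sketch of the point above (written for Python 3, using a made-up sample record modelled on the question's "states" entry, and not taken from the original answer's session), the following shows that the parsed value holds the real character, that replacing the backslash form changes nothing, and that the result of replace has to be kept:

```python
import csv
import io
import json

# Hypothetical sample record, modelled on the "states" entry in the question.
# The raw string keeps the backslash escapes exactly as they appear in the file.
raw = r'{"name": "choiceText", "value": "Show me around \u2013 as long as your friends don\u2019t chase me away again!"}'

state = json.loads(raw)
value = state["value"]

# json.loads has already decoded the escape: the string holds a real en dash,
# not the six characters backslash, u, 2, 0, 1, 3.
assert "\u2013" in value
assert "\\u2013" not in value

# Replacing the literal backslash sequence therefore matches nothing.
unchanged = value.replace("\\u2013", "--")

# str.replace returns a new string, so the result has to be assigned.
cleaned = value.replace("\u2013", "--").replace("\u2019", "'")

# In Python 3 the csv module writes text; the encoding is chosen when the file
# is opened, e.g. open(path, "w", newline="", encoding="utf-8") for a real file.
# An in-memory buffer stands in for the output file here.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
writer.writerow([state["name"], value])    # non-ASCII characters kept as-is
writer.writerow([state["name"], cleaned])  # ASCII-only variant
csv_text = buffer.getvalue()
```

The traceback in the question (u'\u2013', 'ascii' codec) suggests Python 2, where csv.writer expects byte strings. There, the usual approach is to encode each unicode value with .encode("utf-8") and pass the encoded result to writerow, rather than discarding it as in the question's attempt.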