Python backslash replacement fails

Posted 2024-09-29 02:23:43


I am extracting JSON data from a large data file to convert the contents to CSV, but I get this error:

Traceback (most recent call last):
  File "python/gamesTXTtoCSV.py", line 99, in <module>
    writer.writerow(foo)
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 15: ordinal not in range(128)

After some digging, I found that the string "\u2013" appears in the JSON data file.

Example (see the value field):

"states":[
      {
         "display":null,
         "name":"choiceText",
         "type":"string",
         "value":"Show me around \u2013 as long as your friends don't chase me away again!"
      },

I have tried various ways of replacing the string in my script to get rid of the offending sequence.

For example (where i['value'] is the offending field):

 i['value'].replace("\\u2013", "--")

or

i['value'].replace("\\", "") #this one is the last resort

or even

i['value'].encode("utf8")

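But none of them work; I keep getting the error. Any idea what is going on?

Here is the part of the code that writes the CSV, in case more context is needed: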

################## filling out the csv ################
openfile= open(inFile)
f = open(outFile, 'wt')
writer = csv.writer(f)
writer.writerow(all_cols)

for row in openfile.readlines():
    line = json.loads(row)
    stateCSVrow= []
    states=line['states']
    contexts=line['context']
    contextCSVrow=[]
    k = 0
    for state in state_names:
        for i in states:
            if i['name']==state:
                i['value'].replace("\u2019", "'") ####THE SECTION GIVING ISSUE
                i['value'].replace("\u2013", "--")
                stateCSVrow.append(i['value'])
        if len(stateCSVrow)==k:
            stateCSVrow.append('NA')
        k +=1
    c = 0
    for context in context_names:
        for i in contexts:
            if i['name']==context:
                contextCSVrow.append(i['value'])
        if len(contextCSVrow)==c:
            contextCSVrow.append('NA')
        c +=1
    first=[]
    first.extend([
        line['key'] ,
        line['timestamp'],
        line['actor']['actorType'],
        line['user']['username'],
        line['version'],
        line['action']['name'],
        line['action']['actionType']
          ])

    foo = first + stateCSVrow + contextCSVrow
    writer.writerow(foo)

1 answer
Answer 1 · posted 2024-09-29 02:23:43

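You are trying to replace the repr of the Unicode escape sequence; don't do that. After json.loads, the value holds the actual character U+2013 (an en dash), not a backslash followed by "u2013", so replace the character itself. Note also that replace returns a new string and does not modify i['value'] in place, so the result has to be assigned back.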

In [3]: x = 'fnord \u2034'

In [4]: x
Out[4]: 'fnord ‴'

In [5]: x.replace('\u2034', 'hi')
Out[5]: 'fnord hi'

(IPython with Python 3.5 on Arch Linux)
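Applied to the question's data, this could look like the minimal sketch below. The record is one state entry copied from the question's sample, and clean is a hypothetical helper name, not something from the original script. The u"..." literals are used so the same lines behave identically on Python 2 and Python 3.

```python
import json

# One state entry copied from the question's sample data. The raw string
# keeps the backslash, so this is the JSON text as it is stored on disk.
raw = r'''{"name": "choiceText", "value": "Show me around \u2013 as long as your friends don't chase me away again!"}'''

state = json.loads(raw)

# json.loads has already decoded the escape: the value now holds the single
# character U+2013, not a backslash followed by "u2013".
assert u"\u2013" in state["value"]
assert "\\u2013" not in state["value"]


def clean(text):
    """Hypothetical helper: swap typographic punctuation for ASCII."""
    text = text.replace(u"\u2019", u"'")
    text = text.replace(u"\u2013", u"--")
    return text


# replace() returns a new string, so the result must be stored.
state["value"] = clean(state["value"])
print(state["value"])
```

In the question's loop, the equivalent change would be to assign the cleaned text back to i['value'] (or append the cleaned text directly) instead of discarding the return value of replace.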

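The replace call also works in Python 2, although there a plain (non-unicode) string literal keeps \u2013 as six literal characters, as the repr shows: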

⚘ python2
Python 2.7.11 (default, Dec  6 2015, 15:43:46)
[GCC 5.2.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> x = "Show me around \u2013 as long as your friends don't chase me away again!"
>>> x
"Show me around \\u2013 as long as your friends don't chase me away again!"
>>> x.replace('\u2013', ' ')
"Show me around   as long as your friends don't chase me away again!"
