Encoding UTF-8 characters in Python code gives UTF-8 errors: the characters show up as UTF-8 byte escapes

outtweets = [[str(tweet.text.encode("utf-8"))] for tweet in correct_date_tweet]
outtweets = [[stuff.replace("b\'", "")] for sublist in outtweets for stuff in sublist]
outtweets = [[stuff.replace('b\"', "")] for sublist in outtweets for stuff in sublist]

Update

Following one of the answers, I tried changing one of the lines to

outtweets = [[tweet.text] for tweet in correct_date_tweet]

But that did not work, because it produced

---------------------------------------------------------------------------
UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-12-a864b5efe8af> in <module>()
----> 1 get_all_tweets("BobBlumenfield","instance file")

<ipython-input-9-d0b9b37c7261> in get_all_tweets(screen_name, mode)
    104         with open(os.path.join(save_location,'%s.instance' % screen_name), mode ='w') as f:
    105             writer = csv.writer(f)
--> 106             writer.writerows(outtweets)
    107     else:
    108         with open(os.path.join(save_location,'%s.csv' % screen_name), 'w',encoding='utf-8') as f:

C:\Users\Stan Shunpike\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
     17 class IncrementalEncoder(codecs.IncrementalEncoder):
     18     def encode(self, input, final=False):
---> 19         return codecs.charmap_encode(input,self.errors,encoding_table)[0]
     20
     21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 64-65: character maps to <undefined>

1 Answer

Anonymous user

#1 · Posted 2024-06-28 20:10:14

Remove the following line:

outtweets = [[str(tweet.text.encode("utf-8"))] for tweet in correct_date_tweet]

Here is why:

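You are encoding the text into a byte string, hence the b.
You are calling str without specifying an encoding. In that mode you get the representation of the byte array, type marker included, which is where the b and the UTF-8 escapes come from.
There is no need to encode in the middle of your code. Encode only when writing to a file or to the network (not when printing). If you rely on the encoder built into open(), you rarely need to call .encode() yourself.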
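The points above can be reproduced in a few lines. This is only a minimal sketch: the sample string is a made-up stand-in for tweet.text, not data from the question.

```python
# Hypothetical stand-in for tweet.text (assumption, not from the question).
text = "caf\u00e9 \u2615"

# .encode() turns the str into a bytes object.
encoded = text.encode("utf-8")

# str() without an encoding argument returns the *representation* of the
# bytes object: the b prefix, the quotes, and \x escapes for non-ASCII bytes.
as_repr = str(encoded)
print(as_repr)

# Passing the encoding to str() (or calling .decode()) gives the text back.
round_trip = str(encoded, "utf-8")
print(round_trip == text)
```

Stripping the b' prefix afterwards, as the question's replace() calls do, only hides the marker; the \x escapes stay in the string.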

When you use open() in text mode, always specify the encoding, because the default differs from platform to platform.
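In the traceback, the open() call on line 104 has no encoding argument, and the error is raised from cp1252.py, which suggests the platform default was used for that file. A minimal sketch of the alternative follows, assuming made-up rows in place of the real outtweets and a temporary directory in place of save_location:

```python
import csv
import os
import tempfile

# Hypothetical rows standing in for [[tweet.text] for tweet in correct_date_tweet].
outtweets = [["caf\u00e9 \u2615"], ["plain ascii"]]

# Temporary location used only for this sketch (assumption).
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# encoding="utf-8" makes open() do the encoding, independent of the
# platform default. newline="" is what the csv module docs recommend.
with open(path, "w", encoding="utf-8", newline="") as f:
    writer = csv.writer(f)
    writer.writerows(outtweets)

# Read the file back with the same encoding to check the round trip.
with open(path, encoding="utf-8", newline="") as f:
    rows = list(csv.reader(f))

print(rows == outtweets)
```

The rows are plain str objects all the way through; the only place an encoding is named is the open() call.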

Remove every other use of .encode() from your code.

You can now also remove the other lines that were trying to correct the error.
