How do I fix a "cp950" error caused by a trademark symbol when writing a file in Python 3?

import requests
from inscriptis import get_text
from bs4 import BeautifulSoup

page = requests.get(r'http://www3.asiainsurancereview.com//News/View-NewsLetter-Article/id/42528/Type/eDaily/Technology-First-round-of-the-pre-launch-of-the-Ydentity-ICO-starts-today')
soup = BeautifulSoup(page.text, 'lxml')
html = soup.find(class_='article-wrap')
text = get_text(html.text)
print(text)

articleFile = open('test.txt', 'w')
articleFile.write(text)
articleFile.close()

TypeError                                 Traceback (most recent call last)
<ipython-input-68-3f30355ab29c> in <module>()
     12 text=text.encode("utf-8")
     13
---> 14 articleFile.write(text)
     15
     16 articleFile.close()

TypeError: write() argument must be str, not bytes
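The TypeError above comes from mixing types: str.encode() returns bytes, while a file opened with 'w' (text mode) only accepts str. A minimal stdlib sketch that reproduces it; the sample string and the helper name write_bytes_in_text_mode are illustrative, not from the question:

```python
import os
import tempfile


def write_bytes_in_text_mode(path):
    """Try to write UTF-8 bytes to a file opened in text mode ('w')."""
    data = "Ydentity\u2122".encode("utf-8")  # str.encode() returns bytes
    with open(path, "w") as f:
        try:
            f.write(data)  # text-mode files only accept str
        except TypeError as exc:
            return str(exc)  # e.g. "write() argument must be str, not bytes"
    return None


# Hypothetical scratch location, used only for this demonstration
path = os.path.join(tempfile.mkdtemp(), "test.txt")
print(write_bytes_in_text_mode(path))
```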

text = get_text(html.text)

from unidecode import unidecode

def remove_non_ascii(text):
    return unidecode(str(text, encoding = "utf-8"))

articleFile = open('test.txt', 'w')
articleFile.write(text)
articleFile.close()

TypeError                                 Traceback (most recent call last)
<ipython-input-70-ff7e6a098308> in <module>()
     20
     21
---> 22 articleFile.write(remove_non_ascii(text))
     23
     24 articleFile.close()

<ipython-input-70-ff7e6a098308> in remove_non_ascii(text)
      9 from unidecode import unidecode
     10 def remove_non_ascii(text):
---> 11     return unidecode(str(text, encoding = "utf-8"))
     12
     13 articleFile = open('test.txt', 'w')

TypeError: decoding str is not supported
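str(x, encoding=...) decodes bytes into a str, so it fails when x is already a str, which is what get_text() returned here (the earlier text.encode() call succeeded on it). Since text is already a str, it could presumably be passed to unidecode() without the str(...) wrapper. As a sketch, a hypothetical helper decode_if_bytes that only decodes when it is actually given bytes:

```python
def decode_if_bytes(value, encoding="utf-8"):
    """Return a str whether `value` is already str or is encoded bytes."""
    if isinstance(value, bytes):
        return str(value, encoding=encoding)  # decoding is only valid for bytes
    return value  # already str: nothing to decode


try:
    str("already text", encoding="utf-8")
except TypeError as exc:
    print(exc)  # same error as in the traceback: decoding str is not supported

print(decode_if_bytes(b"caf\xc3\xa9"))
print(decode_if_bytes("caf\u00e9"))
```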

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-71-f0c817f013af> in <module>()
     20
     21
---> 22 articleFile.write(text)
     23
     24 articleFile.close()

UnicodeEncodeError: 'cp950' codec can't encode character '\u2122' in position 51: illegal multibyte sequence
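The root cause can be checked without the network request: in text mode, open() encodes with the platform's default encoding, which on this system is cp950, and cp950 has no mapping for U+2122 (™). A small sketch using a hypothetical can_encode() helper; the printed default encoding depends on the machine it runs on:

```python
import locale


def can_encode(text, encoding):
    """Return True if every character of `text` is representable in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


print(locale.getpreferredencoding(False))  # platform default used by open() in text mode
print(can_encode("\u2122", "cp950"))  # ™ is not in cp950
print(can_encode("\u2122", "utf-8"))  # UTF-8 can encode any code point
```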

1 answer

Anonymous user

#1 · Posted on 2024-05-05 05:24:38

I found that the solution is to open the output file in binary mode and then encode the Unicode text:

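# Open the file in binary mode ('wb') so that write() accepts bytes
articleFile = open('test.txt', 'wb')
# Encode the str to UTF-8 bytes; UTF-8 can represent '\u2122' (™)
text=text.encode("utf-8")
articleFile.write(text)
articleFile.close()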
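The same approach written with a with block, so the file is closed even if write() raises. A minimal sketch; write_utf8_binary and the temporary path are illustrative names, not part of the answer:

```python
import os
import tempfile


def write_utf8_binary(path, text):
    """Encode `text` to UTF-8 bytes and write them to a binary-mode file."""
    with open(path, "wb") as article_file:  # 'wb': the file accepts bytes only
        article_file.write(text.encode("utf-8"))


path = os.path.join(tempfile.mkdtemp(), "test.txt")
write_utf8_binary(path, "Ydentity\u2122 ICO")
with open(path, "rb") as f:
    print(f.read())  # b'Ydentity\xe2\x84\xa2 ICO' (™ is E2 84 A2 in UTF-8)
```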

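Apparently, Python cannot write encoded (bytes) text to a file unless that file is opened in binary mode. A file opened in text mode ('w') accepts only str and encodes it with the platform's default encoding, which here is cp950; cp950 has no mapping for '\u2122' (™), hence the UnicodeEncodeError.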
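An alternative not used in the answer: stay in text mode but pass encoding='utf-8' to open(), which overrides the platform default so a str can be written directly without a manual encode() call. A sketch; the helper name write_utf8_text is illustrative:

```python
import os
import tempfile


def write_utf8_text(path, text):
    """Write `text` in text mode, overriding the platform default encoding."""
    with open(path, "w", encoding="utf-8") as article_file:
        article_file.write(text)  # str in, UTF-8 bytes on disk


path = os.path.join(tempfile.mkdtemp(), "test.txt")
write_utf8_text(path, "Ydentity\u2122 ICO")
with open(path, encoding="utf-8") as f:
    print(f.read())
```

Apart from text mode's newline translation (for example \n becoming \r\n on Windows), this produces the same UTF-8 bytes as the binary-mode approach.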
