Scraping a page with Python and lxml: (<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii',

(<type 'exceptions.UnicodeEncodeError'>, UnicodeEncodeError('ascii', u'Approximate Dimensions: 4\xbd" x 4" x 7" (assembled)', 25, 26, 'ordinal not in range(128)'), <traceback object at 0x7f9198ac48c0>)

2 answers

Anonymous user

Answer 1 · edited 2024-09-26 18:06:35

Try:

el.text_content().encode('utf-8')

The value is unicode, and you want to store it (as text) encoded as UTF-8.
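A minimal Python 3 sketch (the question itself is Python 2) that reproduces the error with the string from the traceback: encoding it with the ASCII codec fails at the "½" character (U+00BD, index 25), while an explicit UTF-8 encode succeeds. The helper name `ascii_encode_error` is hypothetical and used only for illustration.

```python
# Python 3 sketch; the string is the one shown in the question's traceback.
text = 'Approximate Dimensions: 4\xbd" x 4" x 7" (assembled)'


def ascii_encode_error(s):
    """Hypothetical helper: return the UnicodeEncodeError raised by an ASCII encode, or None."""
    try:
        s.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc
    return None


err = ascii_encode_error(text)
print(err.start, err.end, err.reason)  # 25 26 ordinal not in range(128)

# Encoding explicitly as UTF-8 works: U+00BD becomes the two bytes C2 BD.
data = text.encode('utf-8')
print(data)
```

Writing `data` to a file opened in binary mode (or writing `text` to a file opened with `encoding='utf-8'`) avoids the implicit ASCII conversion.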

Anonymous user

Answer 2 · edited 2024-09-26 18:06:35

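The encoding the page's headers claim it uses may differ from the encoding it actually uses. If the page's real encoding is not UTF-8, getting it right is a bit tricky.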
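One way to cope with a wrong declared encoding is to work from the raw bytes and try the declared encoding first, then fall back to other candidates. This is a Python 3 sketch under that assumption; the function name `decode_with_fallback` and the candidate list are hypothetical choices, not part of lxml. Note that `latin-1` accepts any byte sequence, so it acts as a last resort that never fails (but may produce wrong characters).

```python
def decode_with_fallback(raw, declared=None, candidates=('utf-8', 'cp1252', 'latin-1')):
    """Hypothetical helper: decode raw bytes, trying the declared encoding first.

    Returns a (text, encoding_used) tuple.
    """
    order = [declared] if declared else []
    order += [enc for enc in candidates if enc != declared]
    for enc in order:
        try:
            return raw.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # Wrong encoding for these bytes, or an unknown encoding name.
            continue
    raise ValueError('no candidate encoding could decode the data')


# A page that claims UTF-8 but was actually saved as cp1252:
raw = 'Approximate Dimensions: 4\xbd"'.encode('cp1252')  # 0xBD is not valid UTF-8 here
text, used = decode_with_fallback(raw, declared='utf-8')
print(used)  # cp1252
```

The decoded `text` can then be handed to the HTML parser as a string, so the parser no longer relies on the page's declared encoding.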

First, look at the text returned by el.text_content():

x = el.text_content()
print repr(x)  # repr shows the u'' prefix and escapes non-ASCII characters

If x is a plain str (no "u" prefix), it has not been decoded yet.
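In Python 3 the same check is done on the type rather than on a "u" prefix: `bytes` means not decoded yet, `str` means already decoded text. A minimal sketch; `describe_text` is a hypothetical helper, and `ascii()` is used as the Python 3 counterpart of Python 2's `repr()` for viewing non-ASCII characters as escapes.

```python
def describe_text(x):
    """Hypothetical helper: report whether x still needs decoding."""
    if isinstance(x, bytes):
        return 'bytes (not decoded yet)'
    if isinstance(x, str):
        return 'str (already decoded)'
    raise TypeError('expected bytes or str, got %s' % type(x).__name__)


sample = 'Approximate Dimensions: 4\xbd"'
print(describe_text(sample))                   # str (already decoded)
print(describe_text(sample.encode('utf-8')))   # bytes (not decoded yet)
print(ascii(sample))                           # 'Approximate Dimensions: 4\xbd"'
```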

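If x is unicode (it starts with "u"), it may have been decoded with the wrong encoding. Convert the unicode back to a str, then decode that str with the appropriate encoding (such as cp1252 or another one):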

# Turn the mis-decoded unicode back into a byte str (only works if every character is below U+0100)
chars = ''.join([chr(ord(ch)) for ch in el.text_content()])
# Decode the byte str with the proper encoding; try different encodings until one doesn't raise an error
result = chars.decode('cp1252')
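In Python 3 the `chr(ord(...))` trick is the same as encoding with `latin-1`, which maps every code point below 256 to the byte with that value. A minimal sketch of the whole "turn back into bytes, then decode properly" step, assuming the candidate encodings `utf-8` and `cp1252`; the name `redecode` is hypothetical.

```python
def redecode(mis_decoded, candidates=('utf-8', 'cp1252')):
    """Hypothetical helper: repair text that was decoded with the wrong codec.

    encode('latin-1') recovers the original bytes, like chr(ord(ch)) in Python 2.
    It raises UnicodeEncodeError if any character is above U+00FF.
    """
    raw = mis_decoded.encode('latin-1')
    for enc in candidates:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            continue  # not this encoding, try the next one
    raise ValueError('none of the candidate encodings worked')


# UTF-8 bytes for '4½"' wrongly decoded as latin-1 give '4Â½"':
mojibake = '4\xbd"'.encode('utf-8').decode('latin-1')
print(ascii(mojibake))            # '4\xc2\xbd"'
print(ascii(redecode(mojibake)))  # '4\xbd"'
```

The order of `candidates` matters: UTF-8 is tried first because it rejects most byte sequences that are not really UTF-8, while cp1252 accepts almost anything.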
