Removing non-ASCII characters from any given string type in Python

>>> teststring = 'aõ'
>>> type(teststring)
<type 'str'>
>>> teststring
'a\xf5'
>>> print teststring
aõ
>>> teststring.decode("ascii", "ignore")
u'a'
>>> teststring.decode("ascii", "ignore").encode("ascii")
'a'

>>> teststringUni = u'aõ'
>>> type(teststringUni)
<type 'unicode'>
>>> print teststringUni
aõ
>>> teststringUni.decode("ascii" , "ignore")
Traceback (most recent call last):
  File "<pyshell#79>", line 1, in <module>
    teststringUni.decode("ascii" , "ignore")
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf5' in position 1: ordinal not in range(128)
>>> teststringUni.decode("utf-8" , "ignore")
Traceback (most recent call last):
  File "<pyshell#81>", line 1, in <module>
    teststringUni.decode("utf-8" , "ignore")
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf5' in position 1: ordinal not in range(128)
>>> teststringUni.encode("ascii" , "ignore")
'a'

2 answers

User

#1 · edited 2024-05-02 05:32:51

It's simple: .encode converts a Unicode object into a byte string, and .decode converts a byte string into Unicode.
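The direction of the two methods can be sketched as follows. This is only an illustration, and it assumes Python 3, where the question's `unicode` type is called `str` and its byte-string `str` type is called `bytes`. The variable names are made up for the example.

```python
# Python 3 sketch: str is text (Python 2's unicode), bytes is Python 2's str.
text = "a\u00f5"  # the text 'aõ'

# encode: text -> bytes
encoded = text.encode("latin-1")     # the same bytes as 'a\xf5' in the question
# decode: bytes -> text
decoded = encoded.decode("latin-1")  # back to the original text

# Dropping non-ASCII characters, once in each direction
ascii_bytes = text.encode("ascii", "ignore")    # text in, bytes out
ascii_text = encoded.decode("ascii", "ignore")  # bytes in, text out
```

In both of the last two lines the "ignore" error handler is what silently discards the õ.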

User

#2 · edited 2024-05-02 05:32:51

Why did the decode("ascii") give out a unicode string?

Because that is what decode is for: it decodes a byte string (such as an ASCII string) into unicode.

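In your second example, you are trying to "decode" a string that is already unicode. Python 2 can only decode byte strings, so it first has to encode the unicode object implicitly, using the default encoding, which is ASCII. Because you did not perform that step explicitly, the 'ignore' argument is not applied to it, so it raises the error that the non-ASCII character cannot be encoded. That is why the traceback reports a UnicodeEncodeError even though you called decode.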

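The trick to all of this is to remember that decode takes an encoded byte string and converts it to Unicode, while encode does the reverse. It may be easier if you understand that Unicode is not an encoding.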
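Putting the rule to work on the original question, one way to handle either string type is to check the type first and pick the matching direction. The helper below is a minimal sketch, not a standard function: the name `strip_non_ascii` is hypothetical, and the code is written for Python 3, where the byte-string type is `bytes` and the text type is `str`.

```python
def strip_non_ascii(value):
    """Return value with every non-ASCII character removed, keeping its type."""
    if isinstance(value, bytes):
        # Byte string: decode (dropping non-ASCII bytes), then encode back to bytes.
        return value.decode("ascii", "ignore").encode("ascii")
    # Text string: encode (dropping non-ASCII characters), then decode back to text.
    return value.encode("ascii", "ignore").decode("ascii")


print(strip_non_ascii(b"a\xf5"))   # bytes in, bytes out
print(strip_non_ascii("a\u00f5"))  # text in, text out
```

The explicit type check avoids the implicit ASCII conversion that caused the traceback in the question.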
