Python 3.4: removing or ignoring emoji when writing to a file

Posted 2024-10-03 02:46:29



UnicodeEncodeError: 'charmap' codec can't encode characters in position 177-181: character maps to <undefined>







<?xml version='1.0' encoding='UTF-8' standalone='yes' ?>
<?xml-stylesheet type="text/xsl" href="sms.xsl"?>
<smses count="1">
  <sms protocol="0" address="+00000000000" date="1346772606199" type="1" subject="null" body="Lorem ipsum dolor sit amet, consectetur adipisicing elit," toa="null" sc_toa="null" service_center="+00000000000" read="1" status="-1" locked="0" date_sent="1346772343000" readable_date="Sep 4, 2012 10:30:06 AM" contact_name="John Doe" />

Tags: file, content, encoding, date, type, error, unicode, xml


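  1. Choose an encoding that can handle the emoji code points. You either opened the file for writing with the default codec (which depends on your system) or explicitly picked an encoding that does not support those code points. UTF-8 can encode every Unicode code point: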


    with open(filename, 'w', encoding='utf8') as outfile:
        outfile.write(data)  # data is the str you want to save
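  2. Set an error handling mode that replaces code points the codec cannot handle with a replacement character or an escape sequence, or that ignores them altogether. See the errors parameter of the open() function: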

    errors is an optional string that specifies how encoding and decoding errors are to be handled–this cannot be used in binary mode. A variety of standard error handlers are available, though any error handling name that has been registered with codecs.register_error() is also valid. The standard names are:

    • 'strict' to raise a ValueError exception if there is an encoding error. The default value of None has the same effect.
    • 'ignore' ignores errors. Note that ignoring encoding errors can lead to data loss.
    • 'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data.
    • 'surrogateescape' will represent any incorrect bytes as code points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These private code points will then be turned back into the same bytes when the surrogateescape error handler is used when writing data. This is useful for processing files in an unknown encoding.
    • 'xmlcharrefreplace' is only supported when writing to a file. Characters not supported by the encoding are replaced with the appropriate XML character reference &#nnn;.
    • 'backslashreplace' (also only supported when writing) replaces unsupported characters with Python’s backslashed escape sequences.


    with open(filename, 'w', errors='ignore') as outfile:
        outfile.write(data)  # unencodable characters are silently dropped
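
To make the two options concrete, here is a minimal sketch that writes the same string both ways and reads the results back. The sample text, the temporary directory, and the file names are made up for illustration, and the ascii codec stands in for whatever limited codec your system defaults to.

```python
import os
import tempfile

text = 'OK hand sign: \U0001F44C'  # sample string containing an emoji

# Hypothetical output location, used only for this illustration.
tmpdir = tempfile.mkdtemp()
utf8_path = os.path.join(tmpdir, 'utf8.txt')
ascii_path = os.path.join(tmpdir, 'ascii.txt')

# Option 1: UTF-8 can encode every code point, so the emoji survives.
with open(utf8_path, 'w', encoding='utf-8') as outfile:
    outfile.write(text)

# Option 2: keep a limited codec but drop whatever it cannot encode.
with open(ascii_path, 'w', encoding='ascii', errors='ignore') as outfile:
    outfile.write(text)

# Read both files back to compare what was actually stored.
with open(utf8_path, encoding='utf-8') as infile:
    utf8_result = infile.read()
with open(ascii_path, encoding='ascii') as infile:
    ascii_result = infile.read()

print(utf8_result == text)   # True
print(repr(ascii_result))    # 'OK hand sign: '
```

Option 1 is lossless; option 2 trades the emoji away in exchange for keeping the original codec.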


>>> a_ok = 'The U+1F44C OK HAND SIGN codepoint: \U0001F44C'
>>> print(a_ok)
The U+1F44C OK HAND SIGN codepoint: 👌
>>> a_ok.encode('utf8')
b'The U+1F44C OK HAND SIGN codepoint: \xf0\x9f\x91\x8c'
>>> a_ok.encode('cp1251', errors='ignore')
b'The U+1F44C OK HAND SIGN codepoint: '
>>> a_ok.encode('cp1251', errors='replace')
b'The U+1F44C OK HAND SIGN codepoint: ?'
>>> a_ok.encode('cp1251', errors='xmlcharrefreplace')
b'The U+1F44C OK HAND SIGN codepoint: &#128076;'
>>> a_ok.encode('cp1251', errors='backslashreplace')
b'The U+1F44C OK HAND SIGN codepoint: \\U0001f44c'
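
Since the file in question is XML, the 'xmlcharrefreplace' handler may be the most useful of these when you have to keep a legacy encoding. The following is only a sketch under assumptions: the one-element document and the file name are invented, and cp1252 is used as a stand-in for a Windows default codec.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical one-element document; the body attribute holds an emoji.
xml_text = '<sms body="OK \U0001F44C" />'

path = os.path.join(tempfile.mkdtemp(), 'sms.xml')

# cp1252 cannot encode U+1F44C, so the handler writes &#128076; instead.
with open(path, 'w', encoding='cp1252', errors='xmlcharrefreplace') as outfile:
    outfile.write(xml_text)

with open(path, 'rb') as infile:
    raw = infile.read()
print(raw)  # b'<sms body="OK &#128076;" />'

# An XML parser resolves the character reference back to the emoji.
body = ET.parse(path).getroot().get('body')
print(body == 'OK \U0001F44C')  # True
```

The emoji is not lost here: it is stored as a character reference that any XML parser turns back into the original character.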




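Instead of file.write( data ), try file.write( data.encode("utf-8") ). Note that in Python 3 this only works if the file was opened in binary mode ('wb'); a file opened in text mode rejects bytes with a TypeError.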
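
A small sketch of what happens when pre-encoded bytes are written in each mode. The variable names and the temporary file are hypothetical, chosen only for this illustration.

```python
import os
import tempfile

data = 'OK hand sign: \U0001F44C'
path = os.path.join(tempfile.mkdtemp(), 'out.bin')

# Text mode: write() expects str, so passing bytes fails.
try:
    with open(path, 'w') as outfile:
        outfile.write(data.encode('utf-8'))
    text_mode_error = None
except TypeError as exc:
    text_mode_error = exc

# Binary mode: write() expects bytes, so the encoded data is accepted.
with open(path, 'wb') as outfile:
    outfile.write(data.encode('utf-8'))

# Decode the stored bytes to confirm nothing was lost.
with open(path, 'rb') as infile:
    round_trip = infile.read().decode('utf-8')

print(type(text_mode_error).__name__)  # TypeError
print(round_trip == data)              # True
```

Encoding by hand in binary mode gives the same bytes as opening the file in text mode with encoding='utf8', so which one to use is mostly a matter of taste.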
