Python 3.4: removing or ignoring emoji when writing to a file



1 answer


You have two options:
<ol>
<li>Pick an encoding that can handle the emoji code points. You opened the file for writing either with the default codec (which depends on your system) or with an explicit encoding that does not support these code points. The UTF encodings handle them just fine; I picked UTF-8 here:
<pre><code>with open(filename, 'w', encoding='utf8') as outfile:
    outfile.write(yourdata)
</code></pre></li>
<li>Set an error handling mode that replaces code points the codec cannot handle with a replacement character or an escape sequence, or that ignores them altogether. See the <code>errors</code> parameter of the <a href="https://docs.python.org/3/library/functions.html#open" rel="nofollow"><code>open()</code> function</a>:
<blockquote>
errors is an optional string that specifies how encoding and decoding errors are to be handled–this cannot be used in binary mode. A variety of standard error handlers are available, though any error handling name that has been registered with <code>codecs.register_error()</code> is also valid. The standard names are:
<ul>
<li><code>'strict'</code> to raise a <code>ValueError</code> exception if there is an encoding error. The default value of <code>None</code> has the same effect.</li>
<li><code>'ignore'</code> ignores errors. Note that ignoring encoding errors can lead to data loss.</li>
<li><code>'replace'</code> causes a replacement marker (such as <code>'?'</code>) to be inserted where there is malformed data.</li>
<li><code>'surrogateescape'</code> will represent any incorrect bytes as code points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These private code points will then be turned back into the same bytes when the <code>surrogateescape</code> error handler is used when writing data. This is useful for processing files in an unknown encoding.</li>
<li><code>'xmlcharrefreplace'</code> is only supported when writing to a file. Characters not supported by the encoding are replaced with the appropriate XML character reference <code>&#nnn;</code>.</li>
<li><code>'backslashreplace'</code> (also only supported when writing) replaces unsupported characters with Python’s backslashed escape sequences.</li>
</ul>
</blockquote>
So opening the file with <code>errors='ignore'</code> skips the emoji code points when writing instead of raising an error:
<pre><code>with open(filename, 'w', errors='ignore') as outfile:
    outfile.write(yourdata)
</code></pre></li>
</ol>
Demo:
<pre><code>>>> a_ok = 'The U+1F44C OK HAND SIGN codepoint: \U0001F44C'
>>> print(a_ok)
The U+1F44C OK HAND SIGN codepoint: 👌
>>> a_ok.encode('utf8')
b'The U+1F44C OK HAND SIGN codepoint: \xf0\x9f\x91\x8c'
>>> a_ok.encode('cp1251', errors='ignore')
b'The U+1F44C OK HAND SIGN codepoint: '
>>> a_ok.encode('cp1251', errors='replace')
b'The U+1F44C OK HAND SIGN codepoint: ?'
>>> a_ok.encode('cp1251', errors='xmlcharrefreplace')
b'The U+1F44C OK HAND SIGN codepoint: &#128076;'
>>> a_ok.encode('cp1251', errors='backslashreplace')
b'The U+1F44C OK HAND SIGN codepoint: \\U0001f44c'
</code></pre>
Note that the <code>'surrogateescape'</code> option only covers a small range of code points (U+DC80 to U+DCFF) and is only really useful when decoding a file in an unknown encoding; it cannot handle emoji in any case.
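The demo above works on strings in memory. The same behaviour can be checked against real files with the sketch below. It is only an illustration: the helper names <code>write_text</code>, <code>read_bytes</code> and <code>demo</code>, the sample string, and the choice of <code>cp1251</code> as the limited codec are assumptions made for this example, not part of the original question.

```python
import os
import tempfile


def write_text(path, text, encoding="utf-8", errors="strict"):
    """Write text to path with an explicit encoding and error handler."""
    with open(path, "w", encoding=encoding, errors=errors) as outfile:
        outfile.write(text)


def read_bytes(path):
    """Return the raw bytes that ended up on disk."""
    with open(path, "rb") as infile:
        return infile.read()


def demo():
    """Write the same text with several settings and collect the bytes."""
    text = "OK \U0001F44C done"  # hypothetical sample containing one emoji
    results = {}
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "out.txt")

        # Option 1: an encoding that can represent the emoji.
        write_text(path, text, encoding="utf-8")
        results["utf-8"] = read_bytes(path)

        # Option 2: a limited codec combined with an error handler.
        for handler in ("ignore", "replace",
                        "xmlcharrefreplace", "backslashreplace"):
            write_text(path, text, encoding="cp1251", errors=handler)
            results[handler] = read_bytes(path)

        # With the default 'strict' handler the limited codec raises.
        try:
            write_text(path, text, encoding="cp1251")
            results["strict"] = "no error"
        except UnicodeEncodeError:
            results["strict"] = "UnicodeEncodeError"
    return results


if __name__ == "__main__":
    for name, value in demo().items():
        print(name, value)
```

Passing <code>encoding</code> explicitly in every call keeps the result independent of the system default codec, which is what caused the problem in the first place.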
