Encoding non-ASCII characters as UTF-16

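f = open("input.txt", "r")
data = f.read()
x = list(data)
i = 0
for element in x:
    if ord(element) > 127:
        y = hex(ord(x[i]))     # e.g. '0xe9'
        y = y[2:].zfill(4)     # drop the '0x' prefix, pad to four hex digits
        y = '\\u' + y          # literal backslash + 'u'; a bare '\u00' is a SyntaxError in Python 3
        x[i] = y
    i = i + 1
data = ''.join(x)
t = open("output.txt", "w")
t.write(data)
f.close()
t.close()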

3 answers

User

Answer 1 · edited 2024-10-02 02:41:42

Use the built-in encode() method of strings:

# A string with a single, non-ascii character.
s = '\u00e9'

# UTF-16 with a leading byte-order mark that identifies its endianness (CPython uses the machine's native order).
s.encode('utf-16')      # b'\xff\xfe\xe9\x00' on a little-endian machine

# UTF-16 big-endian, no byte-order-mark.
s.encode('utf-16-be')   # b'\x00\xe9'

# UTF-16 little-endian, no byte-order-mark.
s.encode('utf-16-le')   # b'\xe9\x00'
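The three calls above differ only in byte order and the byte-order mark. One thing the single-character example does not show: characters above U+FFFF need two 16-bit code units, a surrogate pair. The Python 3 sketch below computes the pair by hand using the standard UTF-16 rule and checks it against the built-in utf-16-be codec; to_surrogates is a hypothetical helper name, not a library function.

```python
def to_surrogates(cp):
    """Hypothetical helper: split a code point above U+FFFF into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point must be in U+10000..U+10FFFF")
    v = cp - 0x10000                # 20-bit offset
    high = 0xD800 + (v >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


s = '\U0001F600'                    # one character outside the BMP
high, low = to_surrogates(ord(s))
print(hex(high), hex(low))          # 0xd83d 0xde00

# The big-endian bytes of the pair match what the codec produces.
manual = high.to_bytes(2, 'big') + low.to_bytes(2, 'big')
print(manual == s.encode('utf-16-be'))          # True
print(len(s), len(s.encode('utf-16-le')) // 2)  # 1 2  (one character, two code units)
```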

User

Answer 2 · edited 2024-10-02 02:41:42

Open the file in binary mode:

with open("input.txt", "rb") as f:
    print(f.read())
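In binary mode, f.read() returns raw bytes, so you can see how the file is actually encoded before deciding how to decode it. A minimal Python 3 sketch, assuming the input happens to be UTF-8; the file name demo.txt and its contents are made up for illustration.

```python
import os
import tempfile

# Hypothetical input: write 'café' to a temporary file as UTF-8.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'w', encoding='utf-8') as f:
    f.write('café')

# Binary mode returns bytes; é appears as two UTF-8 bytes.
with open(path, 'rb') as f:
    raw = f.read()
print(raw)                      # b'caf\xc3\xa9'

# Decode with the encoding the file was written in to get text back.
text = raw.decode('utf-8')
print(text == 'café')           # True

# The same text as UTF-16 (little-endian, no BOM).
print(text.encode('utf-16-le')) # b'c\x00a\x00f\x00\xe9\x00'
```

If the input is UTF-8, this also shows why a byte-at-a-time loop in Python 2 sees é as two separate values above 127 rather than one character.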

If that doesn't work, try the built-in codecs module.

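A minimal sketch of UTF-16 file I/O with the standard codecs module; the file name utf16.txt is hypothetical. In Python 3 the built-in open(path, encoding='utf-16') is the usual equivalent.

```python
import codecs
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'utf16.txt')  # hypothetical file

# Write text as UTF-16; the 'utf-16' codec writes a byte-order mark first.
with codecs.open(path, 'w', encoding='utf-16') as f:
    f.write(u'caf\u00e9')

# The raw bytes start with a BOM (FF FE or FE FF, depending on the platform).
with open(path, 'rb') as f:
    raw = f.read()
print(raw[:2] in (codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE))  # True

# Reading with the same codec uses the BOM to pick the byte order.
with codecs.open(path, 'r', encoding='utf-16') as f:
    text = f.read()
print(text == u'caf\u00e9')     # True
```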

User

Answer 3 · edited 2024-10-02 02:41:42

@TokenMacGuy posted this answer on the old question, which you've deleted. You can still view that question even though it has been deleted.

So you want to convert from Unicode to an ASCII representation in which non-ASCII code points are "escaped"? If so, then:

>>> sample = u'some stuff: éŘ'
>>> ''.join(c if 0 < ord(c) <= 127 else '\\u{:04x}'.format(ord(c)) for c in sample)
u'some stuff: \\u00e9\\u0158'
>>> print ''.join(c if 0 < ord(c) <= 127 else '\\u{:04x}'.format(ord(c)) for c in sample)
some stuff: \u00e9\u0158
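The session above is Python 2 (u'' literals, print statement). Below is a minimal Python 3 sketch of the same escaping. One assumption goes beyond the original: characters above U+FFFF are written as a surrogate pair of \uXXXX escapes, since one \u escape holds only four hex digits. json.dumps with its default ensure_ascii=True follows the same rule, so it serves as a cross-check. escape_non_ascii is a hypothetical name.

```python
import json


def escape_non_ascii(text):
    """Hypothetical helper: replace non-ASCII characters with \\uXXXX escapes."""
    out = []
    for c in text:
        cp = ord(c)
        if 0 < cp <= 127:
            out.append(c)                        # plain ASCII passes through
        elif cp <= 0xFFFF:
            out.append('\\u{:04x}'.format(cp))   # one four-digit escape
        else:
            v = cp - 0x10000                     # above the BMP: surrogate pair
            out.append('\\u{:04x}\\u{:04x}'.format(0xD800 + (v >> 10),
                                                   0xDC00 + (v & 0x3FF)))
    return ''.join(out)


sample = 'some stuff: éŘ'
print(escape_non_ascii(sample))           # some stuff: \u00e9\u0158
print(escape_non_ascii('\U0001F600'))     # \ud83d\ude00

# json.dumps adds surrounding quotes; strip them before comparing.
print(json.dumps(sample)[1:-1] == escape_non_ascii(sample))  # True
```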

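By the way, this algorithm is *not* UTF-16, so please don't call it that; it's ASCII! UTF-16 is a byte encoding, the kind of output you get from sample.encode('utf-16').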

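To make the distinction concrete, a short Python 3 sketch comparing the escaped ASCII text with actual UTF-16 output for the same sample. utf-16-le is used here only so the bytes don't depend on the machine's byte order or include a BOM.

```python
sample = 'some stuff: éŘ'

# Escaped text: still a str; each non-ASCII char became six ASCII characters.
escaped = ''.join(c if 0 < ord(c) <= 127 else '\\u{:04x}'.format(ord(c))
                  for c in sample)
print(type(escaped).__name__, len(escaped))   # str 24

# UTF-16: bytes, two per character here (every character is in the BMP).
utf16 = sample.encode('utf-16-le')
print(type(utf16).__name__, len(utf16))       # bytes 28
print(utf16[-4:])                             # b'\xe9\x00X\x01'
```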

(Note: you didn't specify a Python version; this example is Python 2.7, not Python 3. If you need Python 3, please add that to your question.)

I'm not sure whether this helps you. Alternatively, @TokenMacGuy may edit this answer himself to make it more helpful.
