Replacing a double-character Unicode sequence

from __future__ import unicode_literals
import io

line="The current population of Côte d'Ivoire is 26,051,291"
for c in line:
    if ord(c) > 127:
        print(c, c.encode('utf-8').hex())
        line1 = line.replace(u"\uC3B4", "ô")
        line2 = line.replace(c, u"\u00F4")
        line3 = line.replace(c, "ô")

#with io.open("Test.html", "w", encoding="utf-8") as f_out:
with io.open("Test.html", "w") as f_out:
    f_out.write(line+"\n")
    f_out.write(line1+"\n")
    f_out.write(line2+"\n")
    f_out.write(line3+"\n")

00000000h: 54 68 65 20 63 75 72 72 65 6E 74 20 70 6F 70 75 ; The current popu
00000010h: 6C 61 74 69 6F 6E 20 6F 66 20 43 C3 B4 74 65 20 ; lation of CÃ´te 
00000020h: 64 27 49 76 6F 69 72 65 20 69 73 20 32 36 2C 30 ; d'Ivoire is 26,0
00000030h: 35 31 2C 32 39 31 0D 0A 54 68 65 20 63 75 72 72 ; 51,291..The curr
00000040h: 65 6E 74 20 70 6F 70 75 6C 61 74 69 6F 6E 20 6F ; ent population o
00000050h: 66 20 43 C3 B4 74 65 20 64 27 49 76 6F 69 72 65 ; f CÃ´te d'Ivoire
00000060h: 20 69 73 20 32 36 2C 30 35 31 2C 32 39 31 0D 0A ;  is 26,051,291..
00000070h: 54 68 65 20 63 75 72 72 65 6E 74 20 70 6F 70 75 ; The current popu
00000080h: 6C 61 74 69 6F 6E 20 6F 66 20 43 C3 B4 74 65 20 ; lation of CÃ´te 
00000090h: 64 27 49 76 6F 69 72 65 20 69 73 20 32 36 2C 30 ; d'Ivoire is 26,0
000000a0h: 35 31 2C 32 39 31 0D 0A 54 68 65 20 63 75 72 72 ; 51,291..The curr
000000b0h: 65 6E 74 20 70 6F 70 75 6C 61 74 69 6F 6E 20 6F ; ent population o
000000c0h: 66 20 43 C3 B4 74 65 20 64 27 49 76 6F 69 72 65 ; f CÃ´te d'Ivoire
000000d0h: 20 69 73 20 32 36 2C 30 35 31 2C 32 39 31 0D 0A ;  is 26,051,291..

1 Answer

User

#1 · Posted on 2024-09-28 19:31:37

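Right. You are mixing apples and oranges: Unicode code points (written as U+XXXX) and bytes (written in Python as \xXX). The escape u"\uC3B4" in your first replace names the single code point U+C3B4, not the byte pair C3 B4, so it never matches anything in the string.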

>>> l = "ô"  # our text to be encoded
>>> "U+%04x" % ord(l)
'U+00f4'  # the code point (ordinal encoded in hex)
>>> l.encode("utf-8")
b'\xc3\xb4'  # the UTF-8 encoded bytes
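
Applied to the question's string, the same distinction can be sketched as follows. This is a minimal Python 3 illustration, not the asker's exact script; the variable names `wrong`, `unchanged`, `same`, and `encoded` are made up for the example, and ô is written as the escape \u00F4 so the snippet does not depend on the source file's encoding.

```python
# The question's text, with ô spelled as its code point U+00F4.
line = "The current population of C\u00F4te d'Ivoire is 26,051,291"

# "\uC3B4" is ONE code point, U+C3B4 (a Hangul syllable),
# not the two bytes C3 B4. It does not occur in the text.
wrong = "\uC3B4"
unchanged = line.replace(wrong, "\u00F4")  # no match, nothing replaced

# Replacing ô (U+00F4) with ô is also a no-op.
same = line.replace("\u00F4", "\u00F4")

# Bytes only exist once the text has been encoded.
encoded = "\u00F4".encode("utf-8")

print(len(wrong), wrong in line)        # 1 False
print(unchanged == line, same == line)  # True True
print(encoded, len(encoded))            # b'\xc3\xb4' 2
```

All three replacements in the question therefore leave the text as it was, which matches the four identical lines in the hex dump.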

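If you really do want to write a UTF-8 file, then you are basically done! You are already writing UTF-8, in which ô happens to be a single character encoded as two bytes.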
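
A short sketch of how this could be checked, assuming Python 3. The temporary output path is hypothetical (the question writes to "Test.html" in the working directory), and the explanation of the "Ã´" in the hex dump's text column is an assumption: the hex editor is presumably showing each byte as a Latin-1 character.

```python
import io
import os
import tempfile

line = "C\u00F4te d'Ivoire"  # "Côte d'Ivoire"

# Hypothetical location; a temp directory keeps the example self-contained.
path = os.path.join(tempfile.mkdtemp(), "Test.html")

# Naming the encoding explicitly avoids relying on the platform default.
with io.open(path, "w", encoding="utf-8") as f_out:
    f_out.write(line + "\n")

# Read the raw bytes back, as a hex editor would see them.
with open(path, "rb") as f_in:
    raw = f_in.read()

print(raw[:4].hex())  # 43c3b474 -> 'C', then C3 B4 for ô, then 't'

# Viewing the two UTF-8 bytes one at a time as Latin-1 gives two
# characters, U+00C3 and U+00B4, which display as "Ã´".
latin1_view = raw[1:3].decode("latin-1")

# Decoding the whole file as UTF-8 recovers the original single character.
round_trip = raw.decode("utf-8")
```

If the file is later opened by a tool that assumes a one-byte-per-character encoding, the text looks broken even though the bytes on disk are correct UTF-8.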
