Converting a Python string between iso8859_1 and utf8

Posted 2024-09-30 20:23:10

I am trying to do in Python the same thing as the Java code below.

String decoded = new String("ä¸".getBytes("ISO8859_1"), "UTF-8");
System.out.println(decoded);

The output is the Chinese string "中".

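In Python I tried encode/decode/bytearray, but I always get an unreadable string. I think my problem is that I don't really understand how the Java and Python encoding mechanisms work. I also could not find a solution in the existing answers.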

[Python code block not preserved; its output follows]

ä¸- --  <type 'str'>
Ã¤Â¸Â- --  <type 'str'>
Ã¤Â¸Â- --  <type 'bytearray'>
Ã¤Â¸Â- --  <type 'str'>
Ã¤Â¸Â- --  <type 'str'>

Daniel Roseman's answer is very close. Thank you. But when it comes to my real case:

    ch = 'masanori harigae ã\201®ã\203\221ã\203¼ã\202½ã\203\212ã\203«ä¼\232è-°å®¤'
    print ch.decode('utf-8').encode('iso-8859-1')

I got

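    Traceback (most recent call last):
      File "", line 1, in
      File "/apps/Python/lib/python2.7/encodings/utf_8.py", line 16, in decode
        return codecs.utf_8_decode(input, errors, True)
    UnicodeDecodeError: 'utf8' codec can't decode byte 0x81 in position 19: invalid start byte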

Java code:

    String decoded = new String("masanori harigae ã\201®ã\203\221ã\203¼ã\202½ã\203\212ã\203«ä¼\232è-°å®¤".getBytes("ISO8859_1"), "UTF-8");
    System.out.println(decoded);

The output is masanori harigae のパーソナル会議室
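
The UnicodeDecodeError at position 19 is consistent with the Python 2 literal holding a mix of bytes: characters typed as text (such as 'ã') stored as UTF-8, and octal escapes (such as '\201') stored as single raw bytes. The following is a minimal Python 3 sketch of that situation, assuming the session used a UTF-8 terminal; the variable names are illustrative only.

```python
# Assumption: in a UTF-8 terminal, Python 2 stores the typed character
# 'ã' as the two bytes C3 A3, while the escape '\201' stays one byte, 0x81.
prefix = "masanori harigae ã".encode("utf-8")  # 17 ASCII bytes + C3 A3
mixed = prefix + b"\x81"                       # raw byte at index 19

try:
    mixed.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    # 0x81 is a continuation byte with no lead byte before it.
    error = exc

print(error)
```

If that assumption holds, the string is not uniformly double-encoded, so decoding the whole thing as UTF-8 cannot succeed. In Python 2, writing the literal as a unicode string (u'...') and encoding it as iso-8859-1 would likely give the intended UTF-8 bytes.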

1 answer

User

#1 · Posted 2024-09-30 20:23:10

You're doing this the wrong way round. You have a bytestring that is wrongly encoded as utf-8, and you want it to be interpreted as iso-8859-1:

>>> ch = "ä¸"
>>> print ch.decode('utf-8').encode('iso-8859-1')
中
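
The session above is Python 2. In Python 3, where str is already Unicode, the direction mirrors the Java snippet: encode the text as ISO-8859-1 to recover the original bytes, then decode those bytes as UTF-8. Here is a minimal sketch; `fix_mojibake` is a hypothetical helper name, and the soft hyphen (U+00AD) that tends to get lost in display is written explicitly as `\xad`.

```python
def fix_mojibake(text: str) -> str:
    """Reinterpret text that was decoded as ISO-8859-1 but is really UTF-8.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # Every character below U+0100 maps back to the single byte it came from.
    raw = text.encode("iso-8859-1")
    # Decode those bytes with the encoding they were actually written in.
    return raw.decode("utf-8")


# Bytes E4 B8 AD; the last one is the soft hyphen.
simple = fix_mojibake("\xe4\xb8\xad")

# The longer string from the question, with the soft hyphen made explicit.
real = fix_mojibake(
    "masanori harigae ã\201®ã\203\221ã\203¼ã\202½ã\203\212ã\203«ä¼\232è\xad°å®¤"
)

print(simple)
print(real)
```

This only works when every character in the input is below U+0100; otherwise the `encode("iso-8859-1")` step raises UnicodeEncodeError.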