Does hashlib md5 in Python return an incorrect digest for some Unicode characters?

import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Test {
    public static void main(String[] args) throws Exception {
        MessageDigest m = MessageDigest.getInstance("MD5");
        m.update("\u00db".getBytes());
        System.out.println(bytesToHex(m.digest()));
        m.update("Û".getBytes());
        System.out.println(bytesToHex(m.digest()));
    }

    final protected static char[] hexArray = "0123456789abcdef".toCharArray();

    public static String bytesToHex(byte[] bytes) {
        char[] hexChars = new char[bytes.length * 2];
        for (int j = 0; j < bytes.length; j++) {
            int v = bytes[j] & 0xFF;
            hexChars[j * 2] = hexArray[v >>> 4];
            hexChars[j * 2 + 1] = hexArray[v & 0x0F];
        }
        return new String(hexChars);
    }
}

2 Answers

User

Answer 1 · edited 2024-10-06 07:07:32

Your assumption is incorrect. Your Python source file starts with:

# -*- coding: utf-8 -*-

In that case, Û is not equivalent to \xdb; it is two bytes instead:

>>> 'Û'
'\xc3\x9b'

Python is entirely consistent here:

>>> import hashlib
>>> hashlib.md5('\xc3\x9b').hexdigest()
'31ecfb09f120720a55d96a2034f5d00b'
>>> hashlib.md5('\xdb').hexdigest()
'98fd00d788afe2a5fa5e4f8e1666638b'
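The session above is Python 2, where a plain string literal is already a byte string. As a minimal sketch of the same point in Python 3 (an assumption about the reader's interpreter, not something the question used), text has to be encoded explicitly before hashing, which makes the choice of bytes visible. `md5_hex` is a hypothetical helper introduced only for this illustration.

```python
import hashlib


def md5_hex(text: str, encoding: str) -> str:
    """Encode text with an explicit encoding, then return the MD5 hex digest.

    Hypothetical helper for illustration; it is not part of hashlib.
    """
    return hashlib.md5(text.encode(encoding)).hexdigest()


# The character Û, written as an escape so the source file encoding does not matter.
sample = "\u00db"

utf8_digest = md5_hex(sample, "utf-8")      # hashes the two bytes C3 9B
latin1_digest = md5_hex(sample, "latin-1")  # hashes the single byte DB

print(utf8_digest)
print(latin1_digest)
print(utf8_digest == latin1_digest)  # False: different bytes give different digests
```

MD5 only ever sees bytes, so the same character hashed under two encodings is simply two different inputs.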

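In Java, you started with a Unicode code point and converted it to bytes (UTF-8 here, since getBytes() without an argument uses the platform default charset):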

"\u00db".getBytes()

The Python equivalent is to use a unicode string literal with the \uhhhh or \xhh escape sequences:

>>> u'\u00db'.encode('utf8')
'\xc3\x9b'
>>> u'\xdb'.encode('utf8')
'\xc3\x9b'
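For readers on Python 3 (an assumption; the transcript above is Python 2), every string literal is already a Unicode string, so no u prefix is needed and both escape forms name the same code point. A short sketch:

```python
# In Python 3, str literals are Unicode; both escapes denote code point U+00DB.
from_u_escape = "\u00db"   # four-hex-digit escape
from_x_escape = "\xdb"     # two-hex-digit escape: still a code point here, not a byte
from_chr = chr(0xDB)       # built from the integer code point

print(from_u_escape == from_x_escape == from_chr)  # True
print(hex(ord(from_u_escape)))                     # 0xdb
print(from_u_escape.encode("utf-8"))               # b'\xc3\x9b'
```

The byte-string meaning of \xdb only applies to a bytes literal such as b'\xdb' in Python 3, or to an unprefixed literal in Python 2.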

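Note the u prefix, which produces a unicode string. Without the u prefix, \xdb is a byte string, not a Unicode code point; you only get the same Unicode string if you decode it as Latin-1: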

>>> '\xdb'.decode('latin1')
u'\xdb'
>>> '\xdb'.decode('latin1').encode('utf8')
'\xc3\x9b'
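The same round trip can be sketched in Python 3 (assumed here; the transcript above is Python 2), where bytes and text are separate types. It also shows why the lone byte DB cannot be read as UTF-8 at all:

```python
raw_byte = b"\xdb"                      # one byte, no inherent character meaning

as_text = raw_byte.decode("latin-1")    # Latin-1 maps byte 0xDB to code point U+00DB
as_utf8 = as_text.encode("utf-8")       # the same character as UTF-8 bytes

print(as_text == "\u00db")  # True
print(as_utf8)              # b'\xc3\x9b'

# Decoding the same byte as UTF-8 fails: 0xDB starts a two-byte sequence
# and there is no continuation byte after it.
try:
    raw_byte.decode("utf-8")
    utf8_decode_failed = False
except UnicodeDecodeError:
    utf8_decode_failed = True

print(utf8_decode_failed)   # True
```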

You may want to read up on Python and Unicode; see:

Python Unicode HOWTO
Pragmatic Unicode by Ned Batchelder

For completeness:

The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

User

Answer 2 · edited 2024-10-06 07:07:32

I expected the two digests to be equivalent, given that Û ought to be equivalent to \xdb.

Û is C3 9B in UTF-8, which you appear to be using (it is the encoding you declared). DB would be its encoding in ISO-8859-1.

>>> import hashlib
>>> hashlib.md5(b'\xc3\x9b').hexdigest()
'31ecfb09f120720a55d96a2034f5d00b'

Tada!
