Python gzip and Java GZIPOutputStream give different results

    import gzip
    import hashlib

    gzip_bytes = gzip.compress(bytes('test', 'utf-8'))
    gzip_hex = gzip_bytes.hex().upper()
    md5 = hashlib.md5(gzip_bytes).hexdigest().upper()

    >>> gzip_hex
    '1F8B0800678B186002FF2B492D2E01000C7E7FD804000000'
    >>> md5
    'C4C763E9A0143D36F52306CF4CCC84B8'

    import java.io.ByteArrayOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;
    import java.util.zip.GZIPOutputStream;

    public class HelloWorld {
        private static final char[] HEX_ARRAY = "0123456789ABCDEF".toCharArray();

        // Convert a byte array to an uppercase hex string.
        public static String bytesToHex(byte[] bytes) {
            char[] hexChars = new char[bytes.length * 2];
            for (int j = 0; j < bytes.length; j++) {
                int v = bytes[j] & 0xFF;
                hexChars[j * 2] = HEX_ARRAY[v >>> 4];
                hexChars[j * 2 + 1] = HEX_ARRAY[v & 0x0F];
            }
            return new String(hexChars);
        }

        public static String md5(byte[] bytes) {
            try {
                MessageDigest md = MessageDigest.getInstance("MD5");
                byte[] thedigest = md.digest(bytes);
                return bytesToHex(thedigest);
            } catch (NoSuchAlgorithmException e) {
                // The exception must be thrown, not just constructed.
                throw new RuntimeException("MD5 Failed", e);
            }
        }

        public static void main(String[] args) {
            String string = "test";
            // Use an explicit charset so the input matches Python's 'utf-8'.
            final byte[] bytes = string.getBytes(StandardCharsets.UTF_8);
            try {
                final ByteArrayOutputStream bos = new ByteArrayOutputStream();
                final GZIPOutputStream gout = new GZIPOutputStream(bos);
                gout.write(bytes);
                gout.close();
                final byte[] encoded = bos.toByteArray();
                System.out.println("gzip: " + bytesToHex(encoded));
                System.out.println("md5: " + md5(encoded));
            } catch (IOException e) {
                throw new RuntimeException("Failed", e);
            }
        }
    }

1 Answer

User

#1 · Posted on 2024-10-06 09:33:48

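Your requirement, "the hash of a gzipped string in Python must be the same as Java's", cannot be met in general. You need to change the requirement and meet the underlying need in a different way. I would recommend requiring only that the decompressed data have the same hash. In fact, both gzip strings already contain a 32-bit hash (a CRC-32) of the decompressed data, and it is the same in both (0xd87f7e0c). If you want a longer hash, you can append one. The last four bytes are the uncompressed length, modulo 2³², so you can compare those as well. Just compare the last eight bytes of the two strings and check that they are identical.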
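The trailer check described above might look like the following minimal sketch. It assumes each input is a single-member gzip stream; `gzip_trailer` is a hypothetical helper name, and the hex string is the Python output from the question.

```python
import struct
import zlib


def gzip_trailer(gzip_bytes):
    """Return (crc32, isize) read from the last eight bytes of a gzip stream."""
    # Both trailer fields are little-endian unsigned 32-bit integers.
    crc32, isize = struct.unpack("<II", gzip_bytes[-8:])
    return crc32, isize


# Hex string copied from the Python output in the question.
python_gzip = bytes.fromhex("1F8B0800678B186002FF2B492D2E01000C7E7FD804000000")
crc32, isize = gzip_trailer(python_gzip)
print(hex(crc32), isize)

# The trailer should agree with values computed from the uncompressed data.
data = "test".encode("utf-8")
print(crc32 == zlib.crc32(data), isize == len(data) % 2**32)
```

To compare a Python result with a Java result, the same helper could be applied to both byte strings and the two returned tuples compared for equality.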

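The difference between the two gzip strings in your question illustrates the problem. One has a timestamp in the header and the other does not (it is set to zeros). Even if both had timestamps, they would very likely differ. Some other header bytes differ as well, such as the originating operating system.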
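To see which header bytes differ, the fixed 10-byte gzip header can be unpacked field by field. This is a rough sketch: `gzip_header_fields` is a hypothetical helper, the field layout follows the gzip format (RFC 1952), and the `mtime` argument of `gzip.compress` assumes Python 3.8 or later.

```python
import gzip
import struct
from datetime import datetime, timezone


def gzip_header_fields(gzip_bytes):
    """Unpack the fixed 10-byte gzip header into a dict of named fields."""
    magic, method, flags, mtime, xfl, os_code = struct.unpack(
        "<2sBBIBB", gzip_bytes[:10]
    )
    return {
        "magic": magic.hex(),
        "method": method,   # 8 means deflate
        "flags": flags,
        "mtime": mtime,     # Unix timestamp, 0 means "not set"
        "xfl": xfl,         # extra flags, hints at the compression level
        "os": os_code,      # originating operating system, 255 means unknown
    }


# Hex string copied from the Python output in the question.
python_gzip = bytes.fromhex("1F8B0800678B186002FF2B492D2E01000C7E7FD804000000")
question_header = gzip_header_fields(python_gzip)
print(question_header)
print(datetime.fromtimestamp(question_header["mtime"], tz=timezone.utc))

# Passing mtime=0 zeroes the timestamp field on the Python side.
reproducible = gzip.compress(b"test", mtime=0)
print(gzip_header_fields(reproducible))
```

Zeroing the timestamp removes only one source of difference. Other header bytes, such as the operating system code, may still differ between implementations, so this does not make the outputs match in general.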

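Furthermore, the compressed data in your example is extremely short, so it just happens to be identical in this case. For any reasonable amount of data, however, the compressed data produced by two gzip implementations will differ unless they happen to use exactly the same deflate code, the same version of that code, and the same memory size and compression level settings. If you do not control all of those, you can never be sure of getting the same compressed data out of them, even given identical uncompressed data.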
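One way to follow the recommendation is to hash the decompressed payload instead of the gzip bytes. The sketch below is illustrative only: `payload_md5` is a hypothetical helper, the sample data is made up, and the two compression levels merely stand in for two different gzip implementations.

```python
import gzip
import hashlib


def payload_md5(gzip_bytes):
    """Return the uppercase MD5 hex digest of the decompressed payload."""
    return hashlib.md5(gzip.decompress(gzip_bytes)).hexdigest().upper()


# Made-up sample data, long enough for the compression settings to matter.
data = b"the quick brown fox jumps over the lazy dog\n" * 200

fast = gzip.compress(data, compresslevel=1)
best = gzip.compress(data, compresslevel=9)

# The compressed bytes are not guaranteed to match across settings.
print("compressed bytes equal:", fast == best)

# The hash of the decompressed payload does not depend on the compressor.
print("payload hashes equal:", payload_md5(fast) == payload_md5(best))
```

On the Java side, the equivalent would be to compute the MD5 of the bytes before they are written to the `GZIPOutputStream`, or of the bytes read back through a `GZIPInputStream`.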

In short, don't waste your time trying to get the same compressed string.
