How do I add a header to a gzip-compressed string in Python?

Posted 2024-09-23 22:32:25


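I'm trying to compress a string in Python the same way a particular piece of C# code does, but I get a different result. It seems I need to add a header to the compressed output, and I don't know how to do that in Python. This is the C# line I can't work out a Python equivalent for: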

memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

Here is the complete, runnable C# code:

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

namespace Rextester
{
    /// <summary>Handles compressing and decompressing API requests and responses.</summary>
    public class Compression
    {
        #region Member Variables
        /// <summary>The compressed message header length.</summary>
        private const int CompressedMessageHeaderLength = 4;
        #endregion

        #region Methods
        /// <summary>Compresses the XML string.</summary>
        /// <param name="documentToCompress">The XML string to compress.</param>
        public static string CompressData(string data)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                byte[] plainBytes = Encoding.UTF8.GetBytes(data);

                using (GZipStream zipStream = new GZipStream(memoryStream, CompressionMode.Compress, leaveOpen: true))
                {
                    zipStream.Write(plainBytes, 0, plainBytes.Length);
                }

                memoryStream.Position = 0;

                byte[] compressedBytes = new byte[memoryStream.Length + CompressedMessageHeaderLength];

                Buffer.BlockCopy(
                    BitConverter.GetBytes(plainBytes.Length),
                    0,
                    compressedBytes,
                    0,
                    CompressedMessageHeaderLength
                );

                // Copy the gzip stream in after the 4-byte header (the header holds the uncompressed length).
                memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

                string compressedXml = Convert.ToBase64String(compressedBytes);

                return compressedXml;
            }
        }
        
 
        #endregion
    }

    public class Program
    {
        public static void Main(string[] args)
        {
            //Your code goes here
            string data = "Hello World!";
            Console.WriteLine(  Compression.CompressData(data) );
            // result would be DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA

        }
    }
}

This is the Python code I wrote:

data = 'Hello World!'

import gzip
import base64
print(base64.b64encode(gzip.compress(data.encode('utf-8'))))

# I expect DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA 
# but I get H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=

3 Answers

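As others have mentioned, one difference is the header that the C# version adds.

Also note that gzip compression can be performed in more than one way. In C#, for example, you can specify a CompressionLevel of Optimal, Fastest, or NoCompression. See: https://docs.microsoft.com/en-us/dotnet/api/system.io.compression.compressionlevel?view=net-5.0

I'm not familiar enough with Python to say how it handles gzip compression by default (perhaps Fastest in C# uses a more aggressive algorithm than Python's default).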
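For reference, Python's `gzip.compress` accepts a `compresslevel` argument from 0 to 9 and defaults to 9, the slowest and most compact setting. The following is a minimal sketch that compresses the same input at several levels and checks that each result decompresses to the original. The exact sizes printed may vary with the zlib build in use.

```python
import gzip

data = "Hello World!".encode("utf-8")

# compresslevel ranges from 0 (no compression) to 9 (best compression, the default).
results = {level: gzip.compress(data, compresslevel=level) for level in (0, 1, 9)}

for level, blob in results.items():
    # Every level yields a valid gzip stream that restores the same bytes.
    restored = gzip.decompress(blob)
    print(level, len(blob), restored == data)
```

The level changes the compressed bytes, not what they decompress to, so a receiver does not need to know which level the sender used.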

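Here is your C# code with the header length set to 0, printing the output for each of the three CompressionLevel values. Note that the strings it produces are very close to the one from Python.

You should also ask whether the difference in values really matters. Isn't it enough that you can encode and decode?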

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public class Program
{
    public static void Main()
    {
        string data = "Hello World!";
        Console.WriteLine(  Compression.CompressData(data, CompressionLevel.Fastest) );
        Console.WriteLine(  Compression.CompressData(data, CompressionLevel.NoCompression) );
        Console.WriteLine(  Compression.CompressData(data, CompressionLevel.Optimal) );
        // result would be DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA
        // but I get       H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=
    }
}

public class Compression
    {
        #region Member Variables
        /// <summary>The compressed message header length.</summary>
        private const int CompressedMessageHeaderLength = 0; // changed to zero
        #endregion

        #region Methods
        /// <summary>Compresses the XML string.</summary>
        /// <param name="documentToCompress">The XML string to compress.</param>
        public static string CompressData(string data, CompressionLevel compressionLevel)
        {
            using (MemoryStream memoryStream = new MemoryStream())
            {
                byte[] plainBytes = Encoding.UTF8.GetBytes(data);

                using (GZipStream zipStream = new GZipStream(memoryStream, compressionLevel, leaveOpen: true))
                {
                    zipStream.Write(plainBytes, 0, plainBytes.Length);
                }

                memoryStream.Position = 0;

                byte[] compressedBytes = new byte[memoryStream.Length + CompressedMessageHeaderLength];

                Buffer.BlockCopy(
                    BitConverter.GetBytes(plainBytes.Length),
                    0,
                    compressedBytes,
                    0,
                    CompressedMessageHeaderLength
                );

                // Copy the gzip stream in after the header (zero bytes long in this variant).
                memoryStream.Read(compressedBytes, CompressedMessageHeaderLength, (int)memoryStream.Length);

                string compressedXml = Convert.ToBase64String(compressedBytes);

                return compressedXml;
            }
        }
        
 
        #endregion
    }

Output:

H4sIAAAAAAAEA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=
H4sIAAAAAAAEAwEMAPP/SGVsbG8gV29ybGQhoxwpHAwAAAA=
H4sIAAAAAAAAA/NIzcnJVwjPL8pJUQQAoxwpHAwAAAA=

Fiddle: https://dotnetfiddle.net/TI8gwM
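To see exactly where the two strings from the question differ, one can decode them and split off the fixed 10-byte gzip header described in RFC 1952. This sketch is an illustration only; `describe_header` is a hypothetical helper name, not part of any library.

```python
import base64
import gzip
import struct

# The two Base64 strings quoted in the question.
csharp_b64 = "DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA"
python_b64 = "H4sIACwuuWAC//NIzcnJVwjPL8pJUQQAoxwpHAwAAAA="

csharp_raw = base64.b64decode(csharp_b64)
python_gzip = base64.b64decode(python_b64)

# The C# output starts with a 4-byte length prefix; the gzip stream follows.
declared_length = int.from_bytes(csharp_raw[:4], "little")
csharp_gzip = csharp_raw[4:]


def describe_header(stream: bytes) -> dict:
    """Hypothetical helper: unpack the fixed 10-byte gzip header (RFC 1952)."""
    magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", stream[:10])
    return {"magic": magic.hex(), "method": method, "flags": flags,
            "mtime": mtime, "xfl": xfl, "os": os_id}


print("declared length:", declared_length)
print("C#    :", describe_header(csharp_gzip))
print("Python:", describe_header(python_gzip))

# Everything after the header: DEFLATE payload plus the CRC-32 and size trailer.
print("same payload and trailer:", csharp_gzip[10:] == python_gzip[10:])
print(gzip.decompress(csharp_gzip), gzip.decompress(python_gzip))
```

For these two strings, only the length prefix and the header's modification time, extra-flags, and OS fields differ. The DEFLATE payload and the trailer are identical, so either side can decompress the other's output once the 4-byte prefix is removed.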

You can use to_bytes to convert the length of the encoded string:

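import sys  # for sys.byteorder; gzip, base64 and data are defined in the question's snippet

enc = data.encode('utf-8')
zipped = gzip.compress(enc)
# sys.byteorder is the machine's native byte order; pass 'little' or 'big' to pin it
print(base64.b64encode(len(enc).to_bytes(4, sys.byteorder) + zipped))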

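Also, gzip.compress(enc) appears to produce a slightly different result from its C# counterpart (so the overall result differs too), but that shouldn't be a problem: decompression should handle either one correctly.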
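If the remaining header differences do matter, one option is to assemble the gzip stream by hand instead of calling `gzip.compress`. The sketch below makes several assumptions: `compress_like_question` is a hypothetical name, and the extra-flags value 4 and OS value 0 are read off the expected string in the question rather than taken from .NET documentation, so they may differ on other runtimes or platforms.

```python
import base64
import struct
import zlib


def compress_like_question(text: str) -> str:
    """Hypothetical helper: length prefix + hand-built gzip stream, Base64-encoded."""
    plain = text.encode("utf-8")

    # Raw DEFLATE: negative wbits means no zlib or gzip wrapper is emitted.
    deflater = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    payload = deflater.compress(plain) + deflater.flush()

    # 10-byte gzip header: magic, method=8, flags=0, mtime=0, XFL=4, OS=0.
    # XFL and OS mirror the expected string in the question (assumption).
    header = struct.pack("<2sBBIBB", b"\x1f\x8b", 8, 0, 0, 4, 0)

    # Trailer: CRC-32 and size of the uncompressed data, both little-endian.
    trailer = struct.pack("<II", zlib.crc32(plain), len(plain) & 0xFFFFFFFF)

    # 4-byte little-endian length of the uncompressed data, as in the C# code.
    prefix = struct.pack("<i", len(plain))
    return base64.b64encode(prefix + header + payload + trailer).decode("ascii")


encoded = compress_like_question("Hello World!")
print(encoded)
```

Whether the DEFLATE payload matches .NET's byte for byte depends on both compressors. For the short input in the question, the Python and C# payloads happen to be identical, but that is not guaranteed for arbitrary data or compression levels.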

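First, I'd point out that the C# code isn't suitable for cross-platform use. The byte order of the length header depends on the underlying architecture, because BitConverter.GetBytes returns the bytes in whatever order the architecture uses.

However, C# most likely means Windows, which most likely means Intel, so the order is very probably little-endian.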
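`BitConverter.GetBytes(int)` produces four bytes for a 32-bit signed integer. Here is a small sketch of the Python equivalents, showing how the byte order changes the prefix; 12 is the UTF-8 length of `Hello World!`.

```python
import struct

length = len("Hello World!".encode("utf-8"))  # 12

little = length.to_bytes(4, "little")  # b'\x0c\x00\x00\x00'
big = length.to_bytes(4, "big")        # b'\x00\x00\x00\x0c'

# struct.pack with "<i" / ">i" is an equivalent spelling for a signed 32-bit int.
assert struct.pack("<i", length) == little
assert struct.pack(">i", length) == big

print(little.hex(), big.hex())  # 0c000000 0000000c
```

The expected string in the question begins with `DAAA`, which is the Base64 encoding of the bytes 0C 00 00, so it is consistent with a little-endian prefix.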

So what you need to do is prepend the length of the original data to the compressed data, in little-endian order, as exactly 4 bytes.

bdata = data.encode('utf-8')
compressed = gzip.compress(bdata)
header = len(bdata).to_bytes(4,'little')

Then concatenate the two and convert to Base64:

print(base64.b64encode(header + compressed))
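Putting the pieces together, a self-contained round trip might look like the sketch below. The function names and the length check in the decoder are illustrative assumptions; the thread does not show the receiving side.

```python
import base64
import gzip

HEADER_LENGTH = 4  # mirrors CompressedMessageHeaderLength in the C# code


def compress_data(text: str) -> str:
    """Prefix the UTF-8 length (little-endian), gzip the text, Base64-encode."""
    plain = text.encode("utf-8")
    header = len(plain).to_bytes(HEADER_LENGTH, "little")
    return base64.b64encode(header + gzip.compress(plain)).decode("ascii")


def decompress_data(encoded: str) -> str:
    """Hypothetical inverse: strip the header, gunzip, and check the length."""
    raw = base64.b64decode(encoded)
    expected = int.from_bytes(raw[:HEADER_LENGTH], "little")
    plain = gzip.decompress(raw[HEADER_LENGTH:])
    if len(plain) != expected:
        raise ValueError(f"length header says {expected}, got {len(plain)} bytes")
    return plain.decode("utf-8")


encoded = compress_data("Hello World!")
print(encoded)
print(decompress_data(encoded))

# The string produced by the C# program in the question decodes the same way.
print(decompress_data("DAAAAB+LCAAAAAAABADzSM3JyVcIzy/KSVEEAKMcKRwMAAAA"))
```

The Base64 text will usually not match the C# output character for character, because `gzip.compress` fills in its own header fields, but both strings decode to the same text.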
