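<p>The problem with the gzip module is not that it cannot decompress a partial file. The error only occurs at the very end, when it tries to verify the checksum of the decompressed content. (The original checksum is stored at the end of the compressed file, so verification can never succeed on a partial file.)</p>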
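<p>The key is to trick gzip into skipping the verification. The <a href="https://stackoverflow.com/a/16504428/1405710">answer by caesar0301</a> does this by modifying the gzip source code, but it isn't necessary to go that far; a simple monkey patch will do. I wrote this context manager to temporarily replace <code>gzip.GzipFile._read_eof</code> while decompressing a partial file:</p>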
<pre><code>import contextlib
import gzip

@contextlib.contextmanager
def patch_gzip_for_partial():
    """
    Context manager that replaces gzip.GzipFile._read_eof with a no-op.
    This is useful when decompressing partial files, something that won't
    work if GzipFile does its checksum comparison.
    """
    _read_eof = gzip.GzipFile._read_eof
    gzip.GzipFile._read_eof = lambda *args, **kwargs: None
    try:
        yield
    finally:
        # Restore the original method even if decompression raises.
        gzip.GzipFile._read_eof = _read_eof
</code></pre>
<p>Example usage:</p>
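<pre><code>from cStringIO import StringIO  # Python 2; `compressed` holds the partial gzip data

with patch_gzip_for_partial():
    # Pass the buffer as fileobj; the first positional argument is a filename.
    decompressed = gzip.GzipFile(fileobj=StringIO(compressed)).read()
</code></pre>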
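<p>The patch above targets <code>GzipFile._read_eof</code>, which, as far as I know, is no longer an attribute of <code>GzipFile</code> from Python 3.5 onward. As an alternative (a different technique, not the monkey patch), here is a minimal Python 3 sketch that uses <code>zlib.decompressobj</code> in gzip mode and simply never reaches the trailer check. The helper name <code>decompress_partial_gzip</code> and the sample data are made up for illustration.</p>

```python
import gzip
import zlib


def decompress_partial_gzip(data):
    """Return whatever can be decompressed from a possibly truncated gzip stream.

    Hypothetical helper for illustration; not part of the gzip module.
    """
    # MAX_WBITS | 16 tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(zlib.MAX_WBITS | 16)
    # decompress() returns the output the available input yields; the
    # checksum is only compared once the trailer is actually reached.
    return decompressor.decompress(data)


# Build sample data, compress it, and keep only the first half of the stream.
original = b"".join(b"line %d\n" % i for i in range(5000))
compressed = gzip.compress(original)
truncated = compressed[: len(compressed) // 2]

partial = decompress_partial_gzip(truncated)
print(len(partial), "of", len(original), "bytes recovered")
```

<p>Because the stream is cut before the trailer, the result is only a prefix of the original data, and nothing verifies its integrity.</p>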