How do I handle a deflate response with urllib2?

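import gzip
import StringIO
import urllib2

# req is a urllib2.Request built elsewhere
opener = urllib2.build_opener()
response = opener.open(req)
data = response.read()
if response.headers.get('content-encoding', '') == 'gzip':
    # wrap the raw bytes so GzipFile can read them like a file
    data = StringIO.StringIO(data)
    gzipper = gzip.GzipFile(fileobj=data)
    html = gzipper.read()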

3 Answers

User

#1 · Edited 2024-05-17 03:19:54

To answer the comment above, the HTTP spec (http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html#sec14.3) says:

If no Accept-Encoding field is present in a request, the server MAY assume that the client will accept any content coding. In this case, if "identity" is one of the available content-codings, then the server SHOULD use the "identity" content-coding, unless it has additional information that a different content-coding is meaningful to the client.

I take that to mean the server should use identity. I have never seen a server that doesn't.
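
Rather than relying on how a server treats a missing header, the client can state what it accepts. The following is only a sketch: the helper name build_request and the URL http://example.com/ are placeholders that do not come from the thread, and the import fallback is there so the snippet loads on both Python 2 (urllib2) and Python 3 (urllib.request).

```python
try:
    from urllib2 import Request          # Python 2
except ImportError:
    from urllib.request import Request   # Python 3


def build_request(url, encodings="gzip, deflate"):
    """Return a Request that explicitly advertises the accepted codings.

    Hypothetical helper: pass encodings="identity" to ask for an
    uncompressed body instead.
    """
    return Request(url, headers={"Accept-Encoding": encodings})


# Placeholder URL; nothing is sent until the request is opened.
req = build_request("http://example.com/")
```

The resulting req can be passed to opener.open(req) as in the question. Note that Request normalizes the header name to "Accept-encoding" internally.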

User

#2 · Edited 2024-05-17 03:19:54

You can try:

if response.headers.get('content-encoding', '') == 'deflate':
    html = zlib.decompress(response.read())

If that fails, here is another way, which I found in the requests source code:

if response.headers.get('content-encoding', '') == 'deflate':
    html = zlib.decompressobj(-zlib.MAX_WBITS).decompress(response.read())
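
The two attempts above can be folded into one helper that tries the zlib-wrapped form first and falls back to a raw deflate stream. This is a minimal sketch, not a copy of what requests does: the name decompress_deflate is hypothetical, and it assumes the whole body has already been read into a byte string.

```python
import zlib


def decompress_deflate(data):
    """Decompress a 'deflate' body, with or without the zlib header."""
    try:
        # Standard case: deflate data wrapped in a zlib header and checksum.
        return zlib.decompress(data)
    except zlib.error:
        # Some servers send a bare deflate stream; a negative wbits value
        # tells zlib not to expect the header.
        return zlib.decompress(data, -zlib.MAX_WBITS)
```

With this in place the check becomes: if the content-encoding is 'deflate', set html = decompress_deflate(response.read()).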

User

#3 · Edited 2024-05-17 03:19:54

A better approach is outlined here:

http://rationalpie.wordpress.com/2010/06/02/python-streaming-gzip-decompression/

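The author explains how to decompress the stream chunk by chunk instead of decompressing the whole body in memory at once. This is the preferred method when larger files are involved.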
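
A chunked decompressor along these lines might look like the sketch below. It is not taken from the linked post: the generator name iter_gunzip and the 16 KiB default chunk size are assumptions, and fileobj can be any object with a read(n) method that returns bytes, such as a urllib2 response.

```python
import zlib


def iter_gunzip(fileobj, chunk_size=16 * 1024):
    """Yield decompressed pieces of a gzip stream read from fileobj."""
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        out = decompressor.decompress(chunk)
        if out:
            yield out
    # Emit anything still buffered inside the decompressor.
    tail = decompressor.flush()
    if tail:
        yield tail
```

Each yielded piece can be written to disk or parsed as it arrives, so only a chunk of compressed input and its decompressed output are held in memory at a time.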

I also found this useful test site:

http://carsten.codimi.de/gzip.yaws/
