Sending bz2-compressed data as a UTF-8 string through gunicorn with Python 2.7

def app(environ, start_response):
    data = "Hello, World!" * 10
    compressed_data = bz2.compress(data)
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ('charset', 'utf-8'),
                              ("Content-Length", str(len(compressed_data))),
                              ('Access-Control-Allow-Headers', '*'),
                              ('Access-Control-Allow-Origin', '*'),
                              # ('Content-Transfer-Encoding', 'BASE64'),
                              ])
    return iter([compressed_data])

import bz2
print(bz2.compress("Hello, World!" * 10))
>> 'BZh91AY&SYy\xabm\x99\x00\x00\x13\x97\x80`\x04\x00@\x00\x80\x06\x04\x90\x00 \x00\xa5P\xd0\xda\x10\x03\x0e\xd3\xd4\xdai4\x9bO\x93\x13\x13\xc2b~\x9c\x17rE8P\x90y\xabm\x99'

2 Answers

User

#1 · edited 2024-09-28 23:41:09

You cannot send bzip2-compressed data as UTF-8. It is binary data, not text.
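
To make the point concrete, here is a small sketch (not part of the original answer) that compresses the same string as in the question and checks whether the result is valid UTF-8. The helper name `is_valid_utf8` is made up for illustration.

```python
from __future__ import print_function
import bz2

# Same payload as in the question, written as bytes so it runs on Python 2.7 and 3.
compressed = bz2.compress(b"Hello, World!" * 10)

def is_valid_utf8(raw):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

print(is_valid_utf8(b"Hello, World!"))  # True: plain ASCII is valid UTF-8
print(is_valid_utf8(compressed))        # False: the bz2 stream contains invalid byte sequences
print(bz2.decompress(compressed) == b"Hello, World!" * 10)  # True: the bytes themselves are intact
```

The compressed bytes survive only if they are carried as bytes; any step that treats them as UTF-8 text will either fail or corrupt them.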

If your HTTP client accepts the bzip2 content encoding (bzip2 is not standard), then you can send UTF-8 encoded text compressed with bzip2:

#!/usr/bin/env python
import bz2

def app(environ, start_response):
    status = '200 OK'
    headers = [('Content-type', 'text/plain; charset=utf-8')]
    data = (u'Hello \N{SNOWMAN}\n' * 10).encode('utf-8')

    if 'bzip2' in environ.get('HTTP_ACCEPT_ENCODING', ''): # use bzip2 only if requested
        data = bz2.compress(data)
        headers.append(('Content-Encoding', 'bzip2'))

    headers.append(('Content-Length', str(len(data))))
    start_response(status, headers)
    return [data]  # WSGI expects an iterable of byte strings, not one bare string
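
The substring test on `HTTP_ACCEPT_ENCODING` above is enough for a demo, but it would also match a client that sends `bzip2;q=0` (meaning "do not use bzip2"). A slightly stricter check could look like the sketch below. `accepts_encoding` is a hypothetical helper, not something gunicorn or WSGI provides, and it deliberately ignores the `*` wildcard.

```python
def accepts_encoding(header_value, coding):
    """Return True if `coding` appears in an Accept-Encoding value with q > 0."""
    for item in header_value.split(','):
        parts = item.strip().split(';')
        name = parts[0].strip().lower()
        if name != coding.lower():
            continue
        q = 1.0  # a coding listed without a q-value counts as fully acceptable
        for param in parts[1:]:
            key, _, value = param.strip().partition('=')
            if key.strip().lower() == 'q':
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        return q > 0
    return False
```

In the app it would replace the `in` test, for example `if accepts_encoding(environ.get('HTTP_ACCEPT_ENCODING', ''), 'bzip2'):`.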

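Example

Uncompressed response (sent when the request has no Accept-Encoding: bzip2 header): the body is the plain 100-byte UTF-8 text and there is no Content-Encoding header.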

bzip2-compressed response, sent if the client declares that it accepts bzip2:

$ http -v 127.0.0.1:8000 Accept-Encoding:bzip2 
GET / HTTP/1.1
Accept: */*
Accept-Encoding: bzip2
Connection: keep-alive
Host: 127.0.0.1:8000
User-Agent: HTTPie/0.9.2



HTTP/1.1 200 OK
Connection: close
Content-Encoding: bzip2
Content-Length: 65
Content-type: text/plain; charset=utf-8
Date: Sun, 17 May 2015 18:48:23 GMT
Server: gunicorn/19.3.0



+-----------------------------------------+
| NOTE: binary data not shown in terminal |
+-----------------------------------------+

Here is the corresponding HTTP client using the requests library:

#!/usr/bin/env python
from __future__ import print_function
import bz2
import requests # $ pip install requests

r = requests.get('http://localhost:8000', headers={'Accept-Encoding': 'gzip, deflate, bzip2'})
content = r.content
print(len(content))
if r.headers.get('Content-Encoding', '').endswith('bzip2'): # requests doesn't understand bzip2; .get() avoids a KeyError when the header is absent
    content = bz2.decompress(content)
print(len(content))
text = content.decode(r.encoding)
print(len(text))
print(text, end='')

Output

65
100
80
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
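
The three numbers at the top of that output are the compressed size in bytes, the decompressed size in bytes, and the length of the decoded text in characters. A short sketch that reproduces the last two without any server (the compressed size is printed but not asserted here, since it depends on the bz2 library):

```python
from __future__ import print_function
import bz2

text = u'Hello \N{SNOWMAN}\n' * 10   # the same text the server sends
encoded = text.encode('utf-8')        # bytes on the wire before compression
compressed = bz2.compress(encoded)    # bytes on the wire with Content-Encoding: bzip2

print(len(compressed))  # size of the compressed body
print(len(encoded))     # 100: the snowman takes 3 bytes in UTF-8, so each line is 10 bytes
print(len(text))        # 80: each line is 8 characters
```

The gap between 100 and 80 is exactly why the body has to be handled as bytes until after decompression and only then decoded.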

Otherwise (without the non-standard Accept-Encoding value), you should send the data as application/octet-stream, as @icedtrees suggested:

#!/usr/bin/env python
import bz2

def app(environ, start_response):
    status = '200 OK'
    headers = [('Content-type', 'application/octet-stream')]
    data = bz2.compress((u'Hello \N{SNOWMAN}\n' * 10).encode('utf-8'))

    headers.append(('Content-Length', str(len(data))))
    start_response(status, headers)
    return [data]  # WSGI expects an iterable of byte strings, not one bare string

Example

$ http 127.0.0.1:8000 
HTTP/1.1 200 OK
Connection: close
Content-Length: 65
Content-type: application/octet-stream
Date: Sun, 17 May 2015 18:53:55 GMT
Server: gunicorn/19.3.0



+-----------------------------------------+
| NOTE: binary data not shown in terminal |
+-----------------------------------------+

bzcat accepts the bzip2 content:

$ http 127.0.0.1:8000 | bzcat
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃
Hello ☃

The data is displayed correctly because the terminal uses UTF-8 encoding.
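
If you want the same check in Python rather than with bzcat, one option is to call the WSGI app directly with a stub `start_response`, with no gunicorn involved. This is only a sketch: `call_app` is a hypothetical test helper, and the app is repeated here (returning a list of bytes) so the snippet is self-contained.

```python
from __future__ import print_function
import bz2

def app(environ, start_response):
    status = '200 OK'
    headers = [('Content-type', 'application/octet-stream')]
    data = bz2.compress((u'Hello \N{SNOWMAN}\n' * 10).encode('utf-8'))
    headers.append(('Content-Length', str(len(data))))
    start_response(status, headers)
    return [data]

def call_app(wsgi_app, environ=None):
    """Invoke a WSGI app in-process and return (status, headers dict, body bytes)."""
    captured = {}

    def start_response(status, headers):
        captured['status'] = status
        captured['headers'] = dict(headers)

    body = b''.join(wsgi_app(environ or {}, start_response))
    return captured['status'], captured['headers'], body

status, headers, body = call_app(app)
text = bz2.decompress(body).decode('utf-8')  # bytes -> decompressed bytes -> text
print(status, headers['Content-type'], len(body))
print(text, end='')
```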

User

#2 · edited 2024-09-28 23:41:09

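The problem is that the string arrives as unicode. You should not try to interpret bz2-compressed data as text.

To see how to read the response as raw bytes rather than text, see the requests docs: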

res.content  # not res.text

Also, the data should not be sent as text/plain in the first place. BZ2-compressed data is not text and should be sent as application/octet-stream (that is, a byte stream).

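A quick hack is to reinterpret the text as a byte stream by encoding it with ISO-8859-1 (the default ASCII codec cannot handle bytes outside the 0-127 range, so ISO-8859-1 is used instead).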
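
The original snippet for this hack is not preserved here, so the following is only a sketch of the idea under one assumption: the client produced the text by decoding the body as ISO-8859-1 (which, as far as I know, is what requests falls back to for text/* responses without a charset). The decoding step is simulated locally instead of making a request.

```python
from __future__ import print_function
import bz2

payload = bz2.compress(b"Hello, World!" * 10)

# Simulate a client that decoded the binary body as ISO-8859-1 text.
mangled_text = payload.decode("iso-8859-1")

# ISO-8859-1 maps every byte 0-255 to the code point with the same number,
# so encoding with it restores the original bytes exactly.
recovered = mangled_text.encode("iso-8859-1")
print(recovered == payload)
print(bz2.decompress(recovered))

# The ASCII codec, by contrast, rejects code points above 127.
try:
    mangled_text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)
```

If the text was decoded with any other codec (UTF-8 with replacement characters, for instance), the bytes are already damaged and this round trip will not bring them back.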


Ideally, though, you should fix your data types.
