What is the best way to compress JSON for storage in an in-memory database such as Redis or Memcache?

26 votes

6 answers

44940 views

Asked 2025-04-17 19:45

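Requirement: We have Python objects with 2 to 3 levels of nesting that contain only basic data types such as integers, strings, lists, and dicts (no dates, etc.). These objects need to be stored in Redis as JSON under a corresponding key. We are looking for good ways to compress the JSON into a string to reduce the memory footprint. The target objects are not very large: on average about 1000 small elements, or roughly 15000 characters once converted to JSON.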

For example:

>>> my_dict
{'details': {'1': {'age': 13, 'name': 'dhruv'}, '2': {'age': 15, 'name': 'Matt'}}, 'members': ['1', '2']}
>>> json.dumps(my_dict)
'{"details": {"1": {"age": 13, "name": "dhruv"}, "2": {"age": 15, "name": "Matt"}}, "members": ["1", "2"]}'
### SOME BASIC COMPACTION ###
>>> json.dumps(my_dict, separators=(',',':'))
'{"details":{"1":{"age":13,"name":"dhruv"},"2":{"age":15,"name":"Matt"}},"members":["1","2"]}'

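1/ Are there any better ways to compress the JSON to save memory in Redis (while keeping later decoding lightweight)?

2/ How good a candidate is msgpack [http://msgpack.org/]?

3/ Should I also consider options like pickle?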

Tags: data-serialization data-storage redis pickle nested-objects in-memory-database json-compression msgpack

6 answers

I did a detailed comparison of several binary formats (MessagePack, BSON, Ion, Smile, CBOR) and compression algorithms (Brotli, Gzip, XZ, Zstandard, bzip2).

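For the JSON data I tested with, keeping the data as JSON and compressing it with Brotli was the best option. Brotli has several compression levels; if you plan to keep the data for a long time, a higher level is worth the extra compression time. If the data is short-lived, a lower compression level or Zstandard may be more effective.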

Gzip is simple to use, but there is almost certainly an alternative that is faster, compresses better, or both.
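
As a rough way to run this kind of comparison on your own data, the sketch below measures compact JSON under three codecs from the Python standard library. It is only an illustration: Brotli and Zstandard need third-party packages and are not shown here, and the `sample` payload and the `compare_codecs` helper are made up for this example, loosely following the shape of the object in the question.

```python
import bz2
import json
import lzma
import zlib

# Hypothetical payload shaped like the question's example (1000 small members).
sample = {
    "details": {str(i): {"age": 10 + i % 50, "name": "user%d" % i} for i in range(1000)},
    "members": [str(i) for i in range(1000)],
}

def compare_codecs(obj):
    """Return {codec name: size in bytes} for the compact JSON form of obj."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    codecs = {
        "zlib-1": lambda b: zlib.compress(b, 1),  # fastest zlib level
        "zlib-9": lambda b: zlib.compress(b, 9),  # slowest, smallest zlib level
        "bz2": bz2.compress,
        "lzma": lzma.compress,                    # the algorithm behind XZ
    }
    sizes = {"raw": len(raw)}
    for name, compress in codecs.items():
        sizes[name] = len(compress(raw))
    return sizes

if __name__ == "__main__":
    for name, size in compare_codecs(sample).items():
        print(name, size)
```

For a cache, compression and decompression time matter as much as size, so it is worth timing each call (for example with `timeit`) on real payloads before choosing.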

You can read the details of our investigation here: blog post

Answered 2025-04-17 by Python大师


Based on @Alfe's answer, here is a version that keeps everything in memory (suitable for network I/O tasks). I also made some changes to support Python 3.

import gzip
from io import BytesIO

def decompressBytesToString(inputBytes):
  """
  decompress the given byte array (which must be valid 
  compressed gzip data) and return the decoded text (utf-8).
  """
  bio = BytesIO()
  stream = BytesIO(inputBytes)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      bio.seek(0)
      return bio.read().decode("utf-8")
    bio.write(chunk)
  return None

def compressStringToBytes(inputString):
  """
  read the given string, encode it in utf-8,
  compress the data and return it as a byte array.
  """
  bio = BytesIO()
  bio.write(inputString.encode("utf-8"))
  bio.seek(0)
  stream = BytesIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = bio.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

To test the compression, try:

inputString="asdf" * 1000
len(inputString)
len(compressStringToBytes(inputString))
decompressBytesToString(compressStringToBytes(inputString))
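
For payloads that already fit in memory, such as the roughly 15000-character JSON described in the question, the standard library's one-shot `gzip.compress` and `gzip.decompress` do the same job without manual chunking. A minimal sketch, assuming a plain dict (`fake_store`) as a stand-in for the Redis client and hypothetical helper names `pack` and `unpack`:

```python
import gzip
import json

def pack(obj):
    """Serialize obj to compact JSON and gzip it; returns bytes."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)

def unpack(blob):
    """Inverse of pack: gunzip the bytes and parse the JSON."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))

my_dict = {
    "details": {"1": {"age": 13, "name": "dhruv"}, "2": {"age": 15, "name": "Matt"}},
    "members": ["1", "2"],
}

# A dict stands in for the key-value store here (an assumption for illustration).
fake_store = {}
fake_store["group:1"] = pack(my_dict)
restored = unpack(fake_store["group:1"])
```

The fixed gzip header and trailer add 18 bytes, so very small values like this example may not shrink at all; the savings show up at the sizes the question describes.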

Answered 2025-04-17 by Python大师


We simply use gzip to compress the data.

import gzip
from io import BytesIO  # Python 3 replacement for Python 2's cStringIO.StringIO

def decompressStringToFile(value, outputFile):
  """
  decompress the given bytes value (which must be valid compressed gzip
  data) and write the result in the given open binary file.
  """
  stream = BytesIO(value)
  decompressor = gzip.GzipFile(fileobj=stream, mode='r')
  while True:  # until EOF
    chunk = decompressor.read(8192)
    if not chunk:
      decompressor.close()
      outputFile.close()
      return
    outputFile.write(chunk)

def compressFileToString(inputFile):
  """
  read the given open binary file, compress the data and return it as bytes.
  """
  stream = BytesIO()
  compressor = gzip.GzipFile(fileobj=stream, mode='w')
  while True:  # until EOF
    chunk = inputFile.read(8192)
    if not chunk:  # EOF?
      compressor.close()
      return stream.getvalue()
    compressor.write(chunk)

In our use case we store the result as files, as you can imagine. To work only with in-memory data, you can pass an io.BytesIO object (cStringIO.StringIO in Python 2) instead of a file.

Answered 2025-04-17 by Python大师

