Caching a large non-Unicode dictionary in a database?

{ 'Reuters/19960916': { '54826newsML': '<?xml version="1.0" encoding="iso-8859-1" ?>\r\n<newsitem itemid="54826" id="root" date="1996-09-16" xml:lang="en">\r\n<title>USA: RESEARCH ALERT - Crestar Financial cut.</title>\r\n<headline>RESEARCH ALERT - Crestar Financial cut.</headline>\r\n<text>\n-- Salomon Brothers analyst Carole Berger said she cut her rating on Crestar Financial Corp to hold from buy, at the same time lowering her 1997 earnings per share view to $5.40 from $5.85.\n-- Crestar said it would buy Citizens Bancorp in a $774 million stock swap.\n-- Crestar shares were down 2-1/2 at 58-7/8. Citizens Bancorp soared 14-5/8 to 46-7/8.\n</text>\r\n<copyright>(c) Reuters Limited', '55964newsML': '<?xml version="1.0" encoding="iso-8859-1" ?>\r\n<newsitem itemid="55964" id="root" date="1996-09-16" xml:lang="en">\r\n<title>USA: Nebraska cattle sales thin at $114/dressed-feedlot.</title>\r\n' } }

2 Answers

User

#1 · edited 2024-06-16 14:46:29

pymongo does not require strings to be unicode. It sends ASCII byte strings as-is and encodes unicode strings as UTF-8. Data retrieved through pymongo always comes back as unicode. See http://api.mongodb.org/python/2.0/tutorial.html#a-note-on-unicode-strings

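If your input contains "international" byte strings with high bytes (such as ab\xC3cd), you need to either convert them to unicode or encode them as UTF-8. Here is a simple recursive converter that handles arbitrarily nested dicts: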

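def unicode_all(s, encoding='iso-8859-1'):
    # Recursively convert a nested dict/list to unicode (Python 2).
    # Assumption: byte strings are ISO-8859-1, as the sample XML declares.
    if isinstance(s, dict):
        return dict((unicode_all(k, encoding), unicode_all(v, encoding))
                    for k, v in s.items())
    if isinstance(s, list):
        return [unicode_all(v, encoding) for v in s]
    if isinstance(s, str):
        # unicode(s) would use the ASCII codec and fail on high bytes.
        return s.decode(encoding)
    return unicode(s)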
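
For readers on Python 3, where `unicode` no longer exists and byte strings are `bytes`, the same idea can be sketched roughly as follows. This is not the answer's original code: the name `decode_all`, the sample dictionary, and the ISO-8859-1 default (taken from the encoding declared in the sample XML) are all assumptions made for illustration.

```python
def decode_all(obj, encoding="iso-8859-1"):
    """Recursively decode every bytes key and value in nested dicts/lists to str."""
    if isinstance(obj, dict):
        return {decode_all(k, encoding): decode_all(v, encoding)
                for k, v in obj.items()}
    if isinstance(obj, list):
        return [decode_all(v, encoding) for v in obj]
    if isinstance(obj, bytes):
        # Decode explicitly; the encoding is an assumption about the input.
        return obj.decode(encoding)
    # Leave str, numbers, None, and other values untouched.
    return obj


# Hypothetical input shaped like the dictionary in the question.
sample = {b"Reuters/19960916": {b"54826newsML": b"ab\xc3cd", b"items": [b"x", 1]}}
converted = decode_all(sample)
```

After this, every key and string value in `converted` is a `str`, which a current pymongo driver would encode as UTF-8 when storing the document.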

User

#2 · edited 2024-06-16 14:46:29

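If you have the RAM (and you evidently do, since you populated the dictionary in the first place), use cPickle. Or, if you want something that needs less RAM but is slower, use shelve.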
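
The two options can be sketched side by side as below. This is a minimal Python 3 sketch, where `cPickle` has been merged into the standard `pickle` module. The file names, the temporary directory, and the shortened sample data are hypothetical.

```python
import os
import pickle
import shelve
import tempfile

# Hypothetical, shortened stand-in for the dictionary in the question.
data = {
    "Reuters/19960916": {
        "54826newsML": "<title>USA: RESEARCH ALERT - Crestar Financial cut.</title>",
        "55964newsML": "<title>USA: Nebraska cattle sales thin.</title>",
    }
}

workdir = tempfile.mkdtemp()

# Option 1: pickle. The whole dictionary is written to and read from a single
# file, so it has to fit in memory both when dumping and when loading.
pickle_path = os.path.join(workdir, "cache.pkl")
with open(pickle_path, "wb") as f:
    pickle.dump(data, f, protocol=pickle.HIGHEST_PROTOCOL)
with open(pickle_path, "rb") as f:
    restored = pickle.load(f)

# Option 2: shelve. Each top-level key is stored as its own record on disk,
# so only the entries that are actually accessed get loaded.
shelf_path = os.path.join(workdir, "cache_shelf")
with shelve.open(shelf_path) as shelf:
    for key, value in data.items():
        shelf[key] = value
with shelve.open(shelf_path) as shelf:
    one_day = shelf["Reuters/19960916"]
```

Note that `shelve` keys must be strings, and each lookup unpickles the stored value, which is where the extra time goes compared with holding one large dictionary in memory.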
