Serializing a large dictionary with cPickle runs out of memory

3 answers

User

#1 · edited 2024-10-01 02:32:12

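You could try another pickle library. There may also be some cPickle settings you can change.

Another option: split your dictionary into smaller pieces and pickle each piece separately. Then put them back together when you load everything.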
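
To make the "split it up" idea concrete, here is a minimal sketch, not a tested recipe for the asker's data. It assumes a plain dict of picklable values; the helper names `dump_in_chunks` and `load_chunks`, the chunk size, and the file name are made up for illustration. It is written against Python 3's `pickle`; on Python 2, `cPickle` exposes the same `dump`/`load` calls.

```python
import itertools
import os
import pickle  # on Python 2, cPickle provides the same dump/load functions
import tempfile


def dump_in_chunks(data, path, chunk_size=1000):
    """Write `data` to `path` as a sequence of small, independent pickles."""
    items = iter(data.items())
    with open(path, "wb") as f:
        while True:
            # Take the next `chunk_size` items as their own small dict.
            chunk = dict(itertools.islice(items, chunk_size))
            if not chunk:
                break
            # Each dump call is a separate pickle with its own bookkeeping.
            pickle.dump(chunk, f, protocol=pickle.HIGHEST_PROTOCOL)


def load_chunks(path):
    """Read the pickled pieces back one by one and merge them."""
    result = {}
    with open(path, "rb") as f:
        while True:
            try:
                result.update(pickle.load(f))
            except EOFError:
                # No more pickles in the file.
                break
    return result


# Hypothetical sample data standing in for the real index.
index = {"word%d" % i: {"doc1": [i, i + 1]} for i in range(2500)}
path = os.path.join(tempfile.mkdtemp(), "index.pkl")
dump_in_chunks(index, path, chunk_size=1000)
restored = load_chunks(path)
```

Because every piece is pickled on its own, the pickler only has to track the objects of one chunk at a time. Merging on load still needs the whole dictionary in memory, so this helps with the dump step more than with overall footprint.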

Sorry this is vague; I'm just thinking out loud. I figured it might still help, since nobody else has answered.

User

#2 · edited 2024-10-01 02:32:12

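You are very likely using the wrong tool for this job. If you want to persist a large amount of indexed data, I strongly suggest using an SQLite on-disk database (or, of course, just a regular database) together with an ORM such as SQLObject or {a2}.

These take care of the mundane things for you: compatibility, a storage format optimized for the purpose, and not holding all the data in memory at once, which is what makes you run out of memory.

Added: because I have been working on almost exactly the same thing anyway, but mostly because I'm a nice person, here is a demo that seems to do what you need. It creates an SQLite file in the current directory and deletes any existing file with the same name, so run it somewhere empty first: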

import sqlobject
from sqlobject import SQLObject, UnicodeCol, ForeignKey, IntCol, SQLMultipleJoin
import os

DB_NAME = "mydb"
ENCODING = "utf8"

class Document(SQLObject):
    dbName = UnicodeCol(dbEncoding=ENCODING)

class Location(SQLObject):
    """ Location of each individual occurrence of a word within a document.
    """
    dbWord = UnicodeCol(dbEncoding=ENCODING)
    dbDocument = ForeignKey('Document')
    dbLocation = IntCol()

TEST_DATA = {
    'one' : {
        'doc1' : [1,2,10],
        'doc3' : [6],
    },

    'two' : {
        'doc1' : [2, 13],
        'doc2' : [5,6,7],
    },

    'three' : {
        'doc3' : [1],
    },
}        

if __name__ == "__main__":
    db_filename = os.path.abspath(DB_NAME)
    if os.path.exists(db_filename):
        os.unlink(db_filename)
    connection = sqlobject.connectionForURI("sqlite:%s" % (db_filename))
    sqlobject.sqlhub.processConnection = connection

    # Create the tables
    Document.createTable()
    Location.createTable()

    # Import the dict data, creating each Document row only once:
    documents = {}
    for word, locs in TEST_DATA.items():
        for doc, indices in locs.items():
            if doc not in documents:
                documents[doc] = Document(dbName=doc)
            for index in indices:
                Location(dbWord=word, dbDocument=documents[doc], dbLocation=index)

    # Let's check out the data... where can we find 'two'?
    locs_for_two = Location.selectBy(dbWord = 'two')

    # Or...
    # locs_for_two = Location.select(Location.q.dbWord == 'two')

    print("Word 'two' found at...")
    for loc in locs_for_two:
        print("Found: %s, p%s" % (loc.dbDocument.dbName, loc.dbLocation))

    # What documents have 'one' in them?
    docs_with_one = Location.selectBy(dbWord = 'one').throughTo.dbDocument

    print("")
    print("Word 'one' found in documents...")
    for doc in docs_with_one:
        print("Found: %s" % doc.dbName)

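This is certainly not the only way to do it (or necessarily the best way). Whether the Document or Word data should live in tables separate from the Location table depends on your data and typical usage. In your case, "Word" could probably be a separate table with some added index and uniqueness settings.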
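
As a rough illustration of that last point, the sketch below puts words in their own table with a uniqueness constraint and indexes the location table by word. It swaps SQLObject for the standard-library `sqlite3` module so that it runs without extra packages. The table names, column names, and the `get_id` helper are assumptions made for this example, not part of the demo above.

```python
import sqlite3

# In-memory database for the example; pass a file path to persist on disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE word (id INTEGER PRIMARY KEY, text TEXT NOT NULL UNIQUE);
    CREATE TABLE document (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
    CREATE TABLE location (
        word_id INTEGER NOT NULL REFERENCES word(id),
        document_id INTEGER NOT NULL REFERENCES document(id),
        position INTEGER NOT NULL
    );
    CREATE INDEX location_by_word ON location (word_id);
""")


def get_id(table, column, value):
    """Return the id of the row holding `value`, inserting it if missing.

    `table` and `column` are fixed names from this script, never user input.
    """
    conn.execute(
        "INSERT OR IGNORE INTO %s (%s) VALUES (?)" % (table, column), (value,)
    )
    row = conn.execute(
        "SELECT id FROM %s WHERE %s = ?" % (table, column), (value,)
    ).fetchone()
    return row[0]


# Same shape as TEST_DATA: word -> document -> list of positions.
sample = {
    "one": {"doc1": [1, 2, 10], "doc3": [6]},
    "two": {"doc1": [2, 13], "doc2": [5, 6, 7]},
}
for word, docs in sample.items():
    for doc, positions in docs.items():
        word_id = get_id("word", "text", word)
        doc_id = get_id("document", "name", doc)
        conn.executemany(
            "INSERT INTO location VALUES (?, ?, ?)",
            [(word_id, doc_id, p) for p in positions],
        )
conn.commit()

# Which documents contain 'one'?
docs_with_one = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT d.name FROM location l "
        "JOIN word w ON w.id = l.word_id "
        "JOIN document d ON d.id = l.document_id "
        "WHERE w.text = ? ORDER BY d.name",
        ("one",),
    )
]
```

The `UNIQUE` constraints keep each word and document name stored once, and the index on `word_id` lets the lookup by word avoid a full scan of the location table.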

User

#3 · edited 2024-10-01 02:32:12

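cPickle needs a lot of extra memory because it does cycle detection. If you are sure your data contains no cycles, you could try the marshal module instead.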
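
For reference, a minimal sketch of what using `marshal` could look like, assuming the data consists only of built-in types (dicts, lists, strings, numbers); the sample dictionary and file name are invented for the example.

```python
import marshal
import os
import tempfile

# Hypothetical sample data: only built-in types, no cycles.
index = {"one": {"doc1": [1, 2, 10], "doc3": [6]}, "two": {"doc1": [2, 13]}}
path = os.path.join(tempfile.mkdtemp(), "index.marshal")

# marshal.dump and marshal.load work on binary file objects.
with open(path, "wb") as f:
    marshal.dump(index, f)

with open(path, "rb") as f:
    restored = marshal.load(f)
```

Keep in mind that `marshal` only handles basic built-in types, its file format is not guaranteed to stay compatible between Python versions, and it should not be used to load data from untrusted sources.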
