Why do dumping and loading take longer with pickle.HIGHEST_PROTOCOL?

Posted 2024-10-01 09:20:22


I tested three different pickle protocols: 0, 1, and 2.

In my test, I dumped and loaded a dict with about 270,000 (int, int) pairs and a set with about 560,000 ints.

Here is my test code (you can safely skip the two fetch functions I use to get the data from the database):

import pickle
import time

protocol = 0  # Tested 0, 1, and 2
print 'Protocol:', protocol
t0 = time.time()
# fetch_sku2spu_dict() and fetch_valid_pids() query the database (not shown)
sku2spu_dict = fetch_sku2spu_dict()
pid_set = fetch_valid_pids()
t1 = time.time()
print 'Time in sql:', t1 - t0
# Protocols 1 and 2 are binary formats, so use binary file modes
with open('sku.pcike_dict', 'wb') as f:
    pickle.dump(sku2spu_dict, f, protocol)
with open('pid.picke_set', 'wb') as f:
    pickle.dump(pid_set, f, protocol)
t2 = time.time()
print 'Time in dump:', t2 - t1
with open('sku.pcike_dict', 'rb') as f:
    sku2spu_dict = pickle.load(f)
with open('pid.picke_set', 'rb') as f:
    pid_set = pickle.load(f)
t3 = time.time()
print 'Time in load:', t3 - t2
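For anyone who wants to reproduce this without the database, here is a minimal Python 3 sketch of the same test. It assumes synthetic data of the same shape: `fake_fetch_sku2spu_dict` and `fake_fetch_valid_pids` are hypothetical stand-ins for the two fetch functions, and the files go to a temporary directory. Note that in Python 3, plain `pickle` already uses the C implementation when it is available, so the numbers are not directly comparable to Python 2's pure-Python `pickle`.

```python
import os
import pickle
import tempfile
import time


def fake_fetch_sku2spu_dict(n=270_000):
    # Hypothetical stand-in for fetch_sku2spu_dict(): a dict of (int, int) pairs.
    return {i: i // 3 for i in range(n)}


def fake_fetch_valid_pids(n=560_000):
    # Hypothetical stand-in for fetch_valid_pids(): a set of ints.
    return set(range(n))


def time_protocol(obj, protocol, path):
    """Dump obj to path, load it back, and return (dump_s, load_s, loaded)."""
    t0 = time.perf_counter()
    with open(path, "wb") as f:  # binary mode is correct for every protocol
        pickle.dump(obj, f, protocol)
    t1 = time.perf_counter()
    with open(path, "rb") as f:
        loaded = pickle.load(f)
    t2 = time.perf_counter()
    return t1 - t0, t2 - t1, loaded


def run(n_pairs=270_000, n_pids=560_000):
    """Return {protocol: (total dump seconds, total load seconds)}."""
    data = {"sku": fake_fetch_sku2spu_dict(n_pairs), "pid": fake_fetch_valid_pids(n_pids)}
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for protocol in (0, 1, 2):
            dump_total = load_total = 0.0
            for name, obj in data.items():
                path = os.path.join(tmp, f"{name}.p{protocol}")
                dump_s, load_s, loaded = time_protocol(obj, protocol, path)
                if loaded != obj:  # the round trip must be lossless
                    raise RuntimeError(f"round trip failed for {name!r}, protocol {protocol}")
                dump_total += dump_s
                load_total += load_s
            results[protocol] = (dump_total, load_total)
    return results


if __name__ == "__main__":
    for protocol, (dump_s, load_s) in run().items():
        print(f"protocol {protocol}: dump {dump_s:.3f}s, load {load_s:.3f}s")
```

Timings depend heavily on the machine and Python version, so treat any single run as indicative only.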

Here is how long each step took:


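To my surprise, protocol 2 was much slower than protocols 0 and 1.

However, protocol 2 produced the smallest dump files, about half the size of those from protocols 0 and 1.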
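The size difference is expected: protocol 0 is a text format (for example, each small int is written as decimal digits plus a newline), while protocols 1 and 2 are binary. A minimal Python 3 sketch to compare pickled sizes, assuming a small hypothetical sample shaped like the question's dict:

```python
import pickle


def pickled_sizes(obj, protocols=(0, 1, 2)):
    """Return {protocol: number of bytes in pickle.dumps(obj, protocol)}."""
    return {p: len(pickle.dumps(obj, p)) for p in protocols}


# Hypothetical sample with the same shape as the question's dict, but smaller.
sample = {i: i // 3 for i in range(10_000)}
sizes = pickled_sizes(sample)
for p, n in sizes.items():
    print(f"protocol {p}: {n} bytes")
```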

The documentation says:

Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes.

And for the definition of new-style classes, it says:

Any class which inherits from object. This includes all built-in types like list and dict
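What "more efficient pickling of new-style classes" refers to can be seen at the opcode level: protocol 2 added the `NEWOBJ` opcode, which recreates an instance through `cls.__new__`, whereas protocol 0 goes through `copyreg._reconstructor` (`copy_reg` in Python 2) plus `REDUCE`. A minimal Python 3 sketch using `pickletools.genops`, assuming a class like the `MyObject` in the answer:

```python
import pickle
import pickletools


class MyObject(object):
    # The explicit "object" base makes this a new-style class in Python 2;
    # in Python 3 every class is new-style.
    def __init__(self, val):
        self.val = val


def opcode_names(obj, protocol):
    """Return the opcode names found in pickle.dumps(obj, protocol)."""
    data = pickle.dumps(obj, protocol)
    return [op.name for op, arg, pos in pickletools.genops(data)]


for protocol in (0, 2):
    print(protocol, opcode_names(MyObject(100), protocol))
```

Built-in containers such as a dict of ints are not pickled through `NEWOBJ` at all; they have dedicated opcodes, so this particular improvement does not apply to them.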

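So I expected protocol 2 to be faster at dumping and loading objects.

Does anyone know why?

Update:

The problem went away after I replaced pickle with cPickle.

Now, with protocol 2, load and dump take 5 and 3 seconds respectively, while protocols 0 and 1 take more than 10 seconds.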


1 answer
User
Answer 1 · posted 2024-10-01 09:20:22

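When the documentation talks about "new-style classes", it (probably) means user-defined new-style classes. If you run a simple benchmark with them, you'll see that protocol 2 dumps them about twice as fast as protocol 0: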

>>> import cPickle
>>> import timeit
>>> class MyObject(object):
...     def __init__(self, val):
...             self.val = val
...     def method(self):
...             print self.val
... 
>>> timeit.timeit('cPickle.dumps(MyObject(100), 0)', 'from __main__ import cPickle, MyObject')
17.654622077941895
>>> timeit.timeit('cPickle.dumps(MyObject(100), 1)', 'from __main__ import cPickle, MyObject')
14.536609172821045
>>> timeit.timeit('cPickle.dumps(MyObject(100), 2)', 'from __main__ import cPickle, MyObject')
8.885567903518677

Loading is also about 2x faster:


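In your particular case the opposite may be true, but without the code that defines fetch_sku2spu_dict and the other fetch function, we can't say anything. The only thing I can assume is that the return value is a dict, and in that case protocol 2 dumps about 6 times faster than protocol 0: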

>>> mydict = dict(zip(range(100), range(100)))
>>> timeit.timeit('cPickle.dumps(mydict, 0)', 'from __main__ import cPickle, mydict')
46.335021018981934
>>> timeit.timeit('cPickle.dumps(mydict, 1)', 'from __main__ import cPickle, mydict')
7.913743019104004
>>> timeit.timeit('cPickle.dumps(mydict, 2)', 'from __main__ import cPickle, mydict')
7.798863172531128

Loading is about 2.3 times faster:

>>> dumped = cPickle.dumps(mydict, 0)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
32.81050395965576
>>> dumped = cPickle.dumps(mydict, 1)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
13.997781038284302
>>> dumped = cPickle.dumps(mydict, 2)
>>> timeit.timeit('cPickle.loads(dumped)', 'from __main__ import cPickle, dumped')
14.006750106811523
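One plausible contributor to protocol 0 being slower for a dict of ints is simply how much work its opcode stream implies: protocol 0 writes each int as decimal text and emits one `SETITEM` per key/value pair, while protocols 1 and 2 write small ints as binary (`BININT1` for values below 256) and batch pairs into `SETITEMS` (up to 1000 per batch). A minimal Python 3 sketch that counts opcodes with `pickletools.genops`; it inspects Python 3's output, and I am assuming Python 2's cPickle emits the same opcodes for this case:

```python
import pickle
import pickletools
from collections import Counter


def opcode_counts(obj, protocol):
    """Count how often each opcode appears in pickle.dumps(obj, protocol)."""
    data = pickle.dumps(obj, protocol)
    return Counter(op.name for op, arg, pos in pickletools.genops(data))


mydict = dict(zip(range(100), range(100)))  # same dict as in the benchmark
for protocol in (0, 1, 2):
    print(protocol, dict(opcode_counts(mydict, protocol)))
```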

On the other hand, with the pure-Python version of the module, I found:

>>> mydict = dict(zip(range(100), range(100)))
>>> timeit.timeit('pickle.dumps(mydict,0)', 'from __main__ import pickle, mydict', number=10000)
2.9552500247955322
>>> timeit.timeit('pickle.dumps(mydict,1)', 'from __main__ import pickle, mydict', number=10000)
3.831756830215454
>>> timeit.timeit('pickle.dumps(mydict,2)', 'from __main__ import pickle, mydict', number=10000)
3.842888116836548

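So with the pure-Python version, dumping built-in objects with protocols 1 and 2 seems to be slower than with protocol 0. When loading, however, protocol 0 is again the slowest of the three: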

>>> dumped = pickle.dumps(mydict, 0)
>>> timeit.timeit('pickle.loads(dumped)', 'from __main__ import pickle, dumped', number=10000)
2.988792896270752
>>> dumped = pickle.dumps(mydict, 1)
>>> timeit.timeit('pickle.loads(dumped)', 'from __main__ import pickle, dumped', number=10000)
1.2793281078338623
>>> dumped = pickle.dumps(mydict, 2)
>>> timeit.timeit('pickle.loads(dumped)', 'from __main__ import pickle, dumped', number=10000)
1.5425071716308594

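As these small benchmarks show, the time pickling takes depends on many factors, from the type of object you are pickling to which version of the pickle module you use. Without further information, we can't explain why protocol 2 is so slow in your case.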
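The implementation matters as much as the protocol, which matches the asker's update. In Python 2 the usual idiom was `try: import cPickle as pickle` with `except ImportError: import pickle`. In Python 3, `pickle.dumps`/`pickle.loads` already use the C accelerator (`_pickle`) when it is available, and the pure-Python versions remain reachable as `pickle._dumps`/`pickle._loads`. Those are private helpers, so this minimal sketch (with a hypothetical sample dict) is for comparison only and may need adjusting across Python versions:

```python
import pickle
import timeit

# Hypothetical sample data: a dict of int pairs like the question's.
data = {i: i // 3 for i in range(10_000)}

# pickle.dumps/loads: C accelerator when available.
# pickle._dumps/_loads: private pure-Python implementation.
impls = {
    "C (pickle.dumps)": (pickle.dumps, pickle.loads),
    "pure Python (pickle._dumps)": (pickle._dumps, pickle._loads),
}

results = {}
for label, (dumps, loads) in impls.items():
    for protocol in (0, 1, 2):
        blob = dumps(data, protocol)
        if loads(blob) != data:  # sanity check before timing
            raise RuntimeError(f"round trip failed: {label}, protocol {protocol}")
        # timeit calls each lambda immediately, so the loop variables are current.
        dump_s = timeit.timeit(lambda: dumps(data, protocol), number=20)
        load_s = timeit.timeit(lambda: loads(blob), number=20)
        results[(label, protocol)] = (dump_s, load_s)
        print(f"{label:28} protocol {protocol}: dump {dump_s:.3f}s  load {load_s:.3f}s")
```

Both implementations produce compatible streams, so a pickle written by one can be read by the other; only the speed differs.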
