<p>Late to the game here, but I have had good success with the bulk operations described at <a href="http://api.mongodb.com/python/current/examples/bulk.html" rel="nofollow noreferrer">http://api.mongodb.com/python/current/examples/bulk.html</a>. The <code>insert_many()</code> method already performs the necessary chunking under the hood. My workflow involved one large "bulk insert" followed by many full-collection updates. The bulk-update procedure was many times faster than looping over single updates, but the percentage speed-up varied with the size of the input (10, 100, 1000, …).</p>
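<p>As a rough sketch of the kind of driver-side chunking <code>insert_many()</code> does under the hood (the real driver sizes batches from the server's <code>maxWriteBatchSize</code> and BSON document limits; the fixed batch size here is purely illustrative):</p>
<pre><code>def chunked(docs, batch_size=1000):
    # Yield successive slices of at most batch_size documents;
    # the driver sends each slice to the server as one batch.
    for i in range(0, len(docs), batch_size):
        yield docs[i:i + batch_size]

docs = [{'n': i} for i in range(2500)]
print([len(batch) for batch in chunked(docs)])  # → [1000, 1000, 500]
</code></pre>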
<pre><code>def unordered_bulk_write():
bulk_op = collection.initialize_unordered_bulk_op()
for primary_key in primary_key_list:
bulk_op.find({'fubar_key': primary_key}).update({'$set': {'dopeness_factor': 'unlimited'}})
try:
bulk_op.execute()
except Exception as e:
print e, e.details
def single_update_write():
for primary_key in primary_key_list:
collection.update_one({'fubar_key': primary_key}, {'$set':
{'dopeness_factor': 'unlimited'}})
</code></pre>
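<p>Note that <code>initialize_unordered_bulk_op()</code> has since been superseded in PyMongo 3.x by <code>Collection.bulk_write()</code>, which takes a list of request objects. A rough equivalent of the bulk update above, assuming the same <code>collection</code> and <code>primary_key_list</code> as in the snippet (placeholder keys used here):</p>
<pre><code>from pymongo import UpdateOne

primary_key_list = ['k1', 'k2', 'k3']  # placeholder keys for illustration

# One UpdateOne request per key; the whole list goes to the server
# in a single bulk_write() call instead of one round trip per update.
requests = [
    UpdateOne({'fubar_key': pk}, {'$set': {'dopeness_factor': 'unlimited'}})
    for pk in primary_key_list
]

# Against a live collection this would be executed as:
#   result = collection.bulk_write(requests, ordered=False)
#   print(result.modified_count)
print(len(requests))  # → 3
</code></pre>
<p><code>ordered=False</code> mirrors the unordered bulk op above: the server keeps processing the batch past individual failures.</p>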
<p>Running these methods in an IPython notebook with the <code>%%timeit</code> magic, mapping each method over randomly selected chunks of primary keys of increasing size, I got the following statistics:</p>
<pre><code>WITH CHUNK_SIZE = 10
UNORDERED BULK WRITE = 1000 loops, best of 3: 871 µs per loop
SINGLE UPDATE ONE    = 100 loops, best of 3: 2.47 ms per loop

WITH CHUNK_SIZE = 100
UNORDERED BULK WRITE = 100 loops, best of 3: 4.57 ms per loop
SINGLE UPDATE ONE    = 10 loops, best of 3: 26.2 ms per loop

WITH CHUNK_SIZE = 1000
UNORDERED BULK WRITE = 10 loops, best of 3: 39 ms per loop
SINGLE UPDATE ONE    = 1 loops, best of 3: 246 ms per loop

WITH CHUNK_SIZE = 10000
UNORDERED BULK WRITE = 1 loops, best of 3: 399 ms per loop
SINGLE UPDATE ONE    = 1 loops, best of 3: 2.58 s per loop

WITH CHUNK_SIZE = 100000
UNORDERED BULK WRITE = 1 loops, best of 3: 4.34 s per loop
SINGLE UPDATE ONE    = 1 loops, best of 3: 24.8 s per loop
</code></pre>
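<p>Dividing the single-update time by the bulk-write time at each chunk size makes the variation mentioned above concrete: the speed-up grows from roughly 2.8x at the smallest chunk size to about 6.5x, then settles near 5.7x at the largest:</p>
<pre><code># Timings from the %%timeit runs above, converted to milliseconds.
chunk_sizes = [10, 100, 1000, 10000, 100000]
bulk_ms     = [0.871, 4.57, 39.0, 399.0, 4340.0]
single_ms   = [2.47, 26.2, 246.0, 2580.0, 24800.0]

for size, b, s in zip(chunk_sizes, bulk_ms, single_ms):
    # Speed-up ratio of looped update_one vs. one bulk execute.
    print('%6d: %.1fx faster' % (size, s / b))
</code></pre>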