Reading data from MongoDB with multiple threads in Python

Published 2024-06-28 14:30:03


Just like the title says: when I use multiple threads to read data from MongoDB, it is no faster than a single process. Is there something wrong with how I'm using it?

My multi-threaded code is as follows:

import thread  # low-level thread module (renamed _thread in Python 3)

def multi_thread_flush(logger):
    n_loops = 20
    locks = []
    for i in range(0, n_loops):
        lock = thread.allocate_lock()
        lock.acquire()  # hold the lock until the worker thread releases it
        locks.append(lock)
    try:
        for i in range(0, n_loops):
            thread.start_new_thread(get_node_entry_id,
                                    (logger, 0 + 400000 * i, 400000, locks[i],))
        for i in range(0, n_loops):
            while locks[i].locked():  # busy-wait until every worker has released its lock
                pass
        logger.info("[all done] all done")
    except Exception as e:
        logger.error("exception: %s" % e)

def get_node_entry_id(logger, num1, num2, lock):
    # `client.mongo_collection` is the collection handle created elsewhere in the script
    cursor = client.mongo_collection.find({}, no_cursor_timeout=True).skip(num1).batch_size(30)
    count = 0
    for item in cursor:
        if count >= num2:  # stop after num2 documents (the original `>` read one extra)
            break
        logger.info("%s" % item["_id"])
        count = count + 1
    lock.release()  # signal the parent thread that this slice is finished
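
As a side note on the waiting pattern above: the same fan-out can be written with the higher-level threading module and join() instead of busy-waiting on low-level locks. A minimal sketch, assuming the same global client handle; the lock-free worker name get_node_entry_id_nolock is just illustrative:

import threading

def get_node_entry_id_nolock(logger, num1, num2):
    # same body as get_node_entry_id, minus the lock: join() does the waiting
    cursor = client.mongo_collection.find({}, no_cursor_timeout=True).skip(num1).batch_size(30)
    count = 0
    for item in cursor:
        if count >= num2:
            break
        logger.info("%s" % item["_id"])
        count += 1

def multi_thread_flush_join(logger):
    n_loops = 20
    threads = []
    for i in range(n_loops):
        t = threading.Thread(target=get_node_entry_id_nolock,
                             args=(logger, 400000 * i, 400000))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()  # block until every worker has finished
    logger.info("[all done] all done")

This only tidies the synchronization; it does not by itself make the reads any faster.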

My single-process code is as follows:

[code block missing from the original post]

I tried changing the batch size from 300 to 3000, but the improvement was small.


1 answer
User
#1 · Posted on 2024-06-28 14:30:03

Probably because of mongo skip(), since your skip offsets go up to 400000+. Every time the query is executed, the server has to walk from the beginning of the collection all the way to the specified offset. See this doc.

As the offset increases, mongo.skip() will be slower.

It also recommends slicing with an index instead, for example:

db.col.find({_id: { $gt: offset}}).limit(batch_size)
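
Applied to the code in the question, a minimal pymongo sketch of this range-based paging could look like the following. The helper name page_by_id_range is illustrative, and client.mongo_collection is assumed to be the same collection handle used above:

def page_by_id_range(logger, batch_size=3000):
    # walk the collection in _id order, resuming with $gt instead of skip()
    last_id = None
    while True:
        query = {} if last_id is None else {"_id": {"$gt": last_id}}
        batch = list(client.mongo_collection.find(query)
                     .sort("_id", 1)
                     .limit(batch_size))
        if not batch:
            break
        for item in batch:
            logger.info("%s" % item["_id"])
        last_id = batch[-1]["_id"]  # the next page starts after the last _id seen

Each worker can then be given a disjoint _id range instead of a skip offset, so the server never has to walk past hundreds of thousands of skipped documents.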
