Performance optimization (Python): speeding up pandas DataFrame .append()

Posted 2024-09-30 06:12:07


I have a very large dataset in MongoDB. It is queried, and each document is appended to a result DataFrame:

# db, startdate, enddate, keylist, im, count and tot are defined earlier (not shown)
for tree in db.im_tree_active.find(
        {"date": {"$gte": startdate, "$lte": enddate},
         "depth": {"$gte": 1, "$lte": 4}},
        no_cursor_timeout=True).batch_size(1500):
    if count % 1000 == 0:
        print(count, tot)  # progress report every 1000 documents
    # keyFill(keylist, tree)  # added to compensate for mismatched columns
    # im = im.append(tree)    # ran too slowly
    im.loc[count, :] = tree   # runs much faster, but keyFill() slows it down
    count += 1

Calling the pandas append() function creates a copy of the DataFrame every time, and as the DataFrame grows this copying takes too long.
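To illustrate the copying cost described above: a common way to avoid it is to collect the rows in a plain Python list and build the DataFrame once at the end. The following is only a minimal sketch. The `build_frame` helper and the `sample` records are hypothetical stand-ins for the MongoDB cursor, not part of the original code. Note also that `DataFrame.append` was deprecated in pandas 1.4 and removed in pandas 2.0.

```python
import pandas as pd


def build_frame(records):
    """Collect rows in a list, then construct the DataFrame a single time."""
    rows = []
    for rec in records:
        # list.append is amortized O(1); no DataFrame is copied here
        rows.append(rec)
    # One construction at the end instead of one copy per row
    return pd.DataFrame(rows)


# Hypothetical stand-in for documents yielded by the MongoDB cursor
sample = [{"depth": 1, "value": 10}, {"depth": 2, "value": 20}]
im = build_frame(sample)
print(im)
```

With a real cursor, `build_frame(cursor)` would iterate the documents in the same way, assuming all rows fit in memory.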

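I replaced the append statement with a .loc[] assignment, which I had read should be faster, but I then got a mismatched-columns error. This happens because some of the trees iterated from MongoDB lack fields that other trees have. I fixed it by adding a function keyFill(), given by the following simple code: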

(The keyFill() code block was not preserved in this copy of the post.)
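The keyFill() source is not available here, so the following is not the author's function. It is a sketch of one way the mismatched-column problem might be handled without a per-row fill step: when a DataFrame is built from a list of dicts, pandas takes the union of the keys as columns and fills missing fields with NaN. The `records` data and the contents of `keylist` are made up for illustration.

```python
import pandas as pd

# Hypothetical documents with differing fields, as the trees are described
records = [
    {"_id": 1, "depth": 1, "parent": "a"},
    {"_id": 2, "depth": 2},
    {"_id": 3, "depth": 3, "leaf": True},
]

# Columns become the union of all keys; absent fields are filled with NaN
frame = pd.DataFrame.from_records(records)

# Optional: force a fixed column set (assumed here to be the role of keylist)
keylist = ["_id", "depth", "parent", "leaf", "extra"]
aligned = frame.reindex(columns=keylist)
print(aligned)
```

The alignment happens once for the whole frame rather than once per document, which is why this may be cheaper than filling keys before every .loc[] assignment.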

However, running this before every .loc[] call slows the query down by nearly 1000% (an estimate).

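Is there a way to speed up the whole process? The query runs much faster until it is roughly 50% of the way through the dataset, then keeps slowing down; the last 1,000 trees take almost 10 times as long as the first 1,000.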
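A slowdown that grows with the number of rows already loaded is consistent with per-row enlargement of the DataFrame, since each enlargement has to reallocate the existing data. One possible way to keep the per-row cost flat, sketched here under assumptions, is to convert the cursor output in fixed-size chunks and concatenate the chunks once. The helper names `frames_in_chunks` and `load` and the chunk size are hypothetical.

```python
from itertools import islice

import pandas as pd


def frames_in_chunks(records, chunk_size=1000):
    """Yield one DataFrame per chunk_size documents from any iterable."""
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            break
        yield pd.DataFrame(chunk)


def load(records, chunk_size=1000):
    """Concatenate all chunk frames once; missing columns become NaN."""
    parts = list(frames_in_chunks(records, chunk_size))
    if not parts:
        return pd.DataFrame()
    return pd.concat(parts, ignore_index=True, sort=False)


# Hypothetical stand-in for the MongoDB cursor
fake_cursor = ({"n": i} for i in range(2500))
result = load(fake_cursor, chunk_size=1000)
print(len(result))
```

Here `pd.concat` is called once over all chunks, so the existing rows are copied a single time rather than once per inserted row.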


0 answers

No answers yet.
