I am trying to understand why a loop slows down as the number of iterations grows. This code is just a mock of some real code that copies data from an API. I have to download the data in batches because if I download everything at once I run out of memory. However, my loop implementation of the batching is far from ideal. I suspect that using pandas adds overhead, but other than that, what could be causing the problem?
import timeit
import pandas as pd
from tqdm import tqdm
def some_generator():
    for i in range(1_000_000):
        yield {
            'colA': 'valA',
            'colB': 'valA',
            'colC': 'valA',
            'colD': 'valA',
            'colE': 'valA',
            'colF': 'valA',
            'colG': 'valA',
            'colH': 'valA',
            'colI': 'valA',
            'colJ': 'valA'
        }

def main():
    batch_size = 10_000
    generator = some_generator()
    output = pd.DataFrame()
    batch_round = 1
    while True:
        for _ in tqdm(range(batch_size), desc=f"Batch {batch_round}"):
            try:
                row = next(generator)
                row.pop('colA')
                output = pd.concat([output, pd.DataFrame(row, index=[0])], ignore_index=True)
            except StopIteration:
                break
        if output.shape[0] != batch_size * batch_round:
            break
        else:
            batch_round += 1
    print(output)
This code simulates building a 1M-row DataFrame. If I download the data in batches of 10k, this is the performance I get over the first 20 batches:
Batch 1: 100%|██████████| 10000/10000 [00:21<00:00, 460.89it/s]
Batch 2: 100%|██████████| 10000/10000 [00:28<00:00, 349.16it/s]
Batch 3: 100%|██████████| 10000/10000 [00:38<00:00, 263.12it/s]
Batch 4: 100%|██████████| 10000/10000 [00:43<00:00, 228.76it/s]
Batch 5: 100%|██████████| 10000/10000 [00:53<00:00, 187.44it/s]
Batch 6: 100%|██████████| 10000/10000 [01:02<00:00, 159.92it/s]
Batch 7: 100%|██████████| 10000/10000 [01:09<00:00, 144.79it/s]
Batch 8: 100%|██████████| 10000/10000 [01:18<00:00, 127.59it/s]
Batch 9: 100%|██████████| 10000/10000 [01:25<00:00, 116.92it/s]
Batch 10: 100%|██████████| 10000/10000 [01:34<00:00, 105.96it/s]
Batch 11: 100%|██████████| 10000/10000 [01:40<00:00, 99.81it/s]
Batch 12: 100%|██████████| 10000/10000 [01:46<00:00, 93.92it/s]
Batch 13: 100%|██████████| 10000/10000 [01:55<00:00, 86.49it/s]
Batch 14: 100%|██████████| 10000/10000 [02:03<00:00, 80.92it/s]
Batch 15: 100%|██████████| 10000/10000 [02:10<00:00, 76.46it/s]
Batch 16: 100%|██████████| 10000/10000 [02:18<00:00, 71.99it/s]
Batch 17: 100%|██████████| 10000/10000 [02:25<00:00, 68.69it/s]
Batch 18: 100%|██████████| 10000/10000 [02:32<00:00, 65.57it/s]
Batch 19: 100%|██████████| 10000/10000 [02:42<00:00, 61.53it/s]
Batch 20: 100%|██████████| 10000/10000 [02:39<00:00, 62.84it/s]
pd.concat is expensive ->
Every pd.concat call inside the loop copies the entire accumulated DataFrame into a new one, so the total work grows quadratically with the number of rows appended — that is why each batch runs slower than the one before it. What you can do here: start with an empty list and append each row dict to that list. At the very end, after all batches are done, convert the list into a DataFrame in one go. This will be very fast :)
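A minimal sketch of that fix, reusing the mock generator from the question (the `n` parameter is added here so the row count is adjustable; it is not part of the original code):

```python
import pandas as pd

def some_generator(n=1_000_000):
    # Same mock rows as in the question; n added for easier testing.
    for _ in range(n):
        yield {c: 'valA' for c in
               ('colA', 'colB', 'colC', 'colD', 'colE',
                'colF', 'colG', 'colH', 'colI', 'colJ')}

def main(n=1_000_000):
    rows = []                      # plain Python list: append is amortized O(1)
    for row in some_generator(n):
        row.pop('colA')            # same per-row transform as in the question
        rows.append(row)           # no DataFrame work inside the loop
    output = pd.DataFrame(rows)    # build the DataFrame once, at the end
    return output
```

Each iteration now does constant work, so the total cost is linear in the number of rows instead of quadratic, and the per-batch rate stays flat. If memory for the raw dicts of a full download is a concern, the same idea applies per batch: collect one batch's dicts in a list, build one DataFrame per batch, and concatenate the per-batch frames once at the end.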