Using multiprocessing to write a CSV file from DataFrames without garbling the output

Posted 2024-09-27 23:17:50


import numpy as np
import pandas as pd
from concurrent import futures

# Load the data
df = pd.read_csv('crsp_short.csv', low_memory=False)

def funk(date):
    ...
    # for each date in df.date.unique() do stuff which gives a sample DataFrame
    # as an output,
    # then write it to the file

    sample.to_csv('crsp_full.csv', mode='a')

def evaluation(f_list):
    with futures.ProcessPoolExecutor() as pool:
        return list(pool.map(funk, f_list))

# list_s is a list of dates I want to calculate function funk for

if __name__ == '__main__':
    evaluation(list_s)

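I get a CSV file as output in which some rows are garbled, because Python writes to the same file from several worker processes at the same time. I think I need to use a queue, but I haven't managed to modify the code to make that work. Any ideas how to do it? Without parallelism it takes a very long time to get the results.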
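The queue idea from the question can be sketched roughly as follows: workers never touch the output file, they only put their results on a shared queue, and one dedicated writer process drains that queue and is the only process that writes. This is a minimal sketch under assumptions: `compute_rows`, `worker`, `writer` and `run` are hypothetical names, and it writes plain rows with the standard `csv` module instead of DataFrames to stay self-contained (a `sample.to_csv(f, header=False)` call would sit in the writer loop in the same place). A `Manager().Queue()` is used because a plain `multiprocessing.Queue` cannot be passed as an argument to `Pool` workers.

```python
import csv
import multiprocessing as mp
import os
import tempfile

def compute_rows(date):
    # Hypothetical stand-in for funk(): returns a few CSV rows for one date.
    return [[date, i, i * i] for i in range(3)]

def worker(date, queue):
    # Workers never open the output file; they only put results on the queue.
    queue.put(compute_rows(date))

def writer(queue, path):
    # The single writer process is the only one that writes to `path`,
    # so rows coming from different workers cannot interleave mid-line.
    with open(path, "w", newline="") as f:
        out = csv.writer(f)
        out.writerow(["date", "i", "i_squared"])
        while True:
            rows = queue.get()
            if rows is None:  # sentinel: all workers are done
                break
            out.writerows(rows)

def run(dates, path, processes=4):
    # A Manager queue (unlike a plain mp.Queue) can be passed to Pool workers.
    with mp.Manager() as manager:
        queue = manager.Queue()
        w = mp.Process(target=writer, args=(queue, path))
        w.start()
        with mp.Pool(processes) as pool:
            pool.starmap(worker, [(d, queue) for d in dates])
        queue.put(None)  # every task has finished: tell the writer to stop
        w.join()

if __name__ == "__main__":
    out_path = os.path.join(tempfile.mkdtemp(), "crsp_full_demo.csv")
    run(["2020-01-02", "2020-01-03"], out_path, processes=2)
    with open(out_path) as f:
        print(f.read())
```

The rows from different dates may arrive in any order, but each row is written whole by a single process.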


1 answer
User
#1 · Posted 2024-09-27 23:17:50

Solved it (the Pool does the queuing for you), based on this question:

Python: Writing to a single file with queue while using multiprocessing Pool
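Why "the Pool does the queuing for you" can be illustrated with a small sketch: `Pool.imap` sends each result back to the parent process and yields them in the same order as the inputs, even when later tasks finish first. The task `slow_square` and the helper `ordered_results` are hypothetical, chosen only to make workers finish out of order.

```python
import time
from multiprocessing import Pool

def slow_square(n):
    # Hypothetical task: smaller inputs sleep longer, so they finish last.
    time.sleep(0.05 * (5 - n))
    return n * n

def ordered_results(inputs, processes=5):
    # imap hands results back to this (parent) process one at a time,
    # in input order, regardless of which worker finished first.
    with Pool(processes) as p:
        return list(p.imap(slow_square, inputs))

if __name__ == "__main__":
    print(ordered_results(range(5)))  # → [0, 1, 4, 9, 16]
```

Because the results arrive in the parent, the parent can be the only process that touches the output file.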

My version of the code, which no longer garbles the output CSV file:

import numpy as np
import pandas as pd
from multiprocessing import Pool

# Load the data
df = pd.read_csv('crsp_short.csv', low_memory=False)

def funk(date):
    ...
    # for each date in df.date.unique() do stuff which gives a sample DataFrame
    # as an output

    return sample

# list_s is a list of dates I want to calculate function funk for

def mp_handler():
    # 28 is the number of processes I want to run
    with Pool(28) as p:
        # imap returns each result to this (parent) process in input order,
        # so only one process ever writes to the file
        for i, result in enumerate(p.imap(funk, list_s)):
            # write the header only once, with the first chunk
            result.to_csv('crsp_full.csv', mode='a', header=(i == 0))


if __name__ == '__main__':
    mp_handler()
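A self-contained version of the same pattern can be sketched with a toy DataFrame per date. The names `make_sample` and `write_all`, the column names and the dates are hypothetical placeholders for `funk`, `list_s` and the real CRSP data. The sketch removes any existing output file first (because `mode='a'` would otherwise keep appending across runs), writes the header only with the first chunk, and passes `index=False` so the per-chunk `RangeIndex` is not written as an extra column.

```python
import os
import tempfile
from multiprocessing import Pool

import pandas as pd

def make_sample(date):
    # Hypothetical stand-in for funk(): a small DataFrame for one date.
    return pd.DataFrame({"date": [date] * 2, "value": [1.0, 2.0]})

def write_all(dates, path, processes=4):
    # Start from an empty file so repeated runs don't keep appending.
    if os.path.exists(path):
        os.remove(path)
    with Pool(processes) as p:
        # Workers only build DataFrames; this parent process does all writing.
        for i, sample in enumerate(p.imap(make_sample, dates)):
            # Header only with the first chunk; index=False drops the index column.
            sample.to_csv(path, mode="a", header=(i == 0), index=False)

if __name__ == "__main__":
    out = os.path.join(tempfile.mkdtemp(), "crsp_full_demo.csv")
    write_all(["2020-01-02", "2020-01-03", "2020-01-06"], out, processes=2)
    print(pd.read_csv(out))
```

If the row order in the output does not matter, `p.imap_unordered(make_sample, dates, chunksize=...)` is a possible alternative: it yields results as soon as they are ready, which can reduce waiting on slow dates, while still keeping all writes in the parent.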
