Problem getting Python's multiprocessing library to work when using the Process function

Posted 2024-10-01 05:00:21

I am trying to build a list of parent/comment pairs from the public Reddit dataset.

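I have a CSV file that I load into a Pandas DataFrame. Each row holds a comment together with its own ID and its parent's ID. The data is loaded with the following block of code: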

import os
import time
import multiprocessing as mp
import numpy as np
import pandas as pd

# A raw string cannot end in a single backslash, so r'C:\' is a syntax error.
sourcePATH = 'C:\\'
workingFILE = 'output-pt1.csv'

# filepaths

input_file = sourcePATH + workingFILE

data_df = pd.read_csv(input_file,header=None,names=['PostIDX','ParentIDX','Comment','Score','Controversiality'])

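The goal is to scan each row in the DataFrame and use its parent ID to search the rest of the DataFrame for a parent comment. If one exists, I store the child comment and the parent comment, along with the other information, in a tuple. The tuple is added to a list, which is written to a CSV file at the end. To do this I use the following code: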

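def checkChildParent(ParentIDX_curr, ChildIDX_curr, ChildComment_curr, ChildScore_curr, ChildCont_curr):
    # Select the row(s) whose PostIDX equals the parent id of the current child.
    idx = data_df.loc[data_df['PostIDX'] == ParentIDX_curr]

    if not idx.empty:
        # Columns 2, 3 and 4 hold Comment, Score and Controversiality.
        ParentComment = idx.iloc[0, 2]
        ParentScore = idx.iloc[0, 3]
        ParentCont = idx.iloc[0, 4]
        # The [0] indexing expects every argument to be a one-element list.
        outPut.put([ParentIDX_curr[0], ParentComment, ParentScore, ParentCont, ChildIDX_curr[0], ChildComment_curr[0], ChildScore_curr[0], ChildCont_curr[0]])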


if __name__ == '__main__':
    print('Process started')
    t_start_init = time.time()
    t_start = time.time()

    noCores = 1

    #pool = mp.Pool(processes=noCores)

    update_freq = 100
    n = 1000
    #n = round(len(data_df)/8)
    flag_create = 0
    flag_run = 0
    i = 0
    outPut = mp.Queue()

    #parent_child_df = pd.DataFrame()
    #parent_child_df.columns = ['PostIDX','ParentIDX']


    while i < n:
        #print(i)
        procs = []
        ParentIDX = []
        ParentComment = []
        ParentScore = []
        ParentCont = []
        ChildIDX = []
        ChildComment = []
        ChildScore = []
        ChildCont = []

        for worker in range(0,noCores):
            ParentIDX.append(data_df.iloc[i,1])
            ChildIDX.append(data_df.iloc[i,0])
            ChildComment.append(data_df.iloc[i,2])
            ChildScore.append(data_df.iloc[i,3])
            ChildCont.append(data_df.iloc[i,4])
            i = i + 1

        #when I call the function this way it returns the expected matches
        #checkChildParent(ParentIDX,ChildIDX,ChildComment,
        #      ChildScore,ChildCont)


        #when I call the function with Process function nothing appears to be happening
        for proc in range(0,noCores):
            p = mp.Process(target = checkChildParent, args=(ParentIDX[proc],ChildIDX[proc],ChildComment[proc],ChildScore[proc],ChildCont[proc]))
            procs.append(p)
            p.start()

        #for p in procs:
        #    p.join()

        if outPut.empty() is False:
            print(outPut.get())
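
For comparison, the parent lookup described before the code can be sketched without scanning the whole table once per row. This is only an illustration under stated assumptions: it uses the standard library and plain lists in place of the DataFrame, and `SAMPLE_CSV` and `find_pairs` are hypothetical names with made-up rows, not part of the original script.

```python
import csv
import io

# Hypothetical sample rows in the column order used above:
# PostIDX, ParentIDX, Comment, Score, Controversiality
SAMPLE_CSV = """c1,p0,top-level comment,10,0
c2,c1,reply to c1,3,0
c3,c1,another reply to c1,1,1
c4,zz,reply whose parent is missing,2,0
"""


def find_pairs(rows):
    """Return one [parent..., child...] record per child whose parent is present."""
    # Index every row by its own id. If an id repeats, keep the first row,
    # mirroring the idx.iloc[0, ...] choice in the original function.
    by_id = {}
    for row in rows:
        by_id.setdefault(row[0], row)

    pairs = []
    for child_id, parent_id, comment, score, cont in rows:
        parent = by_id.get(parent_id)
        if parent is not None:
            # Same field order as the list put on the queue in the original code.
            pairs.append([parent[0], parent[2], parent[3], parent[4],
                          child_id, comment, score, cont])
    return pairs


if __name__ == "__main__":
    rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
    for pair in find_pairs(rows):
        print(pair)
```

Building the index once makes each parent lookup a dictionary access, so for a job of this shape the extra processes may not be needed at all.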

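At the top of the file is a function that scans the DataFrame for a given row and, if it finds a matching parent comment, returns the parent and child as a tuple. If I call this function normally it works fine, but when I call it through the Process function it does not match anything! My guess is that the problem lies in the form of the arguments being passed to the function, but I have spent the whole afternoon trying to debug it and have failed so far. If anyone has any suggestions, please let me know.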
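
A few things in the code above may explain the symptom; these are possibilities, not a confirmed diagnosis. The direct call passes whole lists while the `Process` call passes single elements (`ParentIDX[proc]`), so the `[0]` indexing inside the function sees different things. On Windows, `multiprocessing` starts each child by re-importing the main module, so a name such as `outPut` that is created only under `if __name__ == '__main__':` does not exist in the child. Finally, `outPut.empty()` is checked right after `start()`, possibly before the child has put anything on the queue. A minimal sketch that avoids all three, with a hypothetical `COMMENTS` dict standing in for `data_df` and hypothetical function names:

```python
import multiprocessing as mp

# Hypothetical stand-in for data_df: PostIDX -> (Comment, Score, Controversiality).
# Defined at module level so child processes see it when the module is re-imported.
COMMENTS = {
    "c1": ("top-level comment", 10, 0),
    "c2": ("reply to c1", 3, 0),
}


def check_child_parent(out_queue, parent_id, child_id):
    """Put a [parent..., child...] record on out_queue, or None if no parent exists."""
    parent = COMMENTS.get(parent_id)
    if parent is None:
        # Always report back so the caller never waits for a missing result.
        out_queue.put(None)
    else:
        out_queue.put([parent_id, *parent, child_id, *COMMENTS[child_id]])


def run_batch(jobs):
    """Start one process per (parent_id, child_id) job and collect the matches."""
    out_queue = mp.Queue()
    # The queue is passed as an argument instead of being read as a global.
    procs = [mp.Process(target=check_child_parent, args=(out_queue, p, c))
             for p, c in jobs]
    for proc in procs:
        proc.start()
    # get() blocks until a result arrives. Read before join() so that no child
    # is left waiting to flush its data into the queue.
    results = [out_queue.get() for _ in procs]
    for proc in procs:
        proc.join()
    return [r for r in results if r is not None]


if __name__ == "__main__":
    print(run_batch([("c1", "c2"), ("zz", "c1")]))
```

Any traceback raised inside a child is printed by that child rather than raised in the parent, so it is worth watching the console output when a `Process` call seems to do nothing.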

Thanks

0 answers

No answers yet
