Speeding up Pandas processing

Posted 2024-09-29 21:33:45


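I am comparing one column of a DataFrame against columns in three other DataFrames, and I would like to know whether this process can use more cores or otherwise be made faster. I tried concurrent.futures.ProcessPoolExecutor(), but it was actually about 1 second slower. Here is my code: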

        # df_out is the main DataFrame; hikari_data_df, kokyaku_data_df and
        # hikanshou_data_df are the DataFrames to compare against.
        m1 = df_out[self.col_name_].isin(hikari_data_df['phone_num1'])
        m2 = df_out[self.col_name_].isin(hikari_data_df['phone_num2'])
        # Add new columns to df_out: the value where it matched, NaN elsewhere
        df_out['new1'] = df_out[self.col_name_].where(m1)
        df_out['new2'] = df_out[self.col_name_].where(m2)

        m1 = df_out[self.col_name_].isin(kokyaku_data_df['phone_number1'])
        m2 = df_out[self.col_name_].isin(kokyaku_data_df['phone_number2'])
        df_out['new3'] = df_out[self.col_name_].where(m1)
        df_out['new4'] = df_out[self.col_name_].where(m2)

        m1 = df_out[self.col_name_].isin(hikanshou_data_df['phone_number'])
        df_out['new5'] = df_out[self.col_name_].where(m1)

        # Write the result to the path given as the first command-line argument
        df_out.to_csv(sys.argv[1], index=False)

I would like this process to be faster.


1 Answer

User

#1 · Posted 2024-09-29 21:33:45

First, if your data is not large, try rewriting the `isin`/`where` steps as a join/merge. This uses more memory, but it can be considerably faster.
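
A minimal sketch of the merge approach, under stated assumptions: the helper `add_match_column`, the column name `phone`, and the sample phone numbers are all hypothetical and only stand in for `self.col_name_` and the real data from the question. Each lookup column is reduced to its unique values and left-joined onto the main DataFrame, so a matching row gets its own value in the new column and a non-matching row gets NaN, the same result as `isin` followed by `where`. Whether this actually beats `isin` depends on the data, so it is worth timing both on a realistic sample.

```python
import pandas as pd


def add_match_column(df, key, lookup, new_col):
    """Left-join df against the unique values of `lookup`.

    Rows whose `key` appears in `lookup` get that value in `new_col`;
    all other rows get NaN. Note that merge returns a new DataFrame
    with a fresh default index.
    """
    # De-duplicate the lookup values so the left join cannot add rows.
    matches = pd.DataFrame({key: lookup.dropna().unique()})
    matches[new_col] = matches[key]
    return df.merge(matches, on=key, how="left")


# Hypothetical sample data standing in for the real DataFrames.
df_out = pd.DataFrame(
    {"phone": ["0311112222", "0322223333", "0333334444", "0344445555"]}
)
hikari_data_df = pd.DataFrame(
    {
        "phone_num1": ["0311112222", "0399990000"],
        "phone_num2": ["0333334444", "0311112222"],
    }
)

result = add_match_column(df_out, "phone", hikari_data_df["phone_num1"], "new1")
result = add_match_column(result, "phone", hikari_data_df["phone_num2"], "new2")
print(result)
```

The key column must have the same dtype in both frames (for example, both strings), otherwise the merge will either raise or silently match nothing.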

Second, use Dask. Be careful, though: if your data is not large enough, Dask will be slower.
