What is the fastest way to map unique values from one DataFrame to another?

Posted 2024-09-26 22:53:54


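I am trying to map the unique values of a column in one DataFrame (df1) into a new column in another (match_df), based on the values held in a list in one of match_df's columns.

Data:

df1 has 10 million rows, with columns: ['ID', 'match1', 'match2']
match_df has 10,000 rows. It is a DataFrame grouped on 'match1' that gives the unique values and counts of 'match2' and 'ID'. It has the following columns: ['match1', 'match2_unique', 'match2_count', 'ID_unique', 'ID_count']

In a new column, match_df['match2_ids'], I want to create a list of all the IDs in df1 that are linked to the 'match2' values of each row.

The code below does the job but takes over an hour to run, and match_df is only a subset of a 6-million-row DataFrame. Ultimately I would like to run this on all 6 million rows, but my computing power does not currently allow it.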

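def map_IDs(x):
    # x is the list of unique 'match2' values for one row of match_df
    return list(df1[df1['match2'].isin(list(x))].ID.unique())

# 'match2_unique' is the list-valued column described above
match_df['match2_ids'] = match_df['match2_unique'].apply(map_IDs)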
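One likely reason for the run time is that every call to map_IDs filters all of df1, so 10,000 rows in match_df mean 10,000 scans of 10 million rows. A minimal sketch of one alternative, assuming the IDs fit in memory: scan df1 once to build a lookup from each 'match2' value to its IDs, then assemble each row's list from that lookup. The tiny frames and the names `ids_by_match2` and `map_ids_lookup` are hypothetical stand-ins, and the result is sorted by ID rather than kept in df1's order of appearance.

```python
import pandas as pd

# Tiny stand-in data (hypothetical, not the real 10-million-row df1).
df1 = pd.DataFrame({
    'ID': [1, 2, 3, 4, 5],
    'match2': ['a1', 'a2', 'b1', 'b1', 'a1'],
})
match_df = pd.DataFrame({
    'match1': ['a', 'b'],
    'match2_unique': [['a1', 'a2'], ['b1']],
})

# Scan df1 a single time: match2 value -> list of its unique IDs.
ids_by_match2 = {
    key: ids.tolist()
    for key, ids in df1.groupby('match2')['ID'].unique().items()
}

def map_ids_lookup(values):
    """Collect the IDs for every match2 value in one row, without touching df1."""
    ids = set()
    for value in values:
        ids.update(ids_by_match2.get(value, []))
    return sorted(ids)

match_df['match2_ids'] = match_df['match2_unique'].apply(map_ids_lookup)
print(match_df)
```

The per-row work now depends only on how many 'match2' values and IDs that row touches, not on the size of df1.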

Any help would be greatly appreciated.

Edit: added an example

import pandas as pd

example = {
    'ID': [1,2,3,4,5,6,7,8,9,10],
    'match1': ['a', 'a', 'b', 'b', 'c', 'c', 'c', 'a', 'a', 'd'],
    'match2': ['a1', 'a2', 'b1', 'b1', 'c1', 'c1', 'c1', 'a1', 'a1', 'a1']
}

df1 = pd.DataFrame(example)
match_df = df1.groupby(['match1']).agg({
     'match2': ['unique', 'nunique'],
     'ID': ['unique', 'count']
}).reset_index()
match_df.columns = match_df.columns.map(''.join)

Intermediate grouped DataFrame:

|match1|match2unique|match2nunique|IDunique    |IDcount|
|:-----|:-----------|:-----------:|:-----------|:-----:|
| 'a'  |['a1', 'a2']|  2          |[1, 2, 8, 9]|   4   |
| 'b'  |['b1']      |  1          |[3, 4]      |   2   |
| 'c'  |['c1']      |  1          |[5, 6, 7]   |   3   |
| 'd'  |['a1']      |  1          |[10]        |   1   |

Mapping function:

# For each row, collect the unique IDs in df1 whose 'match2' is in that row's list
match_df['match2_IDs'] = match_df.match2unique.apply(lambda x:
    list(df1[df1['match2'].isin(list(x))].ID.unique())
)

Final result:

|match1|match2unique|match2nunique|IDunique    |IDcount|match2_IDs      |
|:-----|:-----------|:-----------:|:-----------|:-----:|:--------------:|
| 'a'  |['a1', 'a2']|  2          |[1, 2, 8, 9]|   4   |[1, 2, 8, 9, 10]|
| 'b'  |['b1']      |  1          |[3, 4]      |   2   |[3, 4]          |
| 'c'  |['c1']      |  1          |[5, 6, 7]   |   3   |[5, 6, 7]       |
| 'd'  |['a1']      |  1          |[10]        |   1   |[1, 8, 9, 10]   |
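The same table can probably be produced without calling a Python function per row. The sketch below is one possible approach under stated assumptions, not a benchmarked answer: it explodes the list column to one row per (match1, match2) pair, joins that to the distinct (match2, ID) pairs from df1, and regroups by 'match1'. The names `pairs`, `exploded`, `merged`, and `ids_per_group` are hypothetical, the lists come out sorted by ID, and on the full data the joined intermediate may be large enough to need chunking.

```python
import pandas as pd

example = {
    'ID': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    'match1': ['a', 'a', 'b', 'b', 'c', 'c', 'c', 'a', 'a', 'd'],
    'match2': ['a1', 'a2', 'b1', 'b1', 'c1', 'c1', 'c1', 'a1', 'a1', 'a1'],
}
df1 = pd.DataFrame(example)
match_df = df1.groupby(['match1']).agg({
    'match2': ['unique', 'nunique'],
    'ID': ['unique', 'count'],
}).reset_index()
match_df.columns = match_df.columns.map(''.join)

# Distinct (match2, ID) pairs, computed from df1 in one pass.
pairs = df1[['match2', 'ID']].drop_duplicates()

# One row per (match1, match2 value) instead of one list per match1.
exploded = match_df[['match1', 'match2unique']].explode('match2unique')

# Join each match2 value to its IDs. An inner join keeps ID as an integer
# column; it assumes every listed match2 value occurs in df1.
merged = exploded.merge(pairs, left_on='match2unique', right_on='match2', how='inner')

# An ID can be reached through several match2 values, so de-duplicate
# per group, then collect the IDs in ascending order.
merged = merged.drop_duplicates(['match1', 'ID']).sort_values('ID')
ids_per_group = merged.groupby('match1')['ID'].agg(list)

match_df['match2_IDs'] = match_df['match1'].map(ids_per_group)
print(match_df[['match1', 'match2_IDs']])
```

On the ten-row example this gives the same match2_IDs lists as the table above.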

