Merging lists row-wise for rows with the same id

Posted 2024-09-29 22:24:04

How can I concatenate a list-type column row-wise in pandas? For example, see below.

Before:

1  a  [a,b,c]  
1  b  [a,d] 

After:

1  b  [a,b,c,d]

I did the column-wise list concatenation as follows:

df['all_poi'] = df['poi_part1'] + df['poi_part2']

Current output:

location_id  city            all_poi
6265981     Port Severn     [Mount St. Louis Moonstone , Horseshoe Valley , Lake Muskoka]
6265981     Port Severn     [Mount St. Louis Moonstone ,  Little Lake Park , Bamboo Spa , Lake Huron]

Expected output:

location_id    city             all_poi
6265981     Port Severn     [Mount St. Louis Moonstone , Horseshoe Valley , Lake Muskoka, Little Lake Park , Bamboo Spa , Lake Huron]

Note that all POI values are merged based on location_id.

3 answers

You can build a set inside a custom aggregation function:

f = lambda x: list(set(z for y in x for z in y))
df = df.groupby(['location_id', 'city'])['all_poi'].agg(f).reset_index()
print (df)
  location_id    city                                            all_poi
0        Port  Severn  [Bamboo Spa, Mount St.Louis Moonstone, Lake Hu...
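
For readers who want to run this end to end, here is a minimal sketch that rebuilds the question's two rows and applies the same set-based aggregation. The helper name `union_of_lists` is made up for illustration, and the result is sorted only so that the output is deterministic, since a set has no defined order.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "location_id": [6265981, 6265981],
    "city": ["Port Severn", "Port Severn"],
    "all_poi": [
        ["Mount St. Louis Moonstone", "Horseshoe Valley", "Lake Muskoka"],
        ["Mount St. Louis Moonstone", "Little Lake Park", "Bamboo Spa", "Lake Huron"],
    ],
})

def union_of_lists(series):
    """Flatten the lists in one group and drop duplicates (hypothetical helper)."""
    # sorted() is an addition here: it makes the order reproducible.
    return sorted(set(z for y in series for z in y))

out = df.groupby(["location_id", "city"])["all_poi"].agg(union_of_lists).reset_index()
print(out.loc[0, "all_poi"])
```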

If order and performance matter, use a dict to remove duplicates (dict keys keep insertion order):

f = lambda x: list(dict.fromkeys([z for y in x for z in y]).keys())
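
The dict-based deduplication can be tried without pandas at all. The sketch below uses a hypothetical helper, `merge_unique`, and the two lists from the question; it relies only on the fact that dict keys keep first-seen order in Python 3.7+.

```python
from itertools import chain

def merge_unique(lists):
    """Flatten an iterable of lists, dropping duplicates but keeping first-seen order."""
    return list(dict.fromkeys(chain.from_iterable(lists)))

rows = [
    ["Mount St. Louis Moonstone", "Horseshoe Valley", "Lake Muskoka"],
    ["Mount St. Louis Moonstone", "Little Lake Park", "Bamboo Spa", "Lake Huron"],
]
merged = merge_unique(rows)
print(merged)
# ['Mount St. Louis Moonstone', 'Horseshoe Valley', 'Lake Muskoka',
#  'Little Lake Park', 'Bamboo Spa', 'Lake Huron']
```

This matches the order shown in the question's expected output, which a plain set would not guarantee.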

Another idea is to use unique:

f = lambda x: pd.unique([z for y in x for z in y]).tolist()

Edit:

If there are multiple columns and you need the first value for each group:

df.groupby('location_id').agg({'city': 'first', 'all_poi': f}).reset_index()

If you need some other aggregation methods, such as sum, mean, or join:

df.groupby('location_id').agg({'city': 'first', 
                               'all_poi': f, 
                               'cols1':'sum', 
                               'vals': ','.join, 
                               'vals1': lambda x: list(x)}).reset_index()
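
A small runnable sketch of the per-column aggregation follows. The `visits` and `code` columns and the second location are invented purely for illustration (they stand in for columns like `cols1` and `vals` above), and `merge_unique` is a hypothetical name for the order-preserving function.

```python
import pandas as pd

df = pd.DataFrame({
    "location_id": [1, 1, 2],
    "city": ["Port Severn", "Port Severn", "Barrie"],
    "all_poi": [["a", "b", "c"], ["a", "d"], ["e"]],
    "visits": [10, 5, 7],      # hypothetical numeric column
    "code": ["x", "y", "z"],   # hypothetical string column
})

def merge_unique(series):
    """Flatten the lists in one group, keeping first-seen order without duplicates."""
    return list(dict.fromkeys(z for y in series for z in y))

# One aggregation per column: first city, merged POIs, summed visits, joined codes.
out = df.groupby("location_id").agg(
    {"city": "first", "all_poi": merge_unique, "visits": "sum", "code": ",".join}
).reset_index()
print(out)
```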

How about a simple sum():

res=df.groupby(["location_id"], as_index=False).agg({"city": "last", "all_poi": "sum"})
res["all_poi"]=res["all_poi"].map(set)

Output:

Before
   location_id  ...                                                                all_poi
0  6265981      ...  [Mount St. Louis Moonstone, Horseshoe Valley, Lake Muskoka]
1  6265981      ...  [Mount St. Louis Moonstone, Little Lake Park, Bamboo Spa, Lake Huron]

After:
   location_id  ...                                                                                                all_poi
0  6265981      ...  {Horseshoe Valley, Lake Muskoka, Lake Huron, Bamboo Spa, Little Lake Park, Mount St. Louis Moonstone}

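It looks like the answers below are more concise, but you can apply sum together with groupby to combine the lists, then create a set to remove the duplicates and convert the set back to a list: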

import pandas as pd

df = pd.DataFrame([['1' ,'New York', ['a','b','c']], ['1', 'New York', ['a','d']]],
                   columns = ['location_id', 'city','all_poi'])

# GroupBy.sum() concatenates the lists in each group; set() drops duplicates; list() converts back
df.groupby('location_id')['all_poi'].sum().apply(set).apply(list)
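
As a possible alternative to summing lists (this is a different technique from the answer above, not a restatement of it), one could explode the list column into one row per element, drop duplicate elements per id, and collect the rest back into a list. The sketch below reuses the sample frame from this answer and keeps elements in first-seen order.

```python
import pandas as pd

df = pd.DataFrame(
    [["1", "New York", ["a", "b", "c"]], ["1", "New York", ["a", "d"]]],
    columns=["location_id", "city", "all_poi"],
)

merged = (
    df.explode("all_poi")                                  # one row per list element
      .drop_duplicates(subset=["location_id", "all_poi"])  # keep first occurrence per id
      .groupby(["location_id", "city"])["all_poi"]
      .agg(list)                                           # collect elements back into a list
      .reset_index()
)
print(merged.loc[0, "all_poi"])
# ['a', 'b', 'c', 'd']
```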