How do I remove items from lists by comparing them with the index?

Published 2024-10-01 11:34:18


This is my dataframe:

Cites_Dogs  Dog_Number
DOG45555    DOG123
DOG127      DOG123
DOG7760     DOG126
DOG45       DOG126
DOG559      DOG126
DOG760      DOG126
DOG123      DOG127
DOG789      DOG127
DOG860      DOG127

I converted it to lists with the following code:

all_cites_dog = all_cites_dog.groupby('Dog_Number')['Cites_Dogs'].apply(list)
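To make the snippets on this page reproducible, here is a minimal sketch that rebuilds the sample frame from the table above and applies the same grouping line. The name grouped is introduced only for illustration (the question reassigns all_cites_dog instead). The table lists DOG45 while the list output further down shows DOG456; the sketch uses the table value.

```python
import pandas as pd

# Sample data transcribed from the table in the question
all_cites_dog = pd.DataFrame({
    'Cites_Dogs': ['DOG45555', 'DOG127', 'DOG7760', 'DOG45', 'DOG559',
                   'DOG760', 'DOG123', 'DOG789', 'DOG860'],
    'Dog_Number': ['DOG123', 'DOG123', 'DOG126', 'DOG126', 'DOG126',
                   'DOG126', 'DOG127', 'DOG127', 'DOG127'],
})

# One list of cited dogs per Dog_Number; the group keys become the index
grouped = all_cites_dog.groupby('Dog_Number')['Cites_Dogs'].apply(list)
print(grouped)
```

groupby sorts the keys by default and keeps the original row order inside each group, so each list follows the order of the rows in the frame.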

I want to remove the items in each list below that do not match any of the index values DOG123, DOG126 and DOG127:

DOG123   ['DOG45555', 'DOG127']
DOG126   ['DOG7760', 'DOG456', 'DOG559', 'DOG760']
DOG127   ['DOG123', 'DOG789', 'DOG860']

I would like to get this result:

DOG123   ['DOG127']
DOG126   ['']
DOG127   ['DOG123']

How can I do this?


3 answers

Filter inside groupby + apply:

# Set of all dog numbers, used for fast membership tests
idx = set(all_cites_dog['Dog_Number'])
all_cites_dog = (all_cites_dog.groupby('Dog_Number')['Cites_Dogs']
                             .apply(lambda x: [y for y in x if y in idx]))

print (all_cites_dog)
Dog_Number
DOG123    [DOG127]
DOG126          []
DOG127    [DOG123]
Name: Cites_Dogs, dtype: object

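For better performance, first filter with boolean indexing and isin, then group with groupby, and finally add the groups that have no matches back as empty lists: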

import numpy as np
import pandas as pd

s = (all_cites_dog[all_cites_dog['Cites_Dogs'].isin(all_cites_dog['Dog_Number'].unique())]
             .groupby('Dog_Number')['Cites_Dogs']
             .apply(list))

idx = np.setdiff1d(all_cites_dog['Dog_Number'].unique(), s.index)
s1 = pd.Series([[]] * len(idx), index=idx)
print (s1)
DOG126    []
dtype: object

# Series.append was removed in pandas 2.0; pd.concat does the same job
s = pd.concat([s, s1]).sort_index()
print (s)
DOG123    [DOG127]
DOG126          []
DOG127    [DOG123]
dtype: object
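A variant of the last step, offered as a sketch rather than as part of the answer: instead of np.setdiff1d plus concatenation, reindex the filtered result by all dog numbers and replace the NaN entries that reindex creates with empty lists. The names dogs and mask are illustrative, and the result follows the order in which the dog numbers first appear (which happens to be sorted for this data).

```python
import pandas as pd

all_cites_dog = pd.DataFrame({
    'Cites_Dogs': ['DOG45555', 'DOG127', 'DOG7760', 'DOG45', 'DOG559',
                   'DOG760', 'DOG123', 'DOG789', 'DOG860'],
    'Dog_Number': ['DOG123', 'DOG123', 'DOG126', 'DOG126', 'DOG126',
                   'DOG126', 'DOG127', 'DOG127', 'DOG127'],
})

dogs = all_cites_dog['Dog_Number'].unique()

# Keep only rows whose cited dog is itself one of the Dog_Number values
mask = all_cites_dog['Cites_Dogs'].isin(dogs)
s = all_cites_dog[mask].groupby('Dog_Number')['Cites_Dogs'].apply(list)

# Groups that lost all their rows come back as NaN after reindexing
s = s.reindex(dogs)

# Replace each NaN with a new, separate empty list
s = s.apply(lambda v: v if isinstance(v, list) else [])
print(s)
```

One design difference: [[]] * len(idx) in the answer repeats a single list object, so appending to one of those empty lists would change all of them; the apply call builds a separate list for each missing group.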

You can use apply with a list comprehension to keep only the elements that are in the index:

l = all_cites_dog.index
all_cites_dog.apply(lambda x: [i for i in x if i in l])

Dog_Number
DOG123    [DOG127]
DOG126          []
DOG127    [DOG123]
Name: Cites_Dogs, dtype: object
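This answer relies on the in operator applied to a pandas Index, which tests the index labels. A small sketch, with a hand-built Index standing in for all_cites_dog.index (assumed to hold the three group keys):

```python
import pandas as pd

# Stand-in for the index produced by the groupby in the question
l = pd.Index(['DOG123', 'DOG126', 'DOG127'], name='Dog_Number')

# `in` checks the labels of the Index
print('DOG127' in l)   # True
print('DOG789' in l)   # False

# The same check inside a list comprehension keeps only matching items
kept = [i for i in ['DOG123', 'DOG789', 'DOG860'] if i in l]
print(kept)            # ['DOG123']
```

Note that in on a Series also checks the index labels rather than the values, which is why this answer tests against all_cites_dog.index, while the first answer builds a set from the Dog_Number column instead.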

You can follow these steps:

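  1. Filter the dataframe to the rows whose Cites_Dogs value is one of the dog numbers.
  2. Run groupby + apply with list.
  3. Reindex the result by the unique dog numbers.
  4. Replace the resulting NaN values with empty lists for consistency.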

Here is a demo:

import pandas as pd

unq_dogs = df['Dog_Number'].unique()

res = (df.loc[df['Cites_Dogs'].isin(unq_dogs)]
         .groupby('Dog_Number')['Cites_Dogs'].apply(list)
         .reindex(unq_dogs)
         .fillna(pd.Series([[] for _ in range(len(unq_dogs))], index=unq_dogs))
         .reset_index())

print(res)

  Dog_Number Cites_Dogs
0     DOG123   [DOG127]
1     DOG126         []
2     DOG127   [DOG123]
