Keep rows that share the same value in a column within a group

Posted 2024-09-28 22:33:06

I have a dataframe such as:

Groups Names            Numbers Value 
G1     Canis_lupus1     10.0    NaN
G1     Cattus_cattus4   10.0    NaN 
G1     Homo_sapiens2    3.0     NaN
G1     Danio_rerio      1.0     NaN
G2     Canis_lupus2     10.0    0.3
G2     Cattus_cattus5   10.0    0.3
G3     Elaph_strangus2  2.0     NaN
G3     Elaph_strangus3  2.0     NaN 

Within each Groups value, I want to keep only the Names rows that share the same Numbers value and whose Value is NaN.

So I should get:

Groups Names            Numbers Value 
G1     Canis_lupus1     10.0    NaN
G1     Cattus_cattus4   10.0    NaN 
G3     Elaph_strangus2  2.0     NaN
G3     Elaph_strangus3  2.0     NaN

Does anyone have an idea?

Here is the dataframe in dict format, if it helps:

{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G3', 7: 'G3'}, 'Names': {0: 'Canis_lupus1', 1: 'Cattus_cattus4', 2: 'Homo_sapiens2', 3: 'Danio_rerio1.0', 4: 'Canis_lupus2', 5: 'Cattus_cattus5', 6: 'Elaph_strangus2', 7: 'Elaph_strangus3'}, 'Numbers': {0: 10.0, 1: 10.0, 2: 3.0, 3: nan, 4: 10.0, 5: 10.0, 6: 2.0, 7: 2.0}, 'Value ': {0: nan, 1: 'NaN ', 2: nan, 3: nan, 4: '0.3', 5: '0.3', 6: nan, 7: 'NaN '}}

2 answers
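import pandas as pd


def qualified(df: pd.DataFrame) -> pd.Series:
    # True where the (Groups, Numbers) pair occurs more than once and Value is NaN.
    # Groups is part of the subset so that matches are only counted within a group.
    return df.duplicated(subset=['Groups', 'Numbers'], keep=False) & df['Value'].isna()


# df is the dataframe from the question
print(df[qualified(df)])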

This should work.

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.duplicated.html
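The choice of `subset` decides whether equal Numbers values are matched inside a group or across the whole frame. The sketch below illustrates the difference on a small made-up frame; the rows and the names `demo`, `across` and `within` are hypothetical and not taken from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 3.0 appears once in G2 and once in G3, never twice in one group.
demo = pd.DataFrame({
    'Groups': ['G1', 'G1', 'G2', 'G3'],
    'Names': ['a', 'b', 'c', 'd'],
    'Numbers': [10.0, 10.0, 3.0, 3.0],
    'Value': [np.nan, np.nan, np.nan, np.nan],
})

is_nan = demo['Value'].isna()

# Duplicates of Numbers anywhere in the frame: the G2 and G3 rows match each other.
across = demo[demo.duplicated(subset=['Numbers'], keep=False) & is_nan]

# Duplicates of the (Groups, Numbers) pair: only rows matching inside the same group.
within = demo[demo.duplicated(subset=['Groups', 'Numbers'], keep=False) & is_nan]

print(across.index.tolist())  # [0, 1, 2, 3]
print(within.index.tolist())  # [0, 1]
```

On the sample data in the question both variants happen to return the same rows, because the only Numbers value shared between groups (10.0 in G1 and G2) is also duplicated inside each of those groups.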

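  • Create a boolean mask that flags duplicated values in the subset Groups, Numbers
  • Create another boolean mask that flags NaN in the Value column
  • Combine the masks with a logical and, then filter the rows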
df[df.duplicated(['Groups', 'Numbers'], keep=False) & df['Value'].isna()]

  Groups            Names  Numbers  Value
0     G1     Canis_lupus1     10.0    NaN
1     G1   Cattus_cattus4     10.0    NaN
6     G3  Elaph_strangus2      2.0    NaN
7     G3  Elaph_strangus3      2.0    NaN
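To try this against the dict posted in the question, something like the following sketch may help. It assumes that the trailing space in the `'Value '` key and the `'NaN '` strings in that dict are copy artifacts, so it strips the column names and coerces Value to numeric before building the masks. That cleanup step is an assumption and is not part of either answer.

```python
import numpy as np
import pandas as pd

nan = np.nan  # lets the dict literal from the question be pasted unchanged

data = {
    'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G2', 5: 'G2', 6: 'G3', 7: 'G3'},
    'Names': {0: 'Canis_lupus1', 1: 'Cattus_cattus4', 2: 'Homo_sapiens2',
              3: 'Danio_rerio1.0', 4: 'Canis_lupus2', 5: 'Cattus_cattus5',
              6: 'Elaph_strangus2', 7: 'Elaph_strangus3'},
    'Numbers': {0: 10.0, 1: 10.0, 2: 3.0, 3: nan, 4: 10.0, 5: 10.0, 6: 2.0, 7: 2.0},
    'Value ': {0: nan, 1: 'NaN ', 2: nan, 3: nan, 4: '0.3', 5: '0.3', 6: nan, 7: 'NaN '},
}

df = pd.DataFrame(data)

# Assumed cleanup: 'Value ' -> 'Value', and '0.3' / 'NaN ' strings -> floats / NaN.
df.columns = df.columns.str.strip()
df['Value'] = pd.to_numeric(df['Value'], errors='coerce')

# Mask 1: the (Groups, Numbers) pair occurs more than once.
dup_mask = df.duplicated(['Groups', 'Numbers'], keep=False)
# Mask 2: Value is missing.
nan_mask = df['Value'].isna()

result = df[dup_mask & nan_mask]
print(result)
```

Without the cleanup, `df['Value']` raises a `KeyError` because of the trailing space in the column name, and the `'NaN '` strings would not be reported as missing by `isna()`.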
