Use groupby to drop groups in which no row contains a pattern

User

Answer 1 · edited 2024-10-05 13:46:43

This is the long way around the problem, chosen to illustrate how groupby works.

First, create a function that tests for the desired string:

def contains_str(x, string='_Lh'):
    # True if `string` occurs anywhere in x (plain substring test, not a regex)
    return string in x

Next, iterate over your groups and apply this function:

keep_dict = {}

for label, group_df in df.groupby('col1'):
    keep = bool(group_df['col2'].apply(contains_str).any())  # plain bool, so the dict prints as True/False
    keep_dict[label] = keep

print(keep_dict)
# {'G1': True, 'G2': False, 'G3': False, 'G4': True}
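
For anyone who wants to run the loop above, here is a minimal self-contained sketch. The G1 and G4 rows are taken from the output shown in this thread; the G2 and G3 values are invented placeholders, since the question's full data is not reproduced here. The `bool(...)` wrapper is only there so the printed dictionary shows plain `True`/`False` regardless of NumPy version.

```python
import pandas as pd

# Sample frame: G1/G4 rows come from the thread, G2/G3 rows are made up
df = pd.DataFrame({
    "col1": ["G1"] * 4 + ["G2"] * 2 + ["G3"] * 2 + ["G4"] * 3,
    "col2": ["OP2", "OP0", "OPP", "OPL_Lh",   # G1 (from the thread)
             "AAA", "BBB",                     # G2 (placeholder values)
             "CCC", "DDD",                     # G3 (placeholder values)
             "TUI", "TYUI", "TR_Lh"],          # G4 (from the thread)
})

def contains_str(x, string="_Lh"):
    return string in x

keep_dict = {}
# groupby yields (label, sub-DataFrame) pairs, one per distinct col1 value
for label, group_df in df.groupby("col1"):
    keep_dict[label] = bool(group_df["col2"].apply(contains_str).any())

print(keep_dict)
# {'G1': True, 'G2': False, 'G3': False, 'G4': True}
```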

Feel free to print individual items in the operation to understand their role.

Finally, map that dictionary onto your current df:

df_final = df[df['col1'].map(keep_dict)].reset_index(drop=True)

    col1    col2
0   G1      OP2
1   G1      OP0
2   G1      OPP
3   G1      OPL_Lh
4   G4      TUI
5   G4      TYUI
6   G4      TR_Lh
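
To see why the mapping works, it may help to look at the intermediate mask. A small sketch, assuming the same made-up G2/G3 rows as placeholders and a hard-coded `keep_dict` equal to the one printed earlier:

```python
import pandas as pd

# Sample frame: G1/G4 rows come from the thread, G2/G3 rows are made up
df = pd.DataFrame({
    "col1": ["G1"] * 4 + ["G2"] * 2 + ["G3"] * 2 + ["G4"] * 3,
    "col2": ["OP2", "OP0", "OPP", "OPL_Lh",
             "AAA", "BBB",
             "CCC", "DDD",
             "TUI", "TYUI", "TR_Lh"],
})
keep_dict = {"G1": True, "G2": False, "G3": False, "G4": True}

# map() looks up each row's group label in the dict: one boolean per row
mask = df["col1"].map(keep_dict)
print(mask.tolist())

# Boolean indexing keeps the True rows; reset_index renumbers them from 0
df_final = df[mask].reset_index(drop=True)
print(df_final)
```

A label missing from `keep_dict` would map to NaN, so the dictionary needs an entry for every group.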

You can condense these steps with the following code:

keep_dict = df.groupby('col1', as_index=True)['col2'].apply(lambda arr: any([contains_str(x) for x in arr])).to_dict()

print(keep_dict)
# {'G1': True, 'G2': False, 'G3': False, 'G4': True}
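
A different route, not the one used in this answer, is pandas' own `GroupBy.filter`, which keeps every row of the groups for which a function returns True. A sketch under the same assumed sample data (G2/G3 rows are placeholders):

```python
import pandas as pd

# Sample frame: G1/G4 rows come from the thread, G2/G3 rows are made up
df = pd.DataFrame({
    "col1": ["G1"] * 4 + ["G2"] * 2 + ["G3"] * 2 + ["G4"] * 3,
    "col2": ["OP2", "OP0", "OPP", "OPL_Lh",
             "AAA", "BBB",
             "CCC", "DDD",
             "TUI", "TYUI", "TR_Lh"],
})

# The lambda receives each group's sub-DataFrame and returns one boolean;
# regex=False treats "_Lh" as a literal substring
df_final = df.groupby("col1").filter(
    lambda g: g["col2"].str.contains("_Lh", regex=False).any()
)
print(df_final)
```

The original row labels are kept here; append `.reset_index(drop=True)` to renumber them.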

I hope this both answers your Q and explains what's taking place "behind the scenes" in groupby operations.

User

Answer 2 · edited 2024-10-05 13:46:43

You can do:

filter_=df.loc[df["col2"].str.contains("_Lh"), "col1"].drop_duplicates()

df=df.merge(filter_, on="col1")

Output:

  col1    col2
0   G1     OP2
1   G1     OP0
2   G1     OPP
3   G1  OPL_Lh
4   G4     TUI
5   G4    TYUI
6   G4   TR_Lh
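
Spelled out with sample data, the merge approach might look like the sketch below. The G2/G3 rows are invented placeholders; `merge` defaults to an inner join, which is what drops the groups absent from `filter_`.

```python
import pandas as pd

# Sample frame: G1/G4 rows come from the thread, G2/G3 rows are made up
df = pd.DataFrame({
    "col1": ["G1"] * 4 + ["G2"] * 2 + ["G3"] * 2 + ["G4"] * 3,
    "col2": ["OP2", "OP0", "OPP", "OPL_Lh",
             "AAA", "BBB",
             "CCC", "DDD",
             "TUI", "TYUI", "TR_Lh"],
})

# Labels of groups with at least one matching row, de-duplicated so the
# join cannot multiply rows
filter_ = df.loc[df["col2"].str.contains("_Lh"), "col1"].drop_duplicates()

# Inner join on col1: only rows whose group label appears in filter_ survive
result = df.merge(filter_, on="col1")
print(result)
```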

User

Answer 3 · edited 2024-10-05 13:46:43

IIUC

You can use a boolean test and isin to filter down to the groups that contain _Lh:

m = df[df['col2'].str.contains('_Lh')]['col1']

df[df['col1'].isin(m)].groupby('col1')...

print(df[df['col1'].isin(m)])

   col1    col2
0    G1     OP2
1    G1     OP0
2    G1     OPP
3    G1  OPL_Lh
8    G4     TUI
9    G4    TYUI
10   G4   TR_Lh
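
Here is the same idea as a runnable sketch (G2/G3 rows are invented placeholders). Unlike the merge version, boolean indexing leaves the original row labels in place, which is why the output above jumps from 3 to 8.

```python
import pandas as pd

# Sample frame: G1/G4 rows come from the thread, G2/G3 rows are made up
df = pd.DataFrame({
    "col1": ["G1"] * 4 + ["G2"] * 2 + ["G3"] * 2 + ["G4"] * 3,
    "col2": ["OP2", "OP0", "OPP", "OPL_Lh",
             "AAA", "BBB",
             "CCC", "DDD",
             "TUI", "TYUI", "TR_Lh"],
})

# Group labels of the rows whose col2 contains "_Lh"
m = df.loc[df["col2"].str.contains("_Lh"), "col1"]

# Keep every row whose group label is among them
kept = df[df["col1"].isin(m)]
print(kept.index.tolist())
# [0, 1, 2, 3, 8, 9, 10]
```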
