How to select the top N columns of a DataFrame using conditions

RowID  Basic1011  Basic2837  Lemon836  Car92_T  Manf3953  Brat82  Basic383_T  Jot112  ...
    1          2          8         4        3         1       5           6       7
    2          8          3         5        0         9       7           0       5

2 answers

User

Answer #1 · edited 2024-09-30 20:26:23

Try something like this. You may need to adjust the column selection at the very start to make sure the filtering is correct.

# this gives you column names with Basic or _T anywhere in the column name.
unwanted = df.filter(regex='Basic|_T').columns.tolist()

# the tilde (~) inverts the condition, so no Basic or _T
dfn = df[df.columns[~df.columns.isin(unwanted)]]

# apply your filter: keep each row's Top largest values, replace the rest with 0
Top = 2
df_ranked = dfn.where(dfn.apply(lambda x: x.eq(x.nlargest(Top)), axis=1), 0)

#then merge dfn with df_ranked
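The answer leaves the final merge as a comment. Below is a minimal sketch of one way to finish it, assuming "merge" means putting the excluded Basic/_T columns back next to the ranked ones. It uses the sample rows from the question (any columns hidden behind "..." are not included) and pd.concat along the columns rather than DataFrame.merge, since both frames share the same row index. As in the answer, RowID is ranked like any other value column.

```python
import pandas as pd

# Sample data from the question (columns elided by "..." are not included)
df = pd.DataFrame({
    "RowID": [1, 2], "Basic1011": [2, 8], "Basic2837": [8, 3],
    "Lemon836": [4, 5], "Car92_T": [3, 0], "Manf3953": [1, 9],
    "Brat82": [5, 7], "Basic383_T": [6, 0], "Jot112": [7, 5],
})

# Same steps as the answer
unwanted = df.filter(regex="Basic|_T").columns.tolist()
dfn = df[df.columns[~df.columns.isin(unwanted)]]

Top = 2
df_ranked = dfn.where(dfn.apply(lambda x: x.eq(x.nlargest(Top)), axis=1), 0)

# Assumption: "merge" = put the untouched Basic/_T columns back beside the
# ranked ones. Both frames share the row index, so a column-wise concat is
# enough; selecting df.columns restores the original column order.
result = pd.concat([df[unwanted], df_ranked], axis=1)[df.columns]
print(result)
```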

User

Answer #2 · edited 2024-09-30 20:26:23

Step 1: You can use df.filter() with a regex to select the columns that meet either of these two conditions:

start with "Basic"
end with "_T"

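The regex used is r'(?:^Basic)|(?:_T$)', where:

(?: ) is a non-capturing group. It is used only for grouping, without capturing the match.

^ is the start-of-text anchor; it marks the position at the beginning of the text

Basic matches the literal text Basic (together with ^, Basic must appear at the start of the column label)

| is the regex metacharacter for "or"

_T matches the literal text _T

$ is the end-of-text anchor; it marks the position at the end of the text (together with _T, _T$ means _T at the end of the column name)

We name these columns cols_Basic_T.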
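DataFrame.filter(regex=...) keeps a label when re.search finds a match, so the regex can be checked directly against the column labels. This sketch uses the labels from the question and also compares the anchored pattern with the unanchored 'Basic|_T' from the first answer; the label "Car_Tire" is hypothetical and only illustrates where the anchors make a difference.

```python
import re

# Column labels from the question's sample data
columns = ["RowID", "Basic1011", "Basic2837", "Lemon836", "Car92_T",
           "Manf3953", "Brat82", "Basic383_T", "Jot112"]

anchored = re.compile(r"(?:^Basic)|(?:_T$)")  # regex from this answer
loose = re.compile(r"Basic|_T")                # unanchored regex from the first answer

# Same test DataFrame.filter(regex=...) applies to each label
cols_Basic_T = [c for c in columns if anchored.search(c)]
print(cols_Basic_T)  # ['Basic1011', 'Basic2837', 'Car92_T', 'Basic383_T']

# Hypothetical label (not in the question): "_T" appears, but not at the end
label = "Car_Tire"
print(bool(anchored.search(label)), bool(loose.search(label)))  # False True
```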

Step 2: Then use df.columns.difference() to find the other columns. We name these other columns cols_others.
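Index.difference sorts its result by default, which is why cols_others comes out in alphabetical order and why the full code later restores the original order with intersection (which keeps the caller's order). A small sketch with the labels from the question; the boolean isin mask, as used in the first answer, is shown as an order-preserving alternative.

```python
import pandas as pd

columns = pd.Index(["RowID", "Basic1011", "Basic2837", "Lemon836", "Car92_T",
                    "Manf3953", "Brat82", "Basic383_T", "Jot112"])
cols_Basic_T = pd.Index(["Basic1011", "Basic2837", "Car92_T", "Basic383_T"])

# difference() returns the remaining labels sorted
cols_others = columns.difference(cols_Basic_T)
print(list(cols_others))  # ['Brat82', 'Jot112', 'Lemon836', 'Manf3953', 'RowID']

# intersection() keeps the order of the calling Index (the original columns)
print(list(columns.intersection(cols_others)))
# ['RowID', 'Lemon836', 'Manf3953', 'Brat82', 'Jot112']

# Alternative that never reorders: boolean mask with isin
ordered = columns[~columns.isin(cols_Basic_T)]
print(list(ordered))  # ['RowID', 'Lemon836', 'Manf3953', 'Brat82', 'Jot112']
```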

Step 3: Then we apply code similar to the one that gives you the top N, but only to the selected columns in cols_others.
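To see what the lambda in Step 3 does for a single row, here is a sketch using the second sample row restricted to cols_others (in the sorted order from Step 2). x.nlargest(Top) returns only the Top labels; x.eq(...) aligns on labels and treats labels missing from that result as not equal, whereas == between Series with different labels would raise a ValueError. It also shows how ties are handled: Jot112 and Lemon836 both hold 5, but nlargest keeps only the first one in column order, which is why Lemon836 is 0 in the second row of the result.

```python
import pandas as pd

# Second sample row, restricted to the cols_others columns in sorted order
row = pd.Series({"Brat82": 7, "Jot112": 5, "Lemon836": 5, "Manf3953": 9, "RowID": 2})

Top = 3
top = row.nlargest(Top)  # Top largest values, labelled by column name
mask = row.eq(top)       # aligns on labels; labels absent from `top` compare as False

print(top.index.tolist())  # ['Manf3953', 'Brat82', 'Jot112']
print(mask.to_dict())
```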

Full code:

## Step 1
cols_Basic_T = df.filter(regex=r'(?:^Basic)|(?:_T$)').columns

## Step 2
cols_others = df.columns.difference(cols_Basic_T)

## Step 3
#Top = 20 
Top = 3     # a smaller Top here because the sample data has fewer columns
df_others = df[cols_others].where(df[cols_others].apply(lambda x: x.eq(x.nlargest(Top)), axis=1), 0)
# To keep the original column sequence
df_others = df_others[df.columns.intersection(cols_others)]

Results:

cols_Basic_T

print(cols_Basic_T)

Index(['Basic1011', 'Basic2837', 'Car92_T', 'Basic383_T'], dtype='object')

cols_others

print(cols_others)

Index(['Brat82', 'Jot112', 'Lemon836', 'Manf3953', 'RowID'], dtype='object')

df_others

print(df_others)

## Top 3 values in each row kept as non-zeros; all other values masked as zeros

   RowID  Lemon836  Manf3953  Brat82  Jot112
0      0         4         0       5       7
1      0         0         9       7       5
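In the output above, RowID is ranked like any other value column (it shows as 0 because neither 1 nor 2 is among the top 3 in its row). If RowID is meant as an identifier rather than a value, one option, which is an assumption about the intent and not part of this answer, is to move it into the index with set_index before running the same three steps. A sketch with the sample rows from the question:

```python
import pandas as pd

# Sample data from the question (columns elided by "..." are not included)
df = pd.DataFrame({
    "RowID": [1, 2], "Basic1011": [2, 8], "Basic2837": [8, 3],
    "Lemon836": [4, 5], "Car92_T": [3, 0], "Manf3953": [1, 9],
    "Brat82": [5, 7], "Basic383_T": [6, 0], "Jot112": [7, 5],
})

# Assumption: RowID is an identifier, so keep it out of the ranking
data = df.set_index("RowID")

# Steps 1-3 from the answer, unchanged apart from working on `data`
cols_Basic_T = data.filter(regex=r"(?:^Basic)|(?:_T$)").columns
cols_others = data.columns.difference(cols_Basic_T)

Top = 3
others = data[cols_others]
ranked = others.where(others.apply(lambda x: x.eq(x.nlargest(Top)), axis=1), 0)
ranked = ranked[data.columns.intersection(cols_others)]  # original column order

print(ranked)
```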
