Answers to: How to select the top N columns in a DataFrame using conditions

1 answer

Anonymous, 1 day ago

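Step 1: You can use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.filter.html" rel="nofollow noreferrer"><code>DataFrame.filter()</code></a> with a regex to select the columns whose label meets either of the following two conditions:
<ol>
<li>starts with "Basic"</li>
<li>ends with "_T"</li>
</ol>
The regex used is <code>r'(?:^Basic)|(?:_T$)'</code>, where:
<ul>
<li><code>(?: )</code> is a non-capturing group. It is used only to group part of the pattern.</li>
<li><code>^</code> is the start-of-text anchor, marking the position where the text begins.</li>
<li><code>Basic</code> matches the literal text <code>Basic</code> (together with <code>^</code>, this <code>Basic</code> must be at the start of the column label).</li>
<li><code>|</code> is the regex metacharacter for <code>or</code>.</li>
<li><code>_T</code> matches the literal text <code>_T</code>.</li>
<li><code>$</code> is the end-of-text anchor, marking the position where the text ends (together with <code>_T</code>, <code>_T$</code> requires <code>_T</code> to be at the end of the column label).</li>
</ul>
We name these columns <code>cols_Basic_T</code>.

Step 2: Then use <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Index.difference.html" rel="nofollow noreferrer"><code>Index.difference()</code></a> to find the remaining columns. We name these remaining columns <code>cols_others</code>.

Step 3: Finally, on the selected columns <code>cols_others</code>, apply code similar to what you already use to get the top N: for each row, the N largest values are kept and every other value is replaced with 0.

Full code:
<pre><code>## Step 1: columns whose label starts with "Basic" or ends with "_T"
cols_Basic_T = df.filter(regex=r'(?:^Basic)|(?:_T$)').columns

## Step 2: all remaining columns (Index.difference returns them sorted)
cols_others = df.columns.difference(cols_Basic_T)

## Step 3: per row, keep the Top largest values and mask the rest with 0
# Top = 20
Top = 3    # use a smaller N for the small sample data here
df_others = df[cols_others].where(
    df[cols_others].apply(lambda x: x.eq(x.nlargest(Top)), axis=1), 0)

# To keep the original column sequence
df_others = df_others[df.columns.intersection(cols_others)]
</code></pre>
Result:

<code>cols_Basic_T</code>
<pre><code>print(cols_Basic_T)

Index(['Basic1011', 'Basic2837', 'Car92_T', 'Basic383_T'], dtype='object')
</code></pre>
<code>cols_others</code>
<pre><code>print(cols_others)

Index(['Brat82', 'Jot112', 'Lemon836', 'Manf3953', 'RowID'], dtype='object')
</code></pre>
<code>df_others</code>
<pre><code>print(df_others)

## Top 3 values per row shown as non-zeros; all other values masked as zeros

   RowID  Lemon836  Manf3953  Brat82  Jot112
0      0         4         0       5       7
1      0         0         9       7       5
</code></pre>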
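The three steps can be tried end to end with the sketch below. The answer does not show the asker's original DataFrame, so the sample data here is hypothetical: the column labels are taken from the printed results, but the column order and every value are invented for illustration. The names `TOP`, `others`, and `is_top` are likewise introduced only for this sketch.

```python
import pandas as pd

# Hypothetical sample data: labels come from the answer's printed output,
# but the column order and all values are made up for this sketch.
df = pd.DataFrame({
    "RowID": [1, 2],
    "Basic1011": [10, 20],
    "Basic2837": [30, 40],
    "Lemon836": [4, 3],
    "Car92_T": [50, 60],
    "Manf3953": [2, 9],
    "Brat82": [5, 7],
    "Basic383_T": [70, 80],
    "Jot112": [7, 5],
})

TOP = 3  # number of largest values to keep in each row

# Step 1: labels that start with "Basic" or end with "_T"
cols_basic_t = df.filter(regex=r"(?:^Basic)|(?:_T$)").columns

# Step 2: every other label (Index.difference returns them sorted)
cols_others = df.columns.difference(cols_basic_t)

# Step 3: build a boolean mask that is True only at each row's TOP largest
# values, then replace everything outside the mask with 0.
others = df[cols_others]
is_top = others.apply(lambda row: row.eq(row.nlargest(TOP)), axis=1)
df_others = others.where(is_top, 0)

# Restore the column order of the original frame
df_others = df_others[df.columns.intersection(cols_others)]

print(df_others)
```

Note that `RowID` is part of `cols_others` here, as in the answer, so it competes with the other columns for a top-N slot. If that is not wanted, one option is to drop it from `cols_others` before Step 3.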
