Answer to the question: Find the last column matching a pattern in each row

1 answer

Anonymous, 1 day ago

A vectorized approach: use <code>arange</code> to find the index of the last match, take the <code>max</code>, and concatenate to handle rows with no match:

<pre><code>df['last_referred'] = np.r_[[np.nan], df.columns][
    ((df == 'referred') * (np.arange(df.shape[1]) + 1)).max(axis=1).values]
</code></pre>

<hr/>

Explanation:

We want to find, in each row, the rightmost cell whose value is <code>'referred'</code>. The comparison <code>df == 'referred'</code> yields a Boolean mask marking those cells.

One option is <a href="http://pandas.pydata.org/pandas-docs/dev/generated/pandas.DataFrame.idxmax.html#pandas.DataFrame.idxmax" rel="nofollow"><code>idxmax</code></a>, but that returns the first (i.e. leftmost) occurrence. However, if we could replace the <code>True</code> values with their column indices, we could just use a plain <code>max</code>. Since <code>True</code> is <code>1</code> and <code>False</code> is <code>0</code>, we can do this by multiplying by an integer range <code>[0, 1, 2, ...]</code>, which is broadcast down the rows:

<pre><code>>>> np.arange(df.shape[1])
array([0, 1, 2, 3])
>>> (df == 'referred') * np.arange(df.shape[1])
   name  action_1  action_2  action_3
0     0         1         2         0
1     0         0         2         3
2     0         0         0         0
3     0         0         2         0
4     0         1         0         0
5     0         0         0         0
>>> ((df == 'referred') * np.arange(df.shape[1])).max(axis=1)
0    2
1    3
2    0
3    2
4    1
5    0
dtype: int32
</code></pre>

There is a problem, though: we can't distinguish a <code>'referred'</code> in the "name" column from no occurrence at all, because both give <code>0</code>. That's easy to fix; just start the integer range at 1:

<pre><code>>>> ((df == 'referred') * (np.arange(df.shape[1]) + 1)).max(axis=1)
0    3
1    4
2    0
3    3
4    2
5    0
dtype: int32
</code></pre>

Now just index the column names with this array:

<pre><code>>>> df.columns[((df == 'referred') * (np.arange(df.shape[1]) + 1)).max(axis=1).values]
IndexError: index 4 is out of bounds for size 4
</code></pre>

Oops! We need <code>0</code> to come out as <code>NaN</code>, with the column names shifted along by one position. We can do this with <code>np.r_</code>, which concatenates arrays:

<pre><code>>>> np.r_[[np.nan], df.columns]
array([nan, 'name', 'action_1', 'action_2', 'action_3'], dtype=object)
>>> np.r_[[np.nan], df.columns][
...     ((df == 'referred') * (np.arange(df.shape[1]) + 1)).max(axis=1).values]
array(['action_2', 'action_3', nan, 'action_2', 'action_1', nan], dtype=object)
</code></pre>

And there it is.
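For readers who want to run the whole thing end to end, the steps in this answer can be gathered into a small helper. This is only a minimal sketch, not the answerer's exact code. The function name `last_matching_column` and the sample data are assumptions made for illustration; the original DataFrame is not shown in the answer, so the values below were merely chosen to reproduce the outputs it prints.

```python
import numpy as np
import pandas as pd


def last_matching_column(df: pd.DataFrame, value: str = "referred") -> np.ndarray:
    """Per row, return the name of the rightmost column equal to `value`, or NaN if none.

    Hypothetical helper: it wraps the arange / max / np.r_ steps in one place.
    """
    mask = (df == value).to_numpy(dtype=bool)      # True where the cell matches
    positions = np.arange(1, df.shape[1] + 1)      # 1-based, so 0 can mean "no match"
    last_pos = (mask * positions).max(axis=1)      # rightmost matching position per row
    # Prepend NaN so that position 0 ("no match") maps to NaN instead of a column name.
    labels = np.r_[[np.nan], df.columns.to_numpy(dtype=object)]
    return labels[last_pos]


# Illustrative data only (an assumption, not the asker's real frame).
df = pd.DataFrame({
    "name":     ["a", "b", "c", "d", "e", "f"],
    "action_1": ["referred", "viewed", "viewed", "viewed", "referred", "viewed"],
    "action_2": ["referred", "referred", "viewed", "referred", "viewed", "viewed"],
    "action_3": ["viewed", "referred", "viewed", "viewed", "viewed", "viewed"],
})

result = last_matching_column(df)
df["last_referred"] = result
print(df)
```

Starting the positions at 1 rather than 0 is the design choice that matters here: it keeps a match in the first column distinct from a row with no match at all, and the NaN prepended by `np.r_` gives those no-match rows a value to map to.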
