name   action_1    action_2  action_3
bill   referred    referred
bob    introduced  referred  referred
mary   introduced
june   introduced  referred
dale   referred
donna  introduced
In [123]: molten = pd.melt(df, id_vars='name', var_name='last_referred')
In [124]: molten
Out[124]:
name last_referred value
0 bill action_1 referred
1 bob action_1 introduced
2 mary action_1 introduced
3 june action_1 introduced
4 dale action_1 referred
5 donna action_1 introduced
6 bill action_2 referred
7 bob action_2 referred
8 mary action_2 NaN
9 june action_2 referred
10 dale action_2 NaN
11 donna action_2 NaN
12 bill action_3 NaN
13 bob action_3 referred
14 mary action_3 NaN
15 june action_3 NaN
16 dale action_3 NaN
17 donna action_3 NaN
In [125]: gb = molten.groupby('name')
In [126]: col = gb.apply(lambda x: x[x.value == 'referred'].tail(1)).last_referred
In [127]: col.index = col.index.droplevel(1)
In [128]: col
Out[128]:
name
bill action_2
bob action_3
dale action_1
june action_2
Name: last_referred, dtype: object
In [129]: newdf = df.join(col, on='name')
In [130]: newdf
Out[130]:
name action_1 action_2 action_3 last_referred
0 bill referred referred NaN action_2
1 bob introduced referred referred action_3
2 mary introduced NaN NaN NaN
3 june introduced referred NaN action_2
4 dale referred NaN NaN action_1
5 donna introduced NaN NaN NaN
In [3]: def func(row, pattern):
   ...:     referrer = np.nan
   ...:     for key in row.index:
   ...:         if row[key] == pattern:
   ...:             referrer = key
   ...:     return referrer
   ...:
   ...: df['last_referred'] = df.apply(func, pattern='referred', axis=1)
   ...: df
Out[3]:
   name   action_1    action_2  action_3  last_referred
0  bill   referred    referred  None      action_2
1  bob    introduced  referred  referred  action_3
2  mary   introduced                      NaN
3  june   introduced  referred            action_2
4  dale   referred                        action_1
5  donna  introduced                      NaN
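You can do this with pandas.melt and groupby, as in the In [123] to In [130] session above.

A vectorized approach uses arange and max to find the index of the last match, followed by a concatenation.

Explanation: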
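We want to find, in each row, the rightmost cell whose value is 'referred'. One option only returns the first (that is, leftmost) occurrence. Suppose instead that each True value could be replaced by its column index; then a plain max would be enough. Since True is 1 and False is 0, multiplying the boolean mask by an integer range [0, 1, 2, ...], broadcast down the rows, achieves exactly that.

There is one problem, though: a 'referred' in the 'name' column (index 0) cannot be distinguished from no occurrence at all. That is easy to fix: start the integer range at 1. The resulting array can then be used to index the column names.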
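A minimal sketch of the multiplication step, assuming a small hand-written boolean mask (hypothetical data, standing in for the real comparison against 'referred'). It contrasts an integer range starting at 0 with one starting at 1:

```python
import numpy as np

# Hypothetical mask: True where a cell equals 'referred'.
# Columns stand for name, action_1, action_2, action_3.
mask = np.array([
    [False, True,  True,  False],   # bill
    [False, False, True,  True],    # bob
    [False, False, False, False],   # mary: never referred
])

n_cols = mask.shape[1]

# 0-based range: a match in column 0 and "no match" would both give 0.
zero_based = (mask * np.arange(n_cols)).max(axis=1)

# 1-based range: 0 now means "no match" without ambiguity.
one_based = (mask * np.arange(1, n_cols + 1)).max(axis=1)

print(zero_based)  # [2 3 0]
print(one_based)   # [3 4 0]
```

The range has shape (4,) and the mask has shape (3, 4), so NumPy broadcasting repeats the range on every row before the row-wise max is taken.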
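Oops! The 0 has to come out as NaN, and the remaining column names have to shift by one position. np.r_, which concatenates arrays, handles this by putting NaN in front of the column names. And there it is.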
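Putting the steps together, one possible version of the vectorized lookup might look like the sketch below. The sample frame is rebuilt from the table at the top (missing cells are assumed to be NaN), and the helper name last_column_with is an assumption made for illustration, not a name from the original answer.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table above; blanks are assumed to be NaN.
df = pd.DataFrame({
    "name": ["bill", "bob", "mary", "june", "dale", "donna"],
    "action_1": ["referred", "introduced", "introduced",
                 "introduced", "referred", "introduced"],
    "action_2": ["referred", "referred", np.nan, "referred", np.nan, np.nan],
    "action_3": [np.nan, "referred", np.nan, np.nan, np.nan, np.nan],
})


def last_column_with(frame, pattern):
    """Return, per row, the label of the rightmost column equal to pattern."""
    # Boolean mask of matching cells; NaN cells compare as False.
    mask = (frame == pattern).to_numpy()
    # 1-based position of the rightmost match in each row, 0 if none.
    positions = (mask * np.arange(1, mask.shape[1] + 1)).max(axis=1)
    # Prepend NaN so that position 0 maps to NaN and position k to column k-1.
    labels = np.r_[[np.nan], frame.columns.to_numpy(dtype=object)]
    return pd.Series(labels[positions], index=frame.index)


df["last_referred"] = last_column_with(df, "referred")
print(df)
```

The right-hand side is evaluated before the new column is assigned, so last_referred itself is never part of the search.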
Simply use the apply function along axis=1, and pass the pattern argument to the function as an additional argument, as in the In [3] session above.