GroupBy columns on column header prefix

import numpy as np
import pandas as pd

df = pd.DataFrame([[1,2,3,4],[1,2,3,4],[1,2,3,4],[1,2,3,4]], columns=['abc', 'abd', 'wxy', 'wxz'])
prefixes = ['ab','wx']

df
   abc  abd  wxy  wxz
0    1    2    3    4
1    1    2    3    4
2    1    2    3    4
3    1    2    3    4

2 Answers

User

Answer 1 · edited 2024-05-20 00:04:34

First, determine which prefix each column name contains. Then use that mapping as the grouping key for the groupby.

grouper = [next(p for p in prefixes if p in c) for c in df.columns]
u = df.groupby(grouper, axis=1).sum()

   ab  wx
0   3   7
1   3   7
2   3   7
3   3   7
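The grouping step above can be traced in a self-contained form. This is a minimal sketch, not the answerer's exact code: it assumes that a strict prefix match (`str.startswith`) is wanted rather than the substring test `p in c`, and it groups the transposed frame instead of passing `axis=1`, because `groupby(..., axis=1)` is deprecated in recent pandas releases (2.1 and later).

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]] * 4, columns=['abc', 'abd', 'wxy', 'wxz'])
prefixes = ['ab', 'wx']

# One label per column: the first prefix that the column name starts with.
# (Assumption: strict prefix matching; the answer above uses `p in c`.)
grouper = [next(p for p in prefixes if c.startswith(p)) for c in df.columns]
print(grouper)  # ['ab', 'ab', 'wx', 'wx']

# Transpose so the columns become rows, group the rows by label,
# sum within each group, and transpose back.
u = df.T.groupby(pd.Index(grouper)).sum().T
print(u)
```

With the sample frame this should print the same `ab`/`wx` table as above, with 3 and 7 in every row.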

Almost there. Now:

^{pr2}$
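The code for this last step is not reproduced here. As a minimal sketch, assuming the target is the single row of per-prefix totals (`ab 12`, `wx 28`) that the other approaches in this thread produce, the per-row table `u` could be collapsed like this. The name `result` is illustrative, and `u` is rebuilt via a transpose so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]] * 4, columns=['abc', 'abd', 'wxy', 'wxz'])
prefixes = ['ab', 'wx']

# Rebuild the per-row table from the previous step (transpose-based variant).
grouper = [next(p for p in prefixes if p in c) for c in df.columns]
u = df.T.groupby(pd.Index(grouper)).sum().T

# Sum each prefix column over all rows (a Series indexed by prefix),
# then turn that Series into a one-row DataFrame.
result = u.sum().to_frame().T
print(result)
```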

Another option is to vectorize the prefix lookup with np.char.startswith and argmax:

idx = np.char.startswith(
    df.columns.values[:, None].astype(str), prefixes).argmax(1)

(pd.Series(df.groupby(idx, axis=1).sum().sum().values, index=prefixes)
   .to_frame()
   .transpose())

   ab  wx
0  12  28
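To make the vectorized lookup easier to follow, the sketch below prints the intermediate arrays. It is an illustration under stated assumptions: the names `cols`, `mask`, `labels`, and `result` are not from the answer, and the final aggregation groups the column totals by label instead of using `groupby(..., axis=1)`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]] * 4, columns=['abc', 'abd', 'wxy', 'wxz'])
prefixes = ['ab', 'wx']

# Column names as a (4, 1) string array so they broadcast against the 2 prefixes.
cols = df.columns.values.astype(str)[:, None]

# mask[i, j] is True when column i starts with prefix j; shape (4, 2).
mask = np.char.startswith(cols, prefixes)

# argmax picks the position of the first True in each row.
idx = mask.argmax(axis=1)
print(idx)  # [0 0 1 1]

# argmax also returns 0 for an all-False row, so check every column matched.
assert mask.any(axis=1).all(), "some column matches no prefix"

# Map positions back to prefix names and total the column sums per prefix.
labels = np.array(prefixes)[idx]
result = df.sum().groupby(labels).sum().to_frame().T
print(result)
```

The explicit `any` check is worth keeping in real use: without it, a column that matches no prefix is silently counted under the first prefix.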

User

Answer 2 · edited 2024-05-20 00:04:34

Slice the column names, then use groupby on the result

df.groupby(df.columns.str[:-1],axis=1).sum().sum().to_frame().T
Out[317]: 
   ab  wx
0  12  28
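The slicing approach relies on every column name being a prefix plus exactly one trailing character. The sketch below shows the keys that `df.columns.str[:-1]` produces and a transpose-based equivalent of the one-liner. The names `keys`, `result`, and `odd` are illustrative, and the `'abcd'` column is a hypothetical case added only to show where the assumption stops holding.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3, 4]] * 4, columns=['abc', 'abd', 'wxy', 'wxz'])

# Drop the last character of every column name to get the grouping keys.
keys = df.columns.str[:-1]
print(list(keys))  # ['ab', 'ab', 'wx', 'wx']

# Group the transposed frame by those keys, sum within each group,
# then sum across the original rows and reshape to a single row.
result = df.T.groupby(keys).sum().sum(axis=1).to_frame().T
print(result)

# Hypothetical names with suffixes of different lengths no longer share a key.
odd = pd.Index(['abc', 'abcd']).str[:-1]
print(list(odd))  # ['ab', 'abc']
```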

Update

^{pr2}$
