Pandas: groupby and aggregate one column with a condition from another column

Posted 2024-09-26 17:40:40



User

Male | A programmer who enjoys writing Python code.

I'm trying to move some projects from R & dplyr to Python and pandas, and I want to figure out how to replicate coding patterns I use a lot in dplyr.

One common pattern: I group by a particular column, then compute a derived column that involves a condition on a third column. Here is a simple example:

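library(dplyr)

# Each user's login months are drawn at random from 1..cancel_date
dat = data.frame(user = rep(c("1", "2", "3", "4"), each = 5),
                 cancel_date = rep(c(12, 5, 10, 11), each = 5)
) %>%
  group_by(user) %>%
  mutate(login = sample(1:cancel_date[1], size = n(), replace = TRUE)) %>%
  ungroup()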


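In this data frame, I want to count, for each user, the logins that happened at least three months before they cancelled (i.e. cancel_date - login >= 3). In dplyr this is simple: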

dat %>%
  group_by(user) %>%
  summarise(logins_three_mos_before_cancel = length(login[cancel_date-login>=3]))

  user logins_three_mos_before_cancel
1    1                              4
2    2                              1
3    3                              5
4    4                              3

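But I'm a bit confused about how to do this in pandas. As far as I can tell, aggregate only applies a function to one column of each group at a time, and I don't see how to make it apply a function that involves several columns.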

Here is the same data in pandas:

import numpy as np
import pandas as pd

d = {'user': np.repeat([1, 2, 3, 4], 5),
     'cancel_date': np.repeat([12, 5, 10, 11], 5),
     'login': np.array([3, 9, 12, 4, 2, 4, 3, 5, 5, 1, 3, 5, 4, 6, 3, 3, 5, 10, 7, 10])
     }
df = pd.DataFrame(data=d)
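One vectorized counterpart to the dplyr `summarise` call, sketched here as one option among several: compute the condition as an ordinary column first, then group and sum it using named aggregation (`new_name=(column, function)`, available since pandas 0.25). The frame is rebuilt so the block runs on its own; the column name `early` is a hypothetical choice:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "user": np.repeat([1, 2, 3, 4], 5),
    "cancel_date": np.repeat([12, 5, 10, 11], 5),
    "login": np.array([3, 9, 12, 4, 2, 4, 3, 5, 5, 1,
                       3, 5, 4, 6, 3, 3, 5, 10, 7, 10]),
})

# Step 1: the row-wise condition from summarise(), stored as 0/1
# so that summing per group counts the matching rows.
flagged = df.assign(early=(df["cancel_date"] - df["login"] >= 3).astype(int))

# Step 2: group and aggregate the precomputed column; the keyword becomes
# the output column name, mirroring summarise(new_name = ...).
result = flagged.groupby("user", as_index=False).agg(
    logins_three_mos_before_cancel=("early", "sum")
)
print(result)
```

Because the condition is evaluated once over whole columns, this avoids calling a Python function per group.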


1 Answer

User

Answer #1 · Posted 2024-09-26 17:40:40

I hope I understood you correctly, but do you mean this?

>>> df[df.cancel_date - df.login >= 3].user.value_counts().sort_index()
1    4
2    1
3    5
4    3
dtype: int64
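One caveat: `value_counts` only counts rows that survive the filter, so a user with no qualifying logins disappears from the result instead of showing 0, whereas the dplyr `summarise` returns one row per group. A minimal sketch of two ways to keep such users, on a small hypothetical frame:

```python
import pandas as pd

# Hypothetical data: user 3 has no login at least 3 months before cancelling.
df = pd.DataFrame({
    "user": [1, 1, 2, 2, 3],
    "cancel_date": [12, 12, 5, 5, 2],
    "login": [3, 12, 1, 4, 1],
})

mask = df.cancel_date - df.login >= 3
counts = df[mask].user.value_counts().sort_index()
# counts only has users 1 and 2: user 3 had no matching rows, so it is absent.

# Option 1: reindex on every user so missing groups get a count of 0.
all_users = sorted(df.user.unique())
counts_with_zeros = counts.reindex(all_users, fill_value=0)
print(counts_with_zeros)

# Option 2: sum the 0/1 mask per user; groups with no matches sum to 0.
alt = mask.astype(int).groupby(df.user).sum()
print(alt)
```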
