Sum of one column for each occurrence of a value in another column

Posted 2024-06-25 23:38:28


I can't seem to find the right wording to search for this problem: the answers I find are very similar, but not quite what I need.

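I'm working with the Titanic dataset and want to calculate the total number of surviving members in each family. The dataset looks like this: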

+-------------+----------+-----------+-------------+
| PassengerId | Survived | Surname   | NumSurvived |
+-------------+----------+-----------+-------------+
| 1           | 0        | Braund    |             |
| 2           | 1        | Cumings   |             |
| 3           | 1        | Heikkinen |             |
| 4           | 1        | Futrelle  |             |
| 5           | 0        | Braund    |             |
| 6           | 0        | Moran     |             |
| 7           | 0        | Futrelle  |             |
| 8           | 0        | Braund    |             |
| 9           | 1        | Cumings   |             |
+-------------+----------+-----------+-------------+

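I need to sum the Survived values for each surname and write that total into the NumSurvived column, so that every passenger row carries its family's total, like this: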

+-------------+----------+-----------+-------------+
| PassengerId | Survived | Surname   | NumSurvived |
+-------------+----------+-----------+-------------+
| 1           | 0        | Braund    | 0           |
| 2           | 1        | Cumings   | 2           |
| 3           | 1        | Heikkinen | 1           |
| 4           | 1        | Futrelle  | 1           |
| 5           | 0        | Braund    | 0           |
| 6           | 0        | Moran     | 0           |
| 7           | 0        | Futrelle  | 1           |
| 8           | 0        | Braund    | 0           |
| 9           | 1        | Cumings   | 2           |
+-------------+----------+-----------+-------------+
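For comparison, a plain aggregation does not produce this shape. The minimal pandas sketch below assumes the data lives in a DataFrame named `df` and rebuilds only the nine sample rows shown above. `groupby(...).sum()` returns one total per surname, indexed by surname, so it has fewer rows than the frame and would not line up with the passenger rows if assigned directly as a column.

```python
import pandas as pd

# Sample rows copied from the table in the question (the name `df` is an assumption)
df = pd.DataFrame({
    "PassengerId": list(range(1, 10)),
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0, 1],
    "Surname": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Braund",
                "Moran", "Futrelle", "Braund", "Cumings"],
})

# An ordinary aggregation gives one value per surname, indexed by surname,
# rather than one value per passenger row
per_family = df.groupby("Surname")["Survived"].sum()
print(per_family)
print(len(per_family), len(df))  # prints: 5 9
```

The per-family totals are the right numbers; what is missing is the step that broadcasts each total back to every row of that family.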


1 answer
User
Reply #1 · Posted 2024-06-25 23:38:28

Try:

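import pandas as pd

# Count the rows with Survived == 1 in each Surname group and broadcast that count back to every row of the group
df['NumSurvived'] = df.groupby('Surname')['Survived'].transform(lambda x: x.eq(1).sum())
print(df)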

   PassengerId  Survived    Surname  NumSurvived
0            1         0     Braund            0
1            2         1    Cumings            2
2            3         1  Heikkinen            1
3            4         1   Futrelle            1
4            5         0     Braund            0
5            6         0      Moran            0
6            7         0   Futrelle            1
7            8         0     Braund            0
8            9         1    Cumings            2
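For a self-contained check, here is a sketch that rebuilds the question's nine sample rows (the name `df` is assumed) and runs the lambda from this answer next to two equivalent forms. These alternatives are not part of the answer: `transform("sum")`, which gives the same result here only because Survived holds just 0 and 1, and `Series.map` with the per-surname totals, which looks up each row's surname in the aggregated result.

```python
import pandas as pd

# Sample rows copied from the table in the question
df = pd.DataFrame({
    "PassengerId": list(range(1, 10)),
    "Survived": [0, 1, 1, 1, 0, 0, 0, 0, 1],
    "Surname": ["Braund", "Cumings", "Heikkinen", "Futrelle", "Braund",
                "Moran", "Futrelle", "Braund", "Cumings"],
})

grouped = df.groupby("Surname")["Survived"]

# The answer's approach: count the 1s in each surname group, broadcast to every row
via_lambda = grouped.transform(lambda x: x.eq(1).sum())

# Built-in reduction; identical here only because Survived is 0/1
via_sum = grouped.transform("sum")

# Aggregate once per surname, then look up each row's surname in that result
via_map = df["Surname"].map(grouped.sum())

df["NumSurvived"] = via_sum
print(df)
```

`transform` returns a Series aligned with the original index, which is why it can be assigned straight back as a column; the `map` variant reaches the same alignment by looking up each row's surname in the aggregated Series.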
