Fastest way to add a column when rows depend on each other?

import pandas as pd
import numpy as np

# example data
df = pd.DataFrame([[1, 22, 'Child', 8],
                   [1, 62, 'Parent', 36],
                   [2, 102, 'Child', 6],
                   [2, 103, 'Child', 10],
                   [2, 107, 'Parent', 40],
                   [2, 108, 'Parent', 42]],
                  columns=['FamilyId', 'UserId', 'Type', 'Age'])

expected_result = pd.DataFrame([[1, 22, 'Child', 8, 36],
                                [2, 102, 'Child', 6, 42],
                                [2, 103, 'Child', 10, 42]],
                               columns=['FamilyId', 'UserId', 'Type', 'Age', 'ParentAge'])

parents = df.query('Type=="Parent"')
children = df.query('Type=="Child"')

oldest_parents = parents.groupby('FamilyId') \
    .apply(pd.DataFrame.nlargest, n=1, columns='Age') \
    .reset_index(drop=True) \
    .rename(columns={'Age': 'ParentAge'})

pd.merge(children, oldest_parents[['FamilyId', 'ParentAge']], on='FamilyId')

1 Answer

User

#1 · Posted 2024-10-02 18:23:03

Option 1: try groupby().max() instead of apply:

df[df['Type'].eq('Child')].merge(df[df['Type'].eq('Parent')].groupby('FamilyId').Age.max(),
                                 on='FamilyId',
                                 suffixes=('','Parent'))
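The same idea can be written in separate steps, which makes the column naming explicit. This is only a sketch against the question's example data: the name `ParentAge` follows the question's `expected_result`, and `oldest_parent_age` and `result` are illustrative variable names, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame([[1, 22, 'Child', 8], [1, 62, 'Parent', 36],
                   [2, 102, 'Child', 6], [2, 103, 'Child', 10],
                   [2, 107, 'Parent', 40], [2, 108, 'Parent', 42]],
                  columns=['FamilyId', 'UserId', 'Type', 'Age'])

# Maximum parent age per family, as a Series indexed by FamilyId.
oldest_parent_age = (df.loc[df['Type'].eq('Parent')]
                       .groupby('FamilyId')['Age']
                       .max()
                       .rename('ParentAge'))

# Attach it to the child rows: the children's FamilyId column is
# matched against the index of the Series.
children = df.loc[df['Type'].eq('Child')]
result = children.merge(oldest_parent_age, left_on='FamilyId', right_index=True)
result = result.reset_index(drop=True)
print(result)
```

Renaming the Series before the merge avoids relying on `suffixes` to tell the two `Age` columns apart.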

Option 2: fastest, assuming the oldest parent is also the oldest member of the family:

df['Parent_Age'] = df.groupby('FamilyId').Age.transform('max')
df[df['Type'].eq('Child')]
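To see why the assumption matters, here is a small sketch with a hypothetical extra row of type 'Grandparent' (not in the question's data). The plain per-family maximum then picks up the grandparent's age rather than a parent's.

```python
import pandas as pd

# Family 1 from the question, plus a hypothetical grandparent row
# (an assumption added purely for illustration).
df = pd.DataFrame([[1, 22, 'Child', 8],
                   [1, 62, 'Parent', 36],
                   [1, 63, 'Grandparent', 70]],
                  columns=['FamilyId', 'UserId', 'Type', 'Age'])

# Per-family maximum over all rows, regardless of Type.
df['Parent_Age'] = df.groupby('FamilyId')['Age'].transform('max')

children = df[df['Type'].eq('Child')]
print(children)  # Parent_Age is 70 here, not the parent's 36
```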

Option 3: faster, without assuming the parents are the oldest in the family (e.g. when grandparents are present):

df['Parent_Age'] = (df['Age'].mul(df['Type'].eq('Parent'))
                             .groupby(df['FamilyId']).transform('max')
                   )
df[df['Type'].eq('Child')]
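This works because multiplying by the boolean mask turns every non-parent age into 0 before the per-family maximum is taken. One consequence is that a family with no parent row gets 0. A possible variant, sketched below on hypothetical data, uses `Series.where` so that such families get NaN instead. Its speed relative to `mul` has not been measured here.

```python
import pandas as pd

# Hypothetical data: family 3 has no 'Parent' row.
df = pd.DataFrame([[1, 22, 'Child', 8],
                   [1, 62, 'Parent', 36],
                   [3, 301, 'Child', 5]],
                  columns=['FamilyId', 'UserId', 'Type', 'Age'])

is_parent = df['Type'].eq('Parent')

# Mask by multiplication: non-parent ages become 0.
via_mul = df['Age'].mul(is_parent).groupby(df['FamilyId']).transform('max')

# Variant: non-parent ages become NaN, which max skips.
via_where = df['Age'].where(is_parent).groupby(df['FamilyId']).transform('max')

print(via_mul.tolist())
print(via_where.tolist())
```

Note that the `where` variant returns a float column, since NaN cannot be stored in an integer column.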

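Output (shown for Option 1; Options 2 and 3 name the new column Parent_Age and keep the original row index):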

   FamilyId  UserId   Type  Age  AgeParent
0         1      22  Child    8         36
1         2     102  Child    6         42
2         2     103  Child   10         42
