Python Pandas task

1 answer

User

#1 · Posted 2024-09-26 17:47:22

You can try sort_values and groupby, then apply a function that returns a pd.Series indexed by the new column names.

import pandas as pd
df = pd.DataFrame([
    ['06-28-2021',23  ,'Thane','Maharashtra'],
    ['06-12-2021',23  ,'TEST','Maharashtra'],
    ['06-11-2021',23  ,'TEST','Maharashtra'],
    ['04-09-2021',3224,'Madurai','Tamil Nadu'],
    ['02-08-2021',2331,'Ghaziabad','Utter Pradesh']],
    columns=['event_date', 'user_id', 'user_city', 'user_state'])
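# Dates are written month-day-year, so state the format explicitly
df['event_date'] = pd.to_datetime(df['event_date'], format='%m-%d-%Y')

def func(g):
    # Each group arrives sorted by event_date, so the last row is the most recent login
    last_row = g.iloc[-1]
    # Up to two most frequent cities; with a single city, cities[0] and cities[-1] are the same
    cities = g['user_city'].value_counts().nlargest(2).index
    cols = ['Date of last login','Location of Latest Logins','Location of Last Logins','Location on Second Most Login']
    return pd.Series((last_row['event_date'],last_row['user_city'],cities[0],cities[-1]), index=cols)

# Sort once, then build one summary row per user
new_df = df.sort_values('event_date').groupby('user_id').apply(func)
print(new_df)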

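One edge case worth noting: when a user has logged in from only one city, cities[-1] is the same as cities[0], so the "second most" column repeats the top city. The following is a minimal sketch of one way to leave that field empty instead. The function name summarize and the short column names are hypothetical and chosen only for this illustration; they are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["06-28-2021", 23, "Thane", "Maharashtra"],
        ["06-12-2021", 23, "TEST", "Maharashtra"],
        ["06-11-2021", 23, "TEST", "Maharashtra"],
        ["04-09-2021", 3224, "Madurai", "Tamil Nadu"],
    ],
    columns=["event_date", "user_id", "user_city", "user_state"],
)
df["event_date"] = pd.to_datetime(df["event_date"], format="%m-%d-%Y")


def summarize(g):
    """Hypothetical helper: one summary row per user."""
    last_row = g.iloc[-1]  # group is already sorted by event_date
    counts = g["user_city"].value_counts()  # most frequent city first
    # Only report a second city when the user actually has one
    second = counts.index[1] if len(counts) > 1 else None
    return pd.Series(
        {
            "last_login": last_row["event_date"],
            "last_city": last_row["user_city"],
            "most_frequent_city": counts.index[0],
            "second_most_city": second,
        }
    )


summary = (
    df.sort_values("event_date")
    .groupby("user_id")[["event_date", "user_city"]]
    .apply(summarize)
)
print(summary)
```

Selecting only the two columns the helper needs before calling apply keeps the grouping column out of each group. For user 3224, who has a single city, second_most_city shows up as a missing value.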

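Method 2

Use nlargest to get the latest date without sorting the DataFrame (that is, you can skip df.sort_values('event_date')).

Note: in some cases sort_values will be faster, because it avoids the overhead of calling nlargest on every iteration.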

def func(g):
    # nlargest(1).index[0] is an index label, not a position, so look it up with .loc
    last_login = g.loc[g['event_date'].nlargest(1).index[0]]
    cities = g['user_city'].value_counts().nlargest(2).index
    cols = ['Date of last login','Location of Latest Logins','Location of Last Logins','Location on Second Most Login']
    return pd.Series((last_login['event_date'],last_login['user_city'],cities[0],cities[-1]), index=cols)

# No sort needed here
new_df = df.groupby('user_id').apply(func)
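If the per-group nlargest call turns out to be the bottleneck, the last-login part can also be computed without apply. This is a sketch of a different technique from the answer's: a single grouped idxmax followed by a .loc lookup. It covers only the two last-login columns, not the city frequency columns. The variable names latest_idx and last_logins are assumptions made for this example.

```python
import pandas as pd

df = pd.DataFrame(
    [
        ["06-28-2021", 23, "Thane", "Maharashtra"],
        ["06-12-2021", 23, "TEST", "Maharashtra"],
        ["06-11-2021", 23, "TEST", "Maharashtra"],
        ["04-09-2021", 3224, "Madurai", "Tamil Nadu"],
        ["02-08-2021", 2331, "Ghaziabad", "Utter Pradesh"],
    ],
    columns=["event_date", "user_id", "user_city", "user_state"],
)
df["event_date"] = pd.to_datetime(df["event_date"], format="%m-%d-%Y")

# Index label of the most recent event for each user, in one grouped call
latest_idx = df.groupby("user_id")["event_date"].idxmax()

# Pull those rows by label and rename to the answer's column names
last_logins = (
    df.loc[latest_idx, ["user_id", "event_date", "user_city"]]
    .set_index("user_id")
    .rename(
        columns={
            "event_date": "Date of last login",
            "user_city": "Location of Latest Logins",
        }
    )
)
print(last_logins)
```

Because idxmax returns index labels, the lookup uses .loc rather than .iloc, so it still works if df does not have the default 0..n-1 index.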

Let me know if my code does not produce your expected result.
