Python pandas task

Published 2024-09-26 17:47:22


I have a dataset like this:

event_date | user_id | user_city | user_state |
06-09-2021 | 23      | Thane     | Maharashtra
04-09-2021 | 3224    | Madurai   | Tamil Nadu
02-08-2021 | 2331    | Ghaziabad | Utter Pradesh

Using Python, I would like to get output in this format:

User ID | Date of      | Location of  |  Location of  | Location on Second | 
        | Last Logins  | Latest Login |  Max Logins   | Most Login         |
        |              |              |               |                    |
5       | 11-09-2021   |  Gurgaon     |  Meerut       |  Noida             |



1 answer
User
#1 · Posted 2024-09-26 17:47:22

You can try groupby with apply, and return a pd.Series with the new column names after applying value_counts:

import pandas as pd

df = pd.DataFrame([
    ['06-28-2021', 23, 'Thane', 'Maharashtra'],
    ['06-12-2021', 23, 'TEST', 'Maharashtra'],
    ['06-11-2021', 23, 'TEST', 'Maharashtra'],
    ['04-09-2021', 3224, 'Madurai', 'Tamil Nadu'],
    ['02-08-2021', 2331, 'Ghaziabad', 'Utter Pradesh']],
    columns=['event_date', 'user_id', 'user_city', 'user_state'])
# These sample dates are parsed month-first (pandas' default for this pattern)
df['event_date'] = pd.to_datetime(df['event_date'])

def func(g):
    # g arrives sorted by event_date, so the last row is the most recent login
    last_row = g.iloc[-1]
    # Up to two cities, ordered by login count (most frequent first)
    cities = g['user_city'].value_counts().nlargest(2).index
    # A user seen in only one city has no second-most-frequent location
    second = cities[1] if len(cities) > 1 else None
    cols = ['Date of Last Login', 'Location of Latest Login',
            'Location of Max Logins', 'Location of Second Most Logins']
    return pd.Series((last_row['event_date'], last_row['user_city'], cities[0], second), index=cols)

new_df = df.sort_values('event_date').groupby('user_id').apply(func)
print(new_df)
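One thing worth checking before grouping is how the dates are parsed. The answer's sample contains 06-28-2021, which can only be month-first, but the question's own dates (06-09-2021, 04-09-2021, 02-08-2021) would also be valid day-first. The sketch below shows how an explicit format argument changes the result; treating the question's data as day-month-year is only an assumption here, not something the question confirms.

```python
import pandas as pd

# Dates copied from the question; which field is the day is an assumption.
dates = pd.Series(['06-09-2021', '04-09-2021', '02-08-2021'])

# Explicit month-day-year parsing.
month_first = pd.to_datetime(dates, format='%m-%d-%Y')
# Explicit day-month-year parsing of the same strings.
day_first = pd.to_datetime(dates, format='%d-%m-%Y')

print(month_first.dt.month.tolist())  # [6, 4, 2]
print(day_first.dt.month.tolist())    # [9, 9, 8]
```

Because "latest login" depends on the ordering of these dates, passing format explicitly avoids silently picking the wrong row.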

Method 2

Use nlargest to get the latest date without sorting the DataFrame (i.e., you can avoid df.sort_values('event_date')).

Note: in some cases sort_values will be faster, because it avoids the overhead of calling nlargest on every iteration.

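def func(g):
    # nlargest(1).index[0] is an index label, so look the row up with .loc on the
    # group; df.iloc would only work if df still had its default 0..n-1 index
    last_login = g.loc[g['event_date'].nlargest(1).index[0]]
    # Up to two cities, ordered by login count (most frequent first)
    cities = g['user_city'].value_counts().nlargest(2).index
    # A user seen in only one city has no second-most-frequent location
    second = cities[1] if len(cities) > 1 else None
    cols = ['Date of Last Login', 'Location of Latest Login',
            'Location of Max Logins', 'Location of Second Most Logins']
    return pd.Series((last_login['event_date'], last_login['user_city'], cities[0], second), index=cols)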
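Method 2 only shows the function, so here is a minimal end-to-end sketch of the same idea without a prior sort. It is not the answer's exact code: the helper name summarize, the use of idxmax in place of nlargest(1), the explicit loop over groups instead of apply, and the column labels (taken from the question's desired output) are all choices made for this illustration.

```python
import pandas as pd

df = pd.DataFrame(
    [['06-28-2021', 23, 'Thane', 'Maharashtra'],
     ['06-12-2021', 23, 'TEST', 'Maharashtra'],
     ['06-11-2021', 23, 'TEST', 'Maharashtra'],
     ['04-09-2021', 3224, 'Madurai', 'Tamil Nadu'],
     ['02-08-2021', 2331, 'Ghaziabad', 'Utter Pradesh']],
    columns=['event_date', 'user_id', 'user_city', 'user_state'])
# The sample dates are month-day-year.
df['event_date'] = pd.to_datetime(df['event_date'], format='%m-%d-%Y')


def summarize(g):
    """Summarize one user's logins (hypothetical helper for this sketch)."""
    # idxmax returns the index label of the latest date; .loc looks it up by label.
    last_login = g.loc[g['event_date'].idxmax()]
    # value_counts sorts cities by login count, most frequent first.
    counts = g['user_city'].value_counts()
    return {
        'Date of Last Login': last_login['event_date'],
        'Location of Latest Login': last_login['user_city'],
        'Location of Max Logins': counts.index[0],
        # None when the user has only ever logged in from one city.
        'Location of Second Most Logins': counts.index[1] if len(counts) > 1 else None,
    }


# Build one row per user by iterating over the groups.
rows = [{'user_id': uid, **summarize(g)} for uid, g in df.groupby('user_id')]
summary = pd.DataFrame(rows).set_index('user_id')
print(summary)
```

For user 23 this gives Thane as the latest location and TEST as the most frequent one, since TEST appears twice and Thane once.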

If my code does not produce the result you expect, please let me know.
