在保持日期列不变的情况下执行此操作

2024-09-30 03:25:31 发布

您现在位置:Python中文网/ 问答频道 /正文

user = {'id':['abab23', 'abab21', 'abab22', 'abab25', 'abab24', 'abab30', 'abab252', 'abab15'],
        'dob':['10-10-1990','1-12-1993', '12-12-2000', '2-10-1999', '2-10-1999', '2-10-1999', '2-10-1999', '2-10-1999']}
                
                
activities = {'sentconn':['abab35', 'abab15', 'abab25', 'abab23','abab22', 'abab15'],
             'receiveconj': ['abab24', 'abab24', 'abab21', 'abab35', 'abab252', 'abab30'],
              'sentdate':['2-10-2020', '2-10-2020','4-10-2020', '5-10-2020', '10-10-2020', '11-10-2020'],
               'receivedDate':['2-10-2020', '2-10-2020','4-10-2020', '5-10-2020', '10-10-2020', '11-10-2020']}
                             
        
    user = pd.DataFrame(user)
    activities = pd.DataFrame(activities)
    
    
    sentconn    receiverconn    sentdate    receivedDate
    0   abab35  abab24         2-10-2020    2-10-2020
    1   abab15  abab24         2-10-2020    2-10-2020
    2   abab25  abab21         4-10-2020    4-10-2020
    3   abab23  abab35         5-10-2020    5-10-2020
    4   abab22  abab252       10-10-2020    10-10-2020
    5   abab15  abab30        11-10-2020    11-10-2020

    id       dob
0   abab23  10-10-1990
1   abab21  1-12-1993
2   abab22  12-12-2000
3   abab25  2-10-1999
4   abab24  2-10-1999
5   abab30  2-10-1999
6   abab252 2-10-1999
7   abab15  2-10-1999

我想检查的是,用户数据帧中的“id”是否按日期发送或接收了多少次连接请求

date         id      sent_connection   receivedconnection
2-10-2020   abab15    1                     0
            abab24    0                     2

4-10-2020   abab25    1                     0
            abab21    0                     1
            abab23    1                     0
            abab23    0                     1

像这样的东西正是我想要的


Tags: idactivitiesuserdobabab35abab23abab21abab25
2条回答

使用:

#seelct only necessary columns
activities = activities[['sentconn','receiveconj','sentdate','receivedDate']]

#set new columns names
activities.columns = ['sent_id','receive_id','sent_date','receive_date']

#ssplit columns names by _ to MultiIndex
activities.columns = activities.columns.str.split('_', expand=True)

#reshape DataFrame and filter by is with id in inner merge
activities = (activities.stack(0)
                        .rename_axis([None, 'type'])
                        .reset_index(level=1)
                        .merge(user['id']))
print (activities)
      type        date       id
0  receive   2-10-2020   abab24
1  receive   2-10-2020   abab24
2     sent   2-10-2020   abab15
3     sent  11-10-2020   abab15
4  receive   4-10-2020   abab21
5     sent   4-10-2020   abab25
6     sent   5-10-2020   abab23
7  receive  10-10-2020  abab252
8     sent  10-10-2020   abab22
9  receive  11-10-2020   abab30

#get counts by crosstab
df = pd.crosstab([activities['date'], activities['id']], activities['type'])
print (df)
type                receive  sent
date       id                    
10-10-2020 abab22         0     1
           abab252        1     0
11-10-2020 abab15         0     1
           abab30         1     0
2-10-2020  abab15         0     1
           abab24         2     0
4-10-2020  abab21         1     0
           abab25         0     1
5-10-2020  abab23         0     1

试试这个:

activities = {'sentconn':['abab35', 'abab15', 'abab25', 'abab23','abab22', 'abab15'],
             'receiveconn': ['abab24', 'abab24', 'abab21', 'abab35', 'abab252', 'abab30'],
              'sentdate':['2-10-2020', '2-10-2020','4-10-2020', '5-10-2020', '10-10-2020', '11-10-2020'],
               'receivedDate':['2-10-2020', '2-10-2020','4-10-2020', '5-10-2020', '10-10-2020', '11-10-2020']}

user = {'id':['abab23', 'abab21', 'abab22', 'abab25', 'abab24', 'abab30', 'abab252', 'abab15'],
        'dob':['10-10-1990','1-12-1993', '12-12-2000', '2-10-1999', '2-10-1999', '2-10-1999', '2-10-1999', '2-10-1999']}

usr_df = pd.DataFrame(user)
df = pd.DataFrame(activities)

#group by the required columns to get the count.
df1 = df.groupby(['sentdate','sentconn']).agg({'sentconn':'count'})
df2 = df.groupby(['receivedDate','receiveconn']).agg({'receiveconn':'count'})

#rename the axis so that you get common columns to concat
df1 = df1.rename_axis(['date','user'])
df2 = df2.rename_axis(['date','user'])

df = pd.concat([df1, df2],axis=1)\
        .fillna(0)\
        .reset_index()
#filter the user id not present is user df as required.
df = df.loc[df['user'].isin(usr_df['id'])]\
        .set_index(['date','user'])
print(df)

产出:

                   sentconn  receiveconn
date       user                          
10-10-2020 abab22        1.0          0.0
           abab252       0.0          1.0
11-10-2020 abab15        1.0          0.0
           abab30        0.0          1.0
2-10-2020  abab15        1.0          0.0
           abab24        0.0          2.0
4-10-2020  abab21        0.0          1.0
           abab25        1.0          0.0
5-10-2020  abab23        1.0          0.0

相关问题 更多 >

    热门问题