Incorrect date

Customer_ID | Transaction_ID | Item_ID
ABC         | 2017-04-12-333 | X8973
ABC         | 2017-04-12-333 | X2468
ABC         | 2017-05-22-658 | X2906
ABC         | 2017-05-22-757 | X8790
ABC         | 2017-07-13-864 | X8790
BCD         | 2017-08-11-879 | X2346
BCD         | 2017-08-11-879 | X2468

# get the date out of the Transaction_ID string
df['date'] = pd.to_datetime(df.Transaction_ID.str[:10])
# calculate the transaction number
df['trans_nr'] = df.groupby(['Customer_ID', 'Transaction_ID', df['date'].dt.year]).cumcount() + 1

Current result:

Customer_ID | Transaction_ID | Item_ID | date       | trans_nr
ABC         | 2017-04-12-333 | X8973   | 2017-04-12 | 1
ABC         | 2017-04-12-333 | X2468   | 2017-04-12 | 2
ABC         | 2017-05-22-658 | X2906   | 2017-05-22 | 1
ABC         | 2017-05-22-757 | X8790   | 2017-05-22 | 1
ABC         | 2017-07-13-864 | X8790   | 2017-07-13 | 1
BCD         | 2017-08-11-879 | X2346   | 2017-08-11 | 1
BCD         | 2017-08-11-879 | X2468   | 2017-08-11 | 2

Expected result:

Customer_ID | Transaction_ID | Item_ID | date       | trans_nr
ABC         | 2017-04-12-333 | X8973   | 2017-04-12 | 1
ABC         | 2017-04-12-333 | X2468   | 2017-04-12 | 1
ABC         | 2017-05-22-658 | X2906   | 2017-05-22 | 2
ABC         | 2017-05-22-757 | X8790   | 2017-05-22 | 2
ABC         | 2017-07-13-864 | X8790   | 2017-07-13 | 3
BCD         | 2017-08-11-879 | X2346   | 2017-08-11 | 1
BCD         | 2017-08-11-879 | X2468   | 2017-08-11 | 1

3 answers

User

#1 · Edited 2024-06-28 11:43:32

One approach is to drop the duplicate values before doing the cumulative count:

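# Number each distinct (Customer_ID, date) pair once per customer
trans_nr = (
    df.drop_duplicates(subset=['Customer_ID', 'date'])
      .set_index(['Customer_ID', 'date'])
      .groupby(level='Customer_ID')
      .cumcount() + 1
)
# Put the same index on df so each row picks up the number of its pair
df.set_index(['Customer_ID', 'date'], inplace=True)
df['trans_nr'] = trans_nr
df.reset_index(inplace=True)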

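To get the transaction number, first drop the rows with duplicate Customer_ID and date values. Then set Customer_ID and date as the index of the remaining rows (this is used for the merge later) and perform the groupby and cumcount. The result is a Series whose values are the cumulative count for each Customer_ID and date.

Set the same index on the original DataFrame (again, to allow the merge). Then simply assign the trans_nr Series to a column of df. The index takes care of the merge logic.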
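The same idea can also be written with an explicit merge instead of index alignment. The following is only a sketch under assumptions: the sample data is copied from the question, `numbering` and `result` are hypothetical names introduced here, and `DataFrame.merge` is swapped in for the index-based assignment used in the answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Customer_ID": ["ABC"] * 5 + ["BCD"] * 2,
    "Transaction_ID": [
        "2017-04-12-333", "2017-04-12-333", "2017-05-22-658",
        "2017-05-22-757", "2017-07-13-864", "2017-08-11-879",
        "2017-08-11-879",
    ],
    "Item_ID": ["X8973", "X2468", "X2906", "X8790", "X8790", "X2346", "X2468"],
})
df["date"] = pd.to_datetime(df["Transaction_ID"].str[:10])

# `numbering` (hypothetical name): one row per (customer, date) pair,
# numbered per customer in order of appearance.
numbering = df[["Customer_ID", "date"]].drop_duplicates().copy()
numbering["trans_nr"] = numbering.groupby("Customer_ID").cumcount() + 1

# Explicit left merge in place of the index alignment used in the answer.
result = df.merge(numbering, on=["Customer_ID", "date"], how="left")
print(result)
```

A left merge keeps the original row order and column layout, which may be preferable if setting and resetting the index on df is not wanted.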

User

#2 · Edited 2024-06-28 11:43:32

Let's try:

df['trans_nr'] = df.groupby(['Customer_ID', df['date'].dt.year])['date']\
                   .transform(lambda x: (x.diff() != pd.Timedelta('0 days')).cumsum())

Output:

 Customer_ID  Transaction_ID Item_ID       date  trans_nr
0         ABC  2017-04-12-333   X8973 2017-04-12         1
1         ABC  2017-04-12-333   X2468 2017-04-12         1
2         ABC  2017-05-22-658   X2906 2017-05-22         2
3         ABC  2017-05-22-757   X8790 2017-05-22         2
4         ABC  2017-07-13-864   X8790 2017-07-13         3
5         BCD  2017-08-11-879   X2346 2017-08-11         1
6         BCD  2017-08-11-879   X2468 2017-08-11         1
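To see why the diff/cumsum expression works, it may help to run the steps for a single customer. This is a small sketch, assuming the dates of customer ABC from the question are already in chronological order; the names `dates`, `step` and `is_new_date` are hypothetical and only used for illustration.

```python
import pandas as pd

# Dates of customer ABC from the question, in chronological order.
dates = pd.Series(pd.to_datetime(
    ["2017-04-12", "2017-04-12", "2017-05-22", "2017-05-22", "2017-07-13"]
))

# Gap to the previous row; the first row has no predecessor, so it is NaT.
step = dates.diff()

# True wherever the date changes. NaT compares as "not equal",
# so the first row also counts as the start of a new date.
is_new_date = step != pd.Timedelta("0 days")

# Running total of the True values gives the transaction number.
trans_nr = is_new_date.cumsum()
print(trans_nr.tolist())  # [1, 1, 2, 2, 3]
```

Because the comparison only looks at the previous row, this relies on equal dates being adjacent within each group; sorting by date first would be needed otherwise.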

User

#3 · Edited 2024-06-28 11:43:32

Use a double groupby together with ngroup(), i.e.

df['trans_nr'] = df.groupby('Customer_ID').apply(lambda x : \
                x.groupby([x['date'].dt.date]).ngroup()+1).values

 Customer_ID  Transaction_ID Item_ID       date  trans_nr
0         ABC  2017-04-12-333   X8973 2017-04-12         1
1         ABC  2017-04-12-333   X2468 2017-04-12         1
2         ABC  2017-05-22-658   X2906 2017-05-22         2
3         ABC  2017-05-22-757   X8790 2017-05-22         2
4         ABC  2017-07-13-864   X8790 2017-07-13         3
5         BCD  2017-08-11-879   X2346 2017-08-11         1
6         BCD  2017-08-11-879   X2468 2017-08-11         1
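Note that `.values` assigns the numbers by position, so the snippet above appears to depend on the rows already being grouped by Customer_ID in sorted order. As a hedged alternative (not the answer's method), the sketch below keeps the original index by using `transform` with `pd.factorize`. The row order in this sample is deliberately interleaved and `number_dates` is a hypothetical helper.

```python
import pandas as pd

# Rows from the question (Item_ID omitted), deliberately interleaved
# across customers to show that row order does not matter here.
df = pd.DataFrame({
    "Customer_ID": ["BCD", "ABC", "ABC", "BCD", "ABC", "ABC", "ABC"],
    "Transaction_ID": [
        "2017-08-11-879", "2017-04-12-333", "2017-04-12-333",
        "2017-08-11-879", "2017-05-22-658", "2017-05-22-757",
        "2017-07-13-864",
    ],
})
df["date"] = pd.to_datetime(df["Transaction_ID"].str[:10])


def number_dates(dates: pd.Series) -> pd.Series:
    """Hypothetical helper: number the distinct dates of one customer from 1."""
    # sort=True numbers the dates chronologically, like ngroup() on sorted keys.
    codes, _ = pd.factorize(dates, sort=True)
    # Keep the group's index so transform can align the result with df.
    return pd.Series(codes + 1, index=dates.index)


df["trans_nr"] = df.groupby("Customer_ID")["date"].transform(number_dates)
print(df)
```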
