如何用pandas重写Python中的SQL查询？

SELECT Id, MIN(CASE WHEN Type = 0001 THEN Date END) AS TransactionDate, MIN(CASE WHEN Type = 0002 THEN Date END) AS SignupDate, MIN(CASE WHEN Type = 0003 THEN Date END) AS SeenDate FROM Transactions WHERE State = 0 GROUP BY Id

1条回答

网友

1楼 · 发布于 2024-09-27 21:33:04

请考虑两种方法：先分配列，然后进行聚合；或者有条件地内联聚合，然后删除重复项：

方法1

transactionDate = (transactions.assign(TransactionDate = transactions['Date'].where(transactions['Type']=='0001'),
                                       SignupDate = transactions['Date'].where(transactions['Type']=='0002'),
                                       SeenDate = transactions['Date'].where(transactions['Type']=='0003')
                                       )
                               .groupby("Id")
                               .agg({'TransactionDate':'min', 'SignupDate':'min', 'SeenDate':'min'})    
                               .reindex(['TransactionDate', 'SignupDate', 'SeenDate'], axis='columns')
                   )

方法2

def agg_cols(row):
   row['TransactionDate'] = row[row["Type"] == "0001"]["Date"].min()
   row['SignupDate'] = row[row["Type"] == "0002"]["Date"].min()
   row['SeenDate'] = row[row["Type"] == "0003"]["Date"].min()

   return row

transactionDate = (transactions.groupby("Id")
                                .apply(agg_cols)
                                .reindex(['Id', 'TransactionDate', 'SignupDate', 'SeenDate'], axis='columns')
                                .drop_duplicates()
                                .sort_values('Id')
                                .reset_index(drop=True)
                   )

数据

用随机数据证明（为再现性设定种子）：

import numpy as np
import pandas as pd

np.random.seed(12102018)
transactions = pd.DataFrame({'Type': np.random.choice(['0001', '0002', '0003'], 344),
                             'Id': np.random.choice(['r', 'pandas', 'julia', 'sas', 'stata', 'spss'], 344),
                             'Date': pd.date_range('2018-01-01', '2018-12-10')
                            })

输出

方法1

#        TransactionDate SignupDate   SeenDate
# Id                                          
# julia       2018-01-13 2018-01-19 2018-05-02
# pandas      2018-01-08 2018-01-09 2018-01-29
# r           2018-01-03 2018-01-15 2018-01-02
# sas         2018-01-12 2018-01-22 2018-01-07
# spss        2018-01-05 2018-02-19 2018-01-01
# stata       2018-01-16 2018-01-30 2018-01-20

方法2

#        Id TransactionDate SignupDate   SeenDate
# 0   julia      2018-01-13 2018-01-19 2018-05-02
# 1  pandas      2018-01-08 2018-01-09 2018-01-29
# 2       r      2018-01-03 2018-01-15 2018-01-02
# 3     sas      2018-01-12 2018-01-22 2018-01-07
# 4    spss      2018-01-05 2018-02-19 2018-01-01
# 5   stata      2018-01-16 2018-01-30 2018-01-20

数据

输出

相关问题更多 >

编程相关推荐

热门问题

热门文章

如何用pandas重写Python中的SQL查询？

数据

输出

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >