Merging DataFrames within a specific date range

Posted 2024-06-28 10:57:02


I have two DataFrames: one holds Invoice Data and the other holds Promotion Dates.

Invoice Data DataFrame

+------------+-------------+---------------+
| LocationNo | InvoiceDate | InvoiceAmount |
+------------+-------------+---------------+
|      A     |  01-Jul-20  |       79      |
+------------+-------------+---------------+
|      B     |  01-Jul-20  |       72      |
+------------+-------------+---------------+
|      C     |  01-Jul-20  |       24      |
+------------+-------------+---------------+
|      A     |  02-Jul-20  |       68      |
+------------+-------------+---------------+
|      B     |  02-Jul-20  |       6       |
+------------+-------------+---------------+
|      C     |  02-Jul-20  |       27      |
+------------+-------------+---------------+
|      A     |  03-Jul-20  |       25      |
+------------+-------------+---------------+
|      B     |  03-Jul-20  |       62      |
+------------+-------------+---------------+
|      C     |  03-Jul-20  |       58      |
+------------+-------------+---------------+
|      D     |  03-Jul-20  |       36      |
+------------+-------------+---------------+
|      E     |  03-Jul-20  |       65      |
+------------+-------------+---------------+
|      F     |  03-Jul-20  |       81      |
+------------+-------------+---------------+

import pandas as pd

df_1 = pd.DataFrame({
    'LocationNo':['A','B','C','A','B','C','A','B','C','D','E','F'],
    'InvoiceDate':['01-Jul-20','01-Jul-20','01-Jul-20','02-Jul-20','02-Jul-20','02-Jul-20',
                  '03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20','03-Jul-20'],
    'InvoiceAmount':[79,72,24,68,6,27,25,62,58,36,65,81]
})

Promotion Dates DataFrame

+------------+----------------+--------------+
| LocationNo | PromotionStart | PromotionEnd |
+------------+----------------+--------------+
|      A     |    01-Jul-20   |   02-Jul-20  |
+------------+----------------+--------------+
|      B     |    02-Jul-20   |   03-Jul-20  |
+------------+----------------+--------------+
|      C     |    03-Jul-20   |   05-Jul-20  |
+------------+----------------+--------------+
|      D     |    01-Jul-20   |   05-Jul-20  |
+------------+----------------+--------------+
|      E     |    02-Jul-20   |   02-Jul-20  |
+------------+----------------+--------------+
|      F     |    05-Jul-20   |   06-Jul-20  |
+------------+----------------+--------------+

df_2 = pd.DataFrame({
    'LocationNo' : ['A','B','C','D','E','F'],
    'PromotionStart':['01-Jul-20','02-Jul-20','03-Jul-20','01-Jul-20','02-Jul-20','05-Jul-20'],
    'PromotionEnd':['02-Jul-20','03-Jul-20','05-Jul-20','05-Jul-20','02-Jul-20','06-Jul-20']
})
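Both frames store dates as strings such as 01-Jul-20. Strings compare character by character, not by calendar, which only happens to give the right answer here because every date falls in July 2020. A minimal sketch of parsing them first, assuming English month abbreviations and the format code '%d-%b-%y' (the FMT name is just for illustration):

```python
import pandas as pd

FMT = '%d-%b-%y'  # e.g. 01-Jul-20: day, English month abbreviation, two-digit year

# As plain strings, June sorts after July here: '1' > '0' decides the comparison
print('10-Jun-20' > '01-Jul-20')  # True

df_1 = pd.DataFrame({
    'LocationNo': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'E', 'F'],
    'InvoiceDate': ['01-Jul-20'] * 3 + ['02-Jul-20'] * 3 + ['03-Jul-20'] * 6,
    'InvoiceAmount': [79, 72, 24, 68, 6, 27, 25, 62, 58, 36, 65, 81],
})
df_2 = pd.DataFrame({
    'LocationNo': ['A', 'B', 'C', 'D', 'E', 'F'],
    'PromotionStart': ['01-Jul-20', '02-Jul-20', '03-Jul-20', '01-Jul-20', '02-Jul-20', '05-Jul-20'],
    'PromotionEnd': ['02-Jul-20', '03-Jul-20', '05-Jul-20', '05-Jul-20', '02-Jul-20', '06-Jul-20'],
})

# Parse once so that later comparisons and between() are chronological
df_1['InvoiceDate'] = pd.to_datetime(df_1['InvoiceDate'], format=FMT)
for col in ('PromotionStart', 'PromotionEnd'):
    df_2[col] = pd.to_datetime(df_2[col], format=FMT)

print(df_1.dtypes)
print(df_2.head(2))
```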

My task is to merge the two DataFrames on two conditions and, wherever both conditions are met, put the value Yes in a new column, Promotion.

The conditions are:

  1. LocationNo must match
  2. InvoiceDate must fall between the corresponding PromotionStart and PromotionEnd

Basically, merge only when InvoiceDate falls between the PromotionStart and PromotionEnd dates (both dates inclusive).
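The two conditions map onto pandas fairly directly: condition 1 is an ordinary equality key for merge, and condition 2 is a row-wise mask applied to the merged pairs. The sketch below is an editorial illustration, not the asker's code; the names pairs, hits and result are hypothetical. It filters the pairs and then uses indicator=True on a left merge back onto df_1 to mark the invoices that found a matching window:

```python
import numpy as np
import pandas as pd

FMT = '%d-%b-%y'  # assumed date format, e.g. 01-Jul-20

df_1 = pd.DataFrame({
    'LocationNo': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'E', 'F'],
    'InvoiceDate': ['01-Jul-20'] * 3 + ['02-Jul-20'] * 3 + ['03-Jul-20'] * 6,
    'InvoiceAmount': [79, 72, 24, 68, 6, 27, 25, 62, 58, 36, 65, 81],
})
df_2 = pd.DataFrame({
    'LocationNo': ['A', 'B', 'C', 'D', 'E', 'F'],
    'PromotionStart': ['01-Jul-20', '02-Jul-20', '03-Jul-20', '01-Jul-20', '02-Jul-20', '05-Jul-20'],
    'PromotionEnd': ['02-Jul-20', '03-Jul-20', '05-Jul-20', '05-Jul-20', '02-Jul-20', '06-Jul-20'],
})
df_1['InvoiceDate'] = pd.to_datetime(df_1['InvoiceDate'], format=FMT)
for col in ('PromotionStart', 'PromotionEnd'):
    df_2[col] = pd.to_datetime(df_2[col], format=FMT)

# Condition 1: pair every invoice with the promotions of the same LocationNo
pairs = df_1.merge(df_2, on='LocationNo')

# Condition 2: keep pairs whose InvoiceDate lies in [PromotionStart, PromotionEnd]
in_range = pairs['InvoiceDate'].between(pairs['PromotionStart'], pairs['PromotionEnd'])
hits = pairs.loc[in_range, ['LocationNo', 'InvoiceDate']].drop_duplicates()

# Left merge back onto df_1; '_merge' is 'both' wherever a matching window was found
result = df_1.merge(hits, on=['LocationNo', 'InvoiceDate'], how='left', indicator=True)
result['Promotion'] = np.where(result['_merge'] == 'both', 'Yes', '')
result = result.drop(columns='_merge')
print(result)
```

Because hits is de-duplicated on (LocationNo, InvoiceDate), the final left merge cannot add rows even if one location has several overlapping windows.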

The output I want is:

+------------+-------------+---------------+-----------+
| LocationNo | InvoiceDate | InvoiceAmount | Promotion |
+------------+-------------+---------------+-----------+
|      A     |  01-Jul-20  |       79      |    Yes    |
+------------+-------------+---------------+-----------+
|      B     |  01-Jul-20  |       72      |           |
+------------+-------------+---------------+-----------+
|      C     |  01-Jul-20  |       24      |           |
+------------+-------------+---------------+-----------+
|      A     |  02-Jul-20  |       68      |    Yes    |
+------------+-------------+---------------+-----------+
|      B     |  02-Jul-20  |       6       |    Yes    |
+------------+-------------+---------------+-----------+
|      C     |  02-Jul-20  |       27      |           |
+------------+-------------+---------------+-----------+
|      A     |  03-Jul-20  |       25      |           |
+------------+-------------+---------------+-----------+
|      B     |  03-Jul-20  |       62      |    Yes    |
+------------+-------------+---------------+-----------+
|      C     |  03-Jul-20  |       58      |    Yes    |
+------------+-------------+---------------+-----------+
|      D     |  03-Jul-20  |       36      |    Yes    |
+------------+-------------+---------------+-----------+
|      E     |  03-Jul-20  |       65      |           |
+------------+-------------+---------------+-----------+
|      F     |  03-Jul-20  |       81      |           |
+------------+-------------+---------------+-----------+

I know how to merge on multiple conditions. In this case, however, I need to merge on a date range.

How can I solve this?
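A date range cannot be passed to merge as a key, but pandas does provide merge_asof, which matches each left row to the nearest right row on a sorted key, optionally within groups given by by. One possible sketch, assuming at most one relevant promotion per LocationNo at any time: match each invoice to the latest promotion of the same location that started on or before InvoiceDate, then check that this promotion had not ended yet. The helper column _row is hypothetical and only restores df_1's original order.

```python
import numpy as np
import pandas as pd

FMT = '%d-%b-%y'  # assumed date format, e.g. 01-Jul-20

df_1 = pd.DataFrame({
    'LocationNo': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'E', 'F'],
    'InvoiceDate': ['01-Jul-20'] * 3 + ['02-Jul-20'] * 3 + ['03-Jul-20'] * 6,
    'InvoiceAmount': [79, 72, 24, 68, 6, 27, 25, 62, 58, 36, 65, 81],
})
df_2 = pd.DataFrame({
    'LocationNo': ['A', 'B', 'C', 'D', 'E', 'F'],
    'PromotionStart': ['01-Jul-20', '02-Jul-20', '03-Jul-20', '01-Jul-20', '02-Jul-20', '05-Jul-20'],
    'PromotionEnd': ['02-Jul-20', '03-Jul-20', '05-Jul-20', '05-Jul-20', '02-Jul-20', '06-Jul-20'],
})
df_1['InvoiceDate'] = pd.to_datetime(df_1['InvoiceDate'], format=FMT)
for col in ('PromotionStart', 'PromotionEnd'):
    df_2[col] = pd.to_datetime(df_2[col], format=FMT)

# merge_asof requires both sides sorted by their date key; remember df_1's order in _row
left = df_1.assign(_row=np.arange(len(df_1))).sort_values('InvoiceDate', kind='stable')
right = df_2.sort_values('PromotionStart')

# For each invoice, take the latest promotion of the same LocationNo whose
# PromotionStart is on or before InvoiceDate (direction='backward')
m = pd.merge_asof(left, right, left_on='InvoiceDate', right_on='PromotionStart',
                  by='LocationNo', direction='backward')

# It counts only if the window had not ended; unmatched rows carry NaT, which compares False
m['Promotion'] = np.where(m['InvoiceDate'] <= m['PromotionEnd'], 'Yes', '')

result = (m.sort_values('_row')
            .drop(columns=['_row', 'PromotionStart', 'PromotionEnd'])
            .reset_index(drop=True))
print(result)
```

The trade-off is that merge_asof keeps a single match per invoice: if an earlier, longer promotion overlaps a later one at the same location, the earlier window can be missed, so a merge-then-filter approach is the safer choice when windows may overlap.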


1 answer
User
Answer #1 · posted 2024-06-28 10:57:02

Let's try a merge followed by a filter:

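import numpy as np
import pandas as pd

out = df_1.merge(df_2, on='LocationNo', how='left')  # same row order as df_1 (one promotion per LocationNo)
dates = {c: pd.to_datetime(out[c], format='%d-%b-%y')  # compare real dates, not strings
         for c in ('InvoiceDate', 'PromotionStart', 'PromotionEnd')}
df_1['Promotion'] = np.where(dates['InvoiceDate'].between(dates['PromotionStart'], dates['PromotionEnd']),
                             'Yes', '')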

Output:

   LocationNo InvoiceDate  InvoiceAmount Promotion
0           A   01-Jul-20             79       Yes
1           B   01-Jul-20             72          
2           C   01-Jul-20             24          
3           A   02-Jul-20             68       Yes
4           B   02-Jul-20              6       Yes
5           C   02-Jul-20             27          
6           A   03-Jul-20             25          
7           B   03-Jul-20             62       Yes
8           C   03-Jul-20             58       Yes
9           D   03-Jul-20             36       Yes
10          E   03-Jul-20             65          
11          F   03-Jul-20             81          
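The np.where assignment above is positional: it works because the left merge yields exactly one row per df_1 row, which holds only while each LocationNo appears once in df_2. If a location could have several promotion windows, a sketch along these lines reduces the merged rows back to one flag per invoice with groupby(...).any(). The function flag_promotions and the _row/_date/_start/_end columns are hypothetical names, and the date format '%d-%b-%y' is assumed from the sample data.

```python
import numpy as np
import pandas as pd

FMT = '%d-%b-%y'  # assumed date format, e.g. 01-Jul-20


def flag_promotions(invoices: pd.DataFrame, promos: pd.DataFrame) -> pd.Series:
    """Return 'Yes' or '' per invoice row; promos may hold several windows per location."""
    inv = invoices.assign(_row=np.arange(len(invoices)),
                          _date=pd.to_datetime(invoices['InvoiceDate'], format=FMT))
    pro = promos.assign(_start=pd.to_datetime(promos['PromotionStart'], format=FMT),
                        _end=pd.to_datetime(promos['PromotionEnd'], format=FMT))
    # Left merge: every invoice appears at least once, possibly several times
    m = inv.merge(pro, on='LocationNo', how='left')
    # Inclusive range test; rows without a promotion have NaT and come out False
    hit = m['_date'].between(m['_start'], m['_end'])
    # Collapse back to one flag per original invoice row (groupby sorts _row 0..n-1)
    any_hit = hit.groupby(m['_row']).any()
    return pd.Series(np.where(any_hit.to_numpy(), 'Yes', ''), index=invoices.index)


df_1 = pd.DataFrame({
    'LocationNo': ['A', 'B', 'C', 'A', 'B', 'C', 'A', 'B', 'C', 'D', 'E', 'F'],
    'InvoiceDate': ['01-Jul-20'] * 3 + ['02-Jul-20'] * 3 + ['03-Jul-20'] * 6,
    'InvoiceAmount': [79, 72, 24, 68, 6, 27, 25, 62, 58, 36, 65, 81],
})
df_2 = pd.DataFrame({
    'LocationNo': ['A', 'B', 'C', 'D', 'E', 'F'],
    'PromotionStart': ['01-Jul-20', '02-Jul-20', '03-Jul-20', '01-Jul-20', '02-Jul-20', '05-Jul-20'],
    'PromotionEnd': ['02-Jul-20', '03-Jul-20', '05-Jul-20', '05-Jul-20', '02-Jul-20', '06-Jul-20'],
})

df_1['Promotion'] = flag_promotions(df_1, df_2)
print(df_1)
```

With the sample data this reproduces the Promotion column shown above; the difference only shows once df_2 contains more than one row for some LocationNo.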
