Merging tables with an SQL-like left join that includes a BETWEEN filter

Posted 2024-05-19 10:09:31


So I have these two tables, and I want to do a left join and keep only the rows where the date column from df1 falls between the two date columns of df2.

Note row 6: it has no {}, and this ends up causing the problem.

df1

  Company Resource ClockInDate
0       A     ResA  2019-02-09
1       A     ResB  2019-02-09
2       A     ResC  2019-02-09
3       B     ResD  2019-02-09
4       B     ResE  2019-02-09
5       B     ResF  2019-02-09
6       B     ResG         NaT

df2

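  Company Resource EffectiveFrom EffectiveTo
0       A     ResA    2018-01-01  2018-12-31
1       A     ResA    2019-01-01  2099-12-31
2       A     ResB    2018-01-01  2018-12-31
3       A     ResB    2019-01-01  2099-12-31
4       B     ResE    2018-01-01  2018-12-31
5       B     ResE    2019-01-01  2099-12-31
6       B     ResF    2018-01-01  2018-12-31
7       B     ResF    2019-01-01  2099-12-31
8       B     ResG    2018-01-01  2018-12-31
9       B     ResG    2019-01-01  2099-12-31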

I thought I could do this with a left merge in pandas followed by a filter, but that gives a different output.

In SQL you can put this filter inside the ON clause, as below, which is not the same as applying it after the join in a WHERE clause:

       SELECT t1.company,
              t1.resource,
              t2.company,
              t2.resource,
              t1.ClockInDate,
              t2.EffectiveFrom,
              t2.EffectiveTo
       FROM table1 t1
       LEFT JOIN table2 t2 ON t1.resource = t2.resource
                            AND t1.company = t2.company
                            AND t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo

Note this part: AND t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo
Note: in the SQL code, df1 is t1, and {} is {}.
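To make the ON-versus-WHERE difference concrete, here is a small sketch using Python's standard-library sqlite3 module with an in-memory database. The rows are an illustrative subset of the question's data (an assumption, not the full tables), and dates are stored as ISO-8601 text, which SQLite's BETWEEN compares correctly as strings.

```python
import sqlite3

# In-memory database with a trimmed-down copy of the question's tables.
# ISO-8601 date strings sort correctly as text, so BETWEEN works on them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (company TEXT, resource TEXT, ClockInDate TEXT);
CREATE TABLE table2 (company TEXT, resource TEXT, EffectiveFrom TEXT, EffectiveTo TEXT);
INSERT INTO table1 VALUES
  ('A', 'ResA', '2019-02-09'),
  ('A', 'ResC', '2019-02-09'),
  ('B', 'ResG', NULL);
INSERT INTO table2 VALUES
  ('A', 'ResA', '2018-01-01', '2018-12-31'),
  ('A', 'ResA', '2019-01-01', '2099-12-31'),
  ('B', 'ResG', '2018-01-01', '2018-12-31'),
  ('B', 'ResG', '2019-01-01', '2099-12-31');
""")

# Date condition inside ON: unmatched t1 rows survive with NULL t2 columns.
on_clause = """
SELECT t1.resource, t2.EffectiveFrom
FROM table1 t1
LEFT JOIN table2 t2 ON t1.resource = t2.resource
                    AND t1.company = t2.company
                    AND t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo
ORDER BY t1.resource
"""

# Same condition in WHERE: applied after the join, so rows where it is not true vanish.
where_clause = """
SELECT t1.resource, t2.EffectiveFrom
FROM table1 t1
LEFT JOIN table2 t2 ON t1.resource = t2.resource
                    AND t1.company = t2.company
WHERE t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo
ORDER BY t1.resource
"""

on_rows = conn.execute(on_clause).fetchall()
where_rows = conn.execute(where_clause).fetchall()
print(on_rows)     # [('ResA', '2019-01-01'), ('ResC', None), ('ResG', None)]
print(where_rows)  # [('ResA', '2019-01-01')]
```

With the condition in ON, ResC (no candidate rows) and ResG (NULL ClockInDate) are still returned with NULL t2 columns; with it in WHERE, both disappear because BETWEEN involving NULL is never true.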

SQL output (this is my expected output):

    t1.Company  t1.Resource t1.ClockInDate  t2.EffectiveFrom    t2.EffectiveTo
0   A           ResA        2019-02-09      2019-01-01          2099-12-31
1   A           ResB        2019-02-09      2019-01-01          2099-12-31
2   A           ResC        NaT             NaT                 NaT
3   B           ResD        NaT             NaT                 NaT
4   B           ResE        2019-02-09      2019-01-01          2099-12-31
5   B           ResF        2019-02-09      2019-01-01          2099-12-31
6   B           ResG        NaT             NaT                 NaT

Here is my Python code and its output:

# Left-join on the key columns, then filter afterwards (like a SQL WHERE clause)
df_merge = pd.merge(df1, df2, on=['Company', 'Resource'], how='left')
df_final = df_merge[df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo) | df_merge.EffectiveFrom.isnull()]

#Output:

    Company Resource    ClockInDate EffectiveFrom   EffectiveTo
1   A       ResA        2019-02-09  2019-01-01      2099-12-31
3   A       ResB        2019-02-09  2019-01-01      2099-12-31
4   A       ResC        2019-02-09  NaT             NaT
5   B       ResD        2019-02-09  NaT             NaT
7   B       ResE        2019-02-09  2019-01-01      2099-12-31
9   B       ResF        2019-02-09  2019-01-01      2099-12-31

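Note that the last row, with resource ResG, is missing from the Python output. After the merge, ResG matches two rows in df2, so EffectiveFrom is not null, while its ClockInDate is NaT. Any comparison with NaT is False, so between() is False as well, and both ResG rows are filtered out.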
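The NaT behaviour can be checked in isolation. This sketch builds only the two merged ResG rows by hand (a hypothetical `merged` frame, not output copied from the question) and evaluates the same filter expression on them.

```python
import pandas as pd

# The two rows the left merge produces for ResG: ClockInDate is NaT,
# but df2 supplied real EffectiveFrom/EffectiveTo values.
merged = pd.DataFrame({
    "Resource": ["ResG", "ResG"],
    "ClockInDate": pd.to_datetime([None, None]),
    "EffectiveFrom": pd.to_datetime(["2018-01-01", "2019-01-01"]),
    "EffectiveTo": pd.to_datetime(["2018-12-31", "2099-12-31"]),
})

# Comparisons involving NaT evaluate to False, so between() is False for both rows...
in_range = merged.ClockInDate.between(merged.EffectiveFrom, merged.EffectiveTo)
# ...and EffectiveFrom is not null, so the fallback condition is False too.
no_match = merged.EffectiveFrom.isnull()

print((in_range | no_match).tolist())  # [False, False]: both ResG rows are dropped
```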

Reproducible DataFrames to copy and paste:

import pandas as pd

# ResG's ClockInDate is an empty string; pd.to_datetime turns it into NaT
df1 = pd.DataFrame({'Company':['A', 'A', 'A', 'B', 'B', 'B', 'B'],
                    'Resource':['ResA', 'ResB','ResC', 'ResD', 'ResE', 'ResF', 'ResG'],
                    'ClockInDate':['2019-02-09', '2019-02-09', '2019-02-09', '2019-02-09', '2019-02-09', '2019-02-09', '']})

df1['ClockInDate'] = pd.to_datetime(df1.ClockInDate)

df2 = pd.DataFrame({'Company':['A','A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
                    'Resource':['ResA', 'ResA', 'ResB', 'ResB', 'ResE', 'ResE', 'ResF', 'ResF', 'ResG', 'ResG'],
                    'EffectiveFrom':['2018-01-01', '2019-01-01', '2018-01-01', '2019-01-01', '2018-01-01', '2019-01-01', '2018-01-01', '2019-01-01', '2018-01-01', '2019-01-01'],
                    'EffectiveTo':['2018-12-31', '2099-12-31', '2018-12-31', '2099-12-31', '2018-12-31', '2099-12-31', '2018-12-31', '2099-12-31', '2018-12-31', '2099-12-31']})

df2['EffectiveFrom'] = pd.to_datetime(df2.EffectiveFrom)
df2['EffectiveTo'] = pd.to_datetime(df2.EffectiveTo)

Answer 1 · posted 2024-05-19 10:09:31

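After working on this project for a while I gained more insight. I found a solution, although I was hoping for a better one, but it works: concatenate back the rows from the original DataFrame where ClockInDate is null: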

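# Left-join on the key columns
df_merge = pd.merge(df1, df2, on=['Company', 'Resource'], how='left')

# Keep rows whose ClockInDate is in range, or that found no match in df2
df_filter = df_merge[df_merge.ClockInDate.between(df_merge.EffectiveFrom, df_merge.EffectiveTo) | df_merge.EffectiveFrom.isnull()]

# Add back the df1 rows without a ClockInDate, which the filter dropped
df_final = pd.concat([df_filter, df1[df1.ClockInDate.isnull()]], sort=True)

print(df_final)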
  ClockInDate Company EffectiveFrom EffectiveTo Resource
1  2019-02-09       A    2019-01-01  2099-12-31     ResA
3  2019-02-09       A    2019-01-01  2099-12-31     ResB
4  2019-02-09       A           NaT         NaT     ResC
5  2019-02-09       B           NaT         NaT     ResD
7  2019-02-09       B    2019-01-01  2099-12-31     ResE
9  2019-02-09       B    2019-01-01  2099-12-31     ResF
6         NaT       B           NaT         NaT     ResG
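A more general pandas sketch of the ON-clause semantics (my own suggestion, not taken from either answer): filter the candidate pairs first, then left-merge the survivors back onto df1 by row position. The helper name `left_join_between` and the temporary `_row` column are hypothetical, and it assumes df2's non-key columns do not share names with df1's. Besides ResG, this also keeps a df1 row whose resource has df2 rows but none covering its date, a case the concat approach above would drop.

```python
import pandas as pd

# Sample data rebuilt from the question's reproducible DataFrames.
df1 = pd.DataFrame({
    "Company": ["A", "A", "A", "B", "B", "B", "B"],
    "Resource": ["ResA", "ResB", "ResC", "ResD", "ResE", "ResF", "ResG"],
    "ClockInDate": pd.to_datetime(["2019-02-09"] * 6 + [None]),
})
df2 = pd.DataFrame({
    "Company": ["A"] * 4 + ["B"] * 6,
    "Resource": ["ResA", "ResA", "ResB", "ResB", "ResE", "ResE", "ResF", "ResF", "ResG", "ResG"],
    "EffectiveFrom": pd.to_datetime(["2018-01-01", "2019-01-01"] * 5),
    "EffectiveTo": pd.to_datetime(["2018-12-31", "2099-12-31"] * 5),
})


def left_join_between(left, right, keys, date_col, start_col, end_col):
    """Hypothetical helper mimicking
    LEFT JOIN ... ON <keys> AND date_col BETWEEN start_col AND end_col.
    Assumes right's non-key columns do not clash with left's column names."""
    # Tag every left row so it can be matched back after filtering.
    tagged = left.assign(_row=range(len(left)))
    # Inner merge on the equality keys gives every candidate pair.
    candidates = tagged.merge(right, on=keys, how="inner")
    # Keep only pairs whose date lies inside the range (NaT never matches).
    in_range = candidates[date_col].between(candidates[start_col], candidates[end_col])
    matched = candidates[in_range]
    # Left-merge the surviving right-hand columns back, so every left row is kept.
    right_cols = [c for c in right.columns if c not in keys]
    result = tagged.merge(matched[["_row"] + right_cols], on="_row", how="left")
    return result.drop(columns="_row")


result = left_join_between(df1, df2, ["Company", "Resource"],
                           "ClockInDate", "EffectiveFrom", "EffectiveTo")
print(result)
```

As in SQL, if several ranges in df2 contained the same date, the corresponding df1 row would appear once per matching range.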
Answer 2 · posted 2024-05-19 10:09:31

A WHERE-based version of the SQL:

SELECT t1.company,
        t1.resource,
        t2.company,
        t2.resource,
        t1.ClockInDate,
        t2.EffectiveFrom,
        t2.EffectiveTo
FROM table1 t1
LEFT JOIN table2 t2 ON t1.resource = t2.resource
                    AND t1.company = t2.company
WHERE t1.ClockInDate IS NULL  -- no ClockInDate to check
    OR (t2.company IS NULL AND t2.resource IS NULL)  -- no rows in t2 for this t1 row
    OR t1.ClockInDate BETWEEN t2.EffectiveFrom AND t2.EffectiveTo  -- ClockInDate and t2 rows both exist, so check that ClockInDate lies between EffectiveFrom and EffectiveTo

This translates to Python:
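A minimal sketch of how those three WHERE conditions could be written in pandas (my reconstruction, not necessarily the answerer's code). Using merge's `indicator=True` column to stand in for the "no rows in t2" test is an assumption about how to express that condition, since the merged frame has only one Company/Resource column pair.

```python
import pandas as pd

# Sample data rebuilt from the question's reproducible DataFrames.
df1 = pd.DataFrame({
    "Company": ["A", "A", "A", "B", "B", "B", "B"],
    "Resource": ["ResA", "ResB", "ResC", "ResD", "ResE", "ResF", "ResG"],
    "ClockInDate": pd.to_datetime(["2019-02-09"] * 6 + [None]),
})
df2 = pd.DataFrame({
    "Company": ["A"] * 4 + ["B"] * 6,
    "Resource": ["ResA", "ResA", "ResB", "ResB", "ResE", "ResE", "ResF", "ResF", "ResG", "ResG"],
    "EffectiveFrom": pd.to_datetime(["2018-01-01", "2019-01-01"] * 5),
    "EffectiveTo": pd.to_datetime(["2018-12-31", "2099-12-31"] * 5),
})

# indicator=True adds a "_merge" column recording whether each row found a partner in df2.
merged = pd.merge(df1, df2, on=["Company", "Resource"], how="left", indicator=True)

keep = (
    merged.ClockInDate.isnull()                # no ClockInDate to check
    | (merged["_merge"] == "left_only")        # no rows in df2 for this df1 row
    | merged.ClockInDate.between(merged.EffectiveFrom, merged.EffectiveTo)  # date in range
)
df_where = merged[keep].drop(columns="_merge")
print(df_where)
```

On the question's data this keeps ResG twice, once per df2 range, because its NaT ClockInDate satisfies the first condition for every joined row; the ON-clause query in the question returns ResG only once, with empty t2 columns.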
