匹配两个表(明细表到小计表),同时在“明细”选项卡中标识不匹配的项

2024-09-28 01:28:17 发布

您现在位置:Python中文网/ 问答频道 /正文

我必须将摘要/小计表与明细表匹配,同时在明细表中标识不匹配的项

由于明细记录和小计记录之间的时间差存在显著差异(+/-),因此我使用pandas merge\u asof()应用的方法不够好,即使它基于给定的timedelta值进行匹配,它也不会检查两个表之间的金额是否相等。 有没有一种方法可以编码,考虑到小计表中的每个值,从明细表计算小计,如果匹配,则转到小计表中的下一项,并从明细表中的下一项开始小计。 如果有人能在这个问题上提供帮助,我将不胜感激

import pandas as pd
import datetime as dt

subtotal = pd.DataFrame(data = {'Date':['21/09/2018  17:45:27','21/09/2018  19:10:24','21/09/2018  21:42:03'],
                             'Amount':[2000,3000,6000],
                             'Ref':[1,2,3]},columns=['Date', 'Amount', 'Ref'])

detail = pd.DataFrame(data = {'Date':['21/09/2018  17:37:05','21/09/2018  17:56:22','21/09/2018  17:56:53','21/09/2018  18:54:56','21/09/2018 19:12:56','21/09/2018 19:15:30 ','21/09/2018 21:35:59','21/09/2018  21:36:20','21/09/2018 21:43:32 '],
                             'Amount':[1000,500,500,1000,3000,12000,1000,2000,3000]},
                                columns=['Date', 'Amount'])

subtotal['Date'] = pd.to_datetime(subtotal['Date'])
detail['Date'] = pd.to_datetime(detail['Date'])

# Code i tried with pandas .merge_asof()

subtotal_sorted = subtotal.sort_values(by='Date')
detail_sorted = detail.sort_values(by='Date') 

subtotal_sorted.index = subtotal_sorted['Date']
detail_sorted.index = detail_sorted['Date']

tol = pd.Timedelta('15 minute')
result = pd.merge_asof(left=detail_sorted,right=subtotal_sorted, right_index=True,left_index=True,direction='nearest',tolerance=tol)

“我希望得到一个与此类似的结果表。”但仅使用pandas mergeasof()与小计值不匹配。所以我必须研究另一种方法

Ref DateTime             Value       Result     Ref_1   DateTime_1          Value_1
1   09/21/2018 17:37     1,000.00    Index1     1       09/21/2018 17:45    2000
2   09/21/2018 17:56     500.00      Index1     1       09/21/2018 17:45    2000
3   09/21/2018 17:56     500.00      Index1     1       09/21/2018 17:45    2000
4   09/21/2018 18:54     1,000.00    Index2     2       09/21/2018 19:10    3000
5   09/21/2018 19:12     2,000.00    Index2     2       09/21/2018 19:10    3000
6   09/21/2018 19:15     12,000.00   No Match           
7   09/21/2018 21:35     1,000.00    Index3     3       09/21/2018 21:42    6000
8   09/21/2018 21:36     2,000.00    Index3     3       09/21/2018 21:42    6000
9   09/21/2018 21:43     3,000.00    Index3     3       09/21/2018 21:42    6000"

Tags: 方法refpandasdatetimedateindexmergeamount
1条回答
网友
1楼 · 发布于 2024-09-28 01:28:17

我怀疑这是其中的一个问题,它是不容易得到一个完全匹配的一切。不管怎样,我试了一下

首先让我们定义一个执行合并的函数。这与您已经做的几乎相同,只是为所有匹配的小计添加了Amount_detail求和,只保留求和匹配的行

def merge(subtotal, detail, tol):

    subtotal.sort_values(by='Date', inplace=True)
    detail.sort_values(by='Date', inplace=True) 

    # We merge using merge_asof as before
    result = pd.merge_asof(left=detail,right=subtotal, on='Date',
                           direction='nearest',tolerance=tol)
    # We total amount_detail over the matching ref
    result['sum_amount_detail'] = result.groupby(['Ref'])['Amount_detail'].transform('sum')

    # If sum_amount_detail == Amount_subtotal we have a match!!
    match = result[result['sum_amount_detail'] == result['Amount_subtotal']]
    # Otherwise... no
    no_match = result[result['sum_amount_detail'] != result['Amount_subtotal']]

    detail_match = match[['Date', 'Amount_detail', 'Ref']].copy()
    detail_no_match = no_match[['Date', 'Amount_detail']].copy()
    subtotal_match = subtotal[subtotal['Ref'].isin(detail_match['Ref'].unique())].copy()
    subtotal_no_match = subtotal[~subtotal['Ref'].isin(detail_match['Ref'].unique())].copy()

    return detail_match, subtotal_match, detail_no_match, subtotal_no_match

现在将此函数与原始条件一起使用(15分钟公差)

import pandas as pd

subtotal = pd.DataFrame(data = {'Date':['21/09/2018  17:45:27','21/09/2018  19:10:24','21/09/2018  21:42:03'],
                             'Amount_subtotal':[2000,3000,6000],
                             'Ref':[1,2,3]},columns=['Date', 'Amount_subtotal', 'Ref'])

detail = pd.DataFrame(data = {'Date':['21/09/2018  17:37:05','21/09/2018  17:56:22','21/09/2018  17:56:53','21/09/2018  18:54:56','21/09/2018 19:12:56','21/09/2018 19:15:30 ','21/09/2018 21:35:59','21/09/2018  21:36:20','21/09/2018 21:43:32 '],
                             'Amount_detail':[1000,500,500,1000,3000,12000,1000,2000,3000]},
                                columns=['Date', 'Amount_detail'])
subtotal['Date'] = pd.to_datetime(subtotal['Date'])
detail['Date'] = pd.to_datetime(detail['Date'])

tol = pd.Timedelta('15 minute')
detail_match, subtotal_match, detail_no_match, subtotal_no_match = merge(subtotal, detail, tol)

除了有一个明显的缺陷外,这个方法很有效,那就是不包括发生在2018-09-21 19:10:24(3000)的小计。这是因为它还与另一个值匹配,所以总超调量超出小计

一个解决方法是在循环中进行合并,在循环中我们不断增加容差…这样我们首先得到最接近的匹配。。。然后火柴越来越远。虽然不漂亮,但很管用

tolerances = [pd.Timedelta('5 minute'), pd.Timedelta('10 minute'), pd.Timedelta('15 minute')]

subtotal_no_match = subtotal.copy()
detail_no_match = detail.copy()

detail_list = []
subtotal_list = []

for tol in tolerances:

    detail_match, subtotal_match, detail_no_match, subtotal_no_match = merge(subtotal_no_match, detail_no_match, tol)
    if len(detail_match) > 0:
        detail_list.append(detail_match)
        subtotal_list.append(subtotal_match)
    if len(subtotal_no_match)==0:
        # We have matched everything in subtotal
        break

detail_final = pd.concat(detail_list)
subtotal_final = pd.concat(subtotal_list)
detail_final
Out[5]: 
                 Date  Amount_detail  Ref
4 2018-09-21 19:12:56           3000  2.0
5 2018-09-21 21:35:59           1000  3.0
6 2018-09-21 21:36:20           2000  3.0
7 2018-09-21 21:43:32           3000  3.0
0 2018-09-21 17:37:05           1000  1.0
1 2018-09-21 17:56:22            500  1.0
2 2018-09-21 17:56:53            500  1.0


subtotal_final
Out[6]: 
                 Date  Amount_subtotal  Ref
1 2018-09-21 19:10:24             3000    2
2 2018-09-21 21:42:03             6000    3
0 2018-09-21 17:45:27             2000    1

相关问题 更多 >

    热门问题