共有2个.csv文件-文件1(带有表头订单号、金额、佣金、GST等)和文件2(带有表头hbx号、金额、状态等)
我必须指出,文件2中的“金额”与文件1中的(“金额”–“佣金”–“GST”)相匹配。必须根据文件1中的“订单ID”和文件2中的“hbx ID”执行此操作。 我被要求将状态设置为“匹配”与匹配的项目相对应,否则设置为“不匹配”。 对于文件2中不在文件1中的项目,我应该在文件2中的状态列中输入状态“not present”
我使用嵌套for循环解决了这个问题,我的代码如下所示-
for i in file2.index:
flag = 0
for j in file1.index:
if (file1['Order_ID'][j]).count(file2['hbx_id'][i])>0:
flag = 1
if file1['Amount'][j]-file1['Commission'][j]-file1['GST'][j]==file2['amount'][i]:
file2.at[i,'Status'] = 'matched'
else:
file2.at[i,'Status'] = 'not matched'
if flag == 0:
file2.at[i,'Status'] = 'not present'
这很好,但我的问题是,如果数据集很大(因为for循环迭代),这将不是一个可行的选择。这个方法、pandas函数或任何其他库或其他东西有什么替代品吗
编辑:-预期输出如下所示-
hbx_id tran_date amount Status
0 AMBLEY127928' 01-09-2019 36.00 matched
1 SIPTBT18469' 01-09-2019 65.50 not matched
2 'KITJUJ272150' 01-09-2019 37.28 matched
3 'LEGNOE40184' 01-09-2019 20.00 matched
4 hb-pay_9492291-1' 01-09-2019 313.77 not matched
5 'SF3C145362' 01-09-2019 25.00 matched
6 hb-AMBLEY127931-1' 01-09-2019 75.50 not matched
7 'HEANEM26099' 01-09-2019 19.62 matched
8 hb-COOINT242360-1' 01-09-2019 44.15 not present
9 'MUNDCT3880' 01-09-2019 142.00 matched
10 'KDA178133' 01-09-2019 24.53 matched
11 'KSCA1492699' 01-09-2019 9.81 matched
12 'FOROCH372249' 01-09-2019 14.72 matched
13 'CHAMXN202625' 01-09-2019 29.43 matched
14 'DOSFSX48229' 01-09-2019 49.06 matched
15 'ARAEZG242324' 01-09-2019 25.00 matched
16 hb-SHRWOG391499-1' 01-09-2019 19.56 matched
17 'MALSJU13377' 01-09-2019 16.50 matched
18 hb-OMKXDT420930-1' 01-09-2019 10.00 matched
19 'SCCA528997' 01-09-2019 20.00 not matched
file1.head()
Transaction_ID Order_ID Transaction_Date Updated_Date Transaction_Type Status MID Customer_ID Customer_Nickname Customer_Phone_No. ... Commission_Rate Product_Code GMV_Tier Transaction_Slab Request_Type Refund_Type Refund_Actor Split_Flag Split_MID Split_Id
0 '20190901111212800100168044885072100' hb-SIPJZO32656-1' '2019-09-01 00:54:14' '2019-09-01 00:54:33' 'ACQUIRING' 'SUCCESS' 'EatGoo08603428284199' NaN NaN 9804817853' ... NaN '51051000100000000010' NaN NaN 'SEAMLESS' NaN NaN NaN NaN NaN
1 '20190901111212800110168956985465456' 'SHRWOG391493' '2019-09-01 02:14:43' '2019-09-01 02:14:46' 'ACQUIRING' 'SUCCESS' 'EatGoo08603428284199' NaN NaN NaN ... NaN '51051000100000000001' NaN NaN 'SEAMLESS_3D_FORM' NaN NaN NaN NaN NaN
2 '20190901111212800110168835885639302' 'MIDYEY245461' '2019-09-01 00:23:36' '2019-09-01 00:23:39' 'ACQUIRING' 'SUCCESS' 'EatGoo08603428284199' NaN NaN NaN ... NaN '51051000100000000001' NaN NaN 'SEAMLESS_3D_FORM' NaN NaN NaN NaN NaN
3 '20190901111212800110168061185352899' 'KITJUJ272194' '2019-09-01 03:40:00' '2019-09-01 03:40:03' 'ACQUIRING' 'SUCCESS' 'EatGoo08603428284199' NaN NaN NaN ... NaN '51051000100000000001' NaN NaN 'SEAMLESS_3D_FORM' NaN NaN NaN NaN NaN
4 '20190901111212800110168140285515665' 'EXLRYB3995' '2019-09-01 00:08:21' '2019-09-01 00:08:21' 'ACQUIRING' 'SUCCESS' 'EatGoo08603428284199' '216810000026794940140' NaN '9515053146' ... NaN '51051000100000000001' NaN NaN 'AUTO_DEBIT' NaN NaN NaN NaN NaN
file2.head()
hbx_id tran_date amount
0 AMBLEY127928' 01-09-2019 36.00
1 SIPTBT18469' 01-09-2019 65.50
2 'KITJUJ272150' 01-09-2019 37.28
3 'LEGNOE40184' 01-09-2019 20.00
4 hb-pay_9492291-1' 01-09-2019 313.77
首先,我会简化您的数据,以便为file1创建一个新的数据框,它只是您正在使用的列
有了它,您就可以在第一个
if
语句中进行计算,并用这些值创建一个新列假设
df1['Order_ID']
是唯一的值,其中一些在file2['hbx_id']
中,您可以合并这两列上的两个数据帧,保持所有内容,这样您就有一个数据帧,所有数据都对齐您可以简单地创建一个新列
df1['Status']
,其中如果df1['amount_check'] == df1['amount']
,那么您可以给出值'matched'
等等相关问题 更多 >
编程相关推荐