基本上,我有一些有支付习惯的客户,并根据他们在df2数据框中表达的特征统计支付概率。然后我得到一份新的客户名单,并试图估算他们的总付款额
import pandas as pd
import numpy as np
d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo'],
'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card'],
'Colateral':['Yes','Yes','Yes','No','Yes','No','No','Yes','Yes','Yes','Yes','Yes','Yes','Yes','No','Yes','No','Yes','Yes','No','No','No'],
'Client Number':[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22],
'DebtPaid':[0.8,0.1,0.5,0.30,0,0.2,0.4,1,0.60,1,0.5,0.2,0,0.3,0,0,0.2,0,0.1,0.70,0.5,0.1]}
df = pd.DataFrame(data=d)
d = {'City': ['Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo','Tokyo','Lisbon','Tokyo','Tokyo','Lisbon','Lisbon','Lisbon','Tokyo','Lisbon','Tokyo','Tokyo'],
'Card': ['Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Visa','Visa','Master Card','Master Card','Visa','Master Card','Visa','Visa','Master Card','Visa','Master Card','Bitcoin'],
'Colateral':['Yes','No','Yes','No','No','No','No','Yes','Yes','No','Yes','Yes','No','Yes','No','No','No','Yes','Yes','No','No','No','Yes'],
'Client Number':[23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45],
'Total Debt':[100,240,200,1000,500,20,345,100,600,40,50,20,1000,300,1000,600,200,200,150,700,50,120,145]}
df3 = pd.DataFrame(data=d)
df2=df.groupby(['City','Card','Colateral'])['DebtPaid'].\
value_counts(bins=[-0.001,0,0.25,0.5,0.75,1,1.001,2],normalize=True)
这是df2的结果
df_out = df2.rename('Prob').reset_index().merge(df3, on=['City', 'Card', 'Colateral'])
df_out['lower'] = [x.left for x in df_out['DebtPaid']]
df_out['upper'] = [x.right for x in df_out['DebtPaid']]
df_out['l_partial'] = df_out[['lower', 'Prob', 'Total Debt']].prod(axis=1)
df_out['u_partial'] = df_out[['upper', 'Prob', 'Total Debt']].prod(axis=1)
final = df_out.groupby('Client Number')[['l_partial', 'u_partial']]\
.agg(lower_price=('l_partial', 'sum'),
upper_price=('u_partial', 'sum')).clip(0,np.inf)
如您所见,缺少客户(24、27、35、38、45)。原因是没有东京签证“是”特征客户的数据。发生这种情况时,我希望“在后一种情况下”,应用东京Visa统计数据,或仅应用东京,以防客户有其他卡方法
如何解决这个问题有什么想法吗
分组时的列表格式。使用所有组合的多索引更新它,用
fill_value'. Places without a value are replaced with
fill_value'替换空位置。结果数据帧返回为长格式,带有explode()
。在此状态下,您应该能够按预期聚合数据相关问题 更多 >
编程相关推荐