Evaluating a DataFrame against another indicator DataFrame

Posted 2024-09-29 21:44:19


I have a source DataFrame, input_df:


        PatientID   KPI_Key1          KPI_Key2    KPI_Key3
    0   1           (C602+C603)       C601           NaN            
    1   2           (C605+C606)       C602           NaN            
    2   3           75                L239+C602      NaN            
    3   4           (32*(C603+234))   75             NaN            
    4   5           L239              NaN            C601

I have another DataFrame of indicators, indicator_df:


               99   75  C604    C602    C601    C603    C605    C606    44  L239    32
    PatientID                                           
    1          1    0    1       0       1       0       0       0      1    0       1
    2          0    0    0       0       0       0       1       1      0    0       0
    3          1    1    1       1       0       1       1       1      1    1       1
    4          0    0    0       0       0       1       0       1      0    1       0
    5          1    0    1       1       1       1       0       1      1    1       1
    6          0    1    0       0       0       0       0       0      0    0       0
    7          1    1    1       1       1       1       1       1      1    1       1
    8          0    0    0       0       0       0       0       0      0    0       0

  

Now I need to generate an output like this, output_df:


        PatientID   KPI_Key1    KPI_Key2    KPI_Key3
    0       1          0           1          0
    1       2          1           0          0
    2       3          1           1          0
    3       4          0           0          0
    4       5          1           0          1

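output_df is obtained by "evaluating" the formulas in input_df against indicator_df, using each patient's own row of indicator values.
+ denotes an OR condition: 1 + 1 = 1; 1 + 0 = 1; 0 + 0 = 0
* denotes an AND condition: 1 * 1 = 1; 0 * 0 = 0; 1 * 0 = 0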
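As a quick check of these rules, here is a minimal sketch that treats + as integer addition capped at 1 and * as plain multiplication of 0/1 values. The helper names kpi_or and kpi_and are hypothetical and are not part of the question.

```python
# Hypothetical helpers illustrating the OR/AND rules for 0/1 indicators.

def kpi_or(a: int, b: int) -> int:
    """'+' in a formula: ordinary addition, capped at 1."""
    return min(a + b, 1)


def kpi_and(a: int, b: int) -> int:
    """'*' in a formula: ordinary multiplication of 0/1 values."""
    return a * b


# Truth tables for every combination of 0/1 inputs.
or_table = {(a, b): kpi_or(a, b) for a in (0, 1) for b in (0, 1)}
and_table = {(a, b): kpi_and(a, b) for a in (0, 1) for b in (0, 1)}

print(or_table)
print(and_table)
```

Because multiplication of 0/1 values already behaves as AND, only the sums need capping, which is why clipping results to 1 after ordinary arithmetic is enough.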

Source data:


import pandas as pd

input_df = pd.DataFrame({'PatientID': [1,2,3,4,5], 'KPI_Key1': ['(C602+C603)','(C605+C606)','75','(32*(C603+234))','L239'] , 'KPI_Key2' : ['C601','C602','L239+C602','75',''] , 'KPI_Key3' : ['','','','','C601']})


indicator_df = pd.DataFrame({'PatientID': [1,2,3,4,5,6,7,8],'99' : ['1','0','1','0','1','0','1','0'],'75' : ['0','0','1','0','0','1','1','0'],'C604' : ['1','0','1','0','1','0','1','0'],'C602' : ['0','0','1','0','1','0','1','0'],'C601' : ['1','0','0','0','1','0','1','0'],'C603' : ['0','0','1','1','1','0','1','0'],'C605' : ['0','1','1','0','0','0','1','0'],'C606' : ['0','1','1','1','1','0','1','0'],'44' : ['1','0','1','0','1','0','1','0'],'L239' : ['0','0','1','1','1','0','1','0'], '32' : ['1','0','1','0','1','0','1','0'],}).set_index('PatientID')


output_df = pd.DataFrame({'PatientID': [1,2,3,4,5], 'KPI_Key1': ['0','1','1','0','1'] , 'KPI_Key2' : ['1','0','1','0','0'] , 'KPI_Key3' : ['0','0','0','0','1']})
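One thing to watch in the data above: the indicator values are stored as strings ('0' and '1'), so + on them concatenates instead of adding. The sketch below, which assumes pandas is installed and uses a small hypothetical sample rather than the full indicator_df, shows the difference and one possible way to convert the values first.

```python
import pandas as pd

# Small sample in the same shape as indicator_df (values stored as strings).
sample = pd.DataFrame(
    {"PatientID": [1, 2, 3], "C602": ["0", "0", "1"], "C603": ["0", "0", "1"]}
).set_index("PatientID")

# On strings, + concatenates: '1' + '1' gives '11', not 2.
concatenated = (sample["C602"] + sample["C603"]).tolist()

# After conversion to integers, + adds, and clip(upper=1) caps the sum at 1 (OR).
numeric = sample.astype(int)
or_result = (numeric["C602"] + numeric["C603"]).clip(upper=1).tolist()

print(concatenated)
print(or_result)
```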


1 answer
User
#1 · Posted 2024-09-29 21:44:19

I finally solved this problem:

import re
import pandas as pd

kpi_cols = ['KPI_Key1', 'KPI_Key2', 'KPI_Key3']
# indicator values are stored as strings, so convert them before doing arithmetic
ind = indicator_df.astype(int)

def quote_columns(expr):
    # backtick-quote only tokens that are indicator columns (e.g. 75, C602);
    # other numbers such as 234 stay numeric literals
    return re.sub(r'\w+', lambda m: f'`{m.group(0)}`' if m.group(0) in ind.columns else m.group(0), expr)

rows = []
for i in range(len(input_df)):
    pid = input_df['PatientID'].iloc[i]
    row = {'PatientID': pid}
    for j in kpi_cols:
        exp = input_df[j].iloc[i]
        # skip NaN (NaN != NaN) and empty strings; they become 0 below
        if exp == exp and exp != '':
            # evaluate the formula for every patient, then keep this patient's value
            row[j] = ind.eval(quote_columns(exp)).loc[pid]
    rows.append(row)
final_out_df = pd.DataFrame(rows, columns=['PatientID'] + kpi_cols)
# fill missing values with 0 and convert everything to int
final_out_df[kpi_cols] = final_out_df[kpi_cols].fillna(0).astype(int)
# any value >= 1 becomes 1, so + acts as OR and * as AND
final_out_df[kpi_cols] = final_out_df[kpi_cols].clip(upper=1)
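As an alternative to building strings for DataFrame.eval, here is a sketch of a small evaluator that works on one patient's indicator values at a time. It is not the answer's method. The function name evaluate_kpi is hypothetical, and it assumes that a numeric token with no matching indicator (such as 234) is a literal number where any non-zero value counts as 1.

```python
import ast
import re


def evaluate_kpi(formula, indicators):
    """Evaluate one KPI formula against a mapping of indicator code -> 0/1.

    '+' is OR and '*' is AND. Empty or missing (NaN) formulas evaluate to 0.
    Assumption: a numeric token that is not an indicator code (such as 234)
    is a literal number, and any non-zero value counts as 1.
    """
    if not isinstance(formula, str) or not formula.strip():
        return 0

    def substitute(match):
        token = match.group(0)
        if token in indicators:
            return str(int(indicators[token]))  # accepts '1' as well as 1
        if token.isdigit():
            return str(int(token))
        raise KeyError(f"unknown indicator code: {token}")

    # Replace every code with its value, then parse the purely numeric expression.
    tree = ast.parse(re.sub(r"\w+", substitute, formula.strip()), mode="eval")

    def walk(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, int):
            return 1 if node.value else 0
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return max(walk(node.left), walk(node.right))  # OR
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mult):
            return min(walk(node.left), walk(node.right))  # AND
        raise ValueError("only +, * and parentheses are supported")

    return walk(tree.body)


# Indicator values for patients 1, 3 and 4, copied from the question's data.
patient_1 = {"C601": 1, "C602": 0, "C603": 0}
patient_3 = {"75": 1, "L239": 1, "C602": 1}
patient_4 = {"32": 0, "C603": 1, "75": 0}

results = [
    evaluate_kpi("(C602+C603)", patient_1),
    evaluate_kpi("C601", patient_1),
    evaluate_kpi("L239+C602", patient_3),
    evaluate_kpi("(32*(C603+234))", patient_4),
    evaluate_kpi("", patient_1),
]
print(results)
```

With the question's DataFrames, the mapping for a given patient could be obtained with indicator_df.loc[patient_id].to_dict(). Walking the parsed tree avoids calling eval on user-supplied text and rejects anything other than +, * and parentheses.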
