Building a Credit Risk Analysis Scorecard

Project description for the Python package scoring


import pandas as pd
import numpy as np
import scoring as sc
from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import LogisticRegression as lr
import sklearn.metrics as metrics
df = pd.read_csv('gc.csv')
vardict = pd.read_csv('dict.csv')
# Encode the target: 'bad' -> 1, anything else -> 0
df['Risk'] = df['Risk'].apply(lambda x: 1 if x == 'bad' else 0)
df = sc.renameCols(df, vardict, False)
label, disc, cont = sc.getVarTypes(vardict)
# sc.discSummary(df)
# ### No row needs to be removed from this example in this stage ####
# vardict.loc[vardict['new'].isin(['Age', 'Sex']), 'isDel'] = 1
# df, vardict = cl.delFromVardict(df, vardict)
df1 = sc.binData(df, vardict)
#########################################
####It's using Chi-Merge algorithm...####
#########################################

Doing continous feature: Age

Doing continous feature: Credit amount
Equal Depth Binning is required, number of bins is: 100

Doing continous feature: Duration

Doing discrete feature: Sex

Doing discrete feature: Job

Doing discrete feature: Housing

Doing discrete feature: Saving accounts

Doing discrete feature: Checking account

Doing discrete feature: Purpose

Finished
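The log above reports that continuous features are binned with the Chi-Merge algorithm, and that Credit amount is first pre-split into 100 equal-depth bins. The `scoring` package's own implementation is not shown here. What follows is a minimal stdlib sketch of the standard ChiMerge idea under stated assumptions:

1. Pre-bin the feature into equal-depth bins.
2. Repeatedly merge the adjacent pair with the smallest chi-square statistic until at most `max_bins` remain.

The function names, the `max_bins` stopping rule, and the right-closed `(a, b]` intervals are all assumptions. Classic ChiMerge can also stop on a chi-square significance threshold instead. Labels are expected to be encoded 0 = good, 1 = bad, as in the Risk column above.

```python
import bisect
import math


def chi2_adjacent(a, b):
    """Pearson chi-square of the 2x2 table formed by two adjacent bins.

    a and b are [good_count, bad_count] pairs.
    """
    total = a[0] + a[1] + b[0] + b[1]
    chi2 = 0.0
    for row in (a, b):
        row_total = row[0] + row[1]
        for j in (0, 1):
            expected = row_total * (a[j] + b[j]) / total
            if expected > 0:  # an empty column (e.g. no bads in either bin) adds nothing
                chi2 += (row[j] - expected) ** 2 / expected
    return chi2


def equal_depth_edges(values, n_bins):
    """Quantile cut points for equal-depth pre-binning, duplicates removed."""
    s = sorted(values)
    return sorted({s[len(s) * k // n_bins] for k in range(1, n_bins)})


def chi_merge(values, labels, max_bins=5, init_bins=100):
    """Return right-closed cut points [-inf, c1, ..., inf] for one continuous feature.

    labels follow the example's encoding: 0 = good, 1 = bad (used as list indices).
    """
    uppers = equal_depth_edges(values, init_bins) + [math.inf]
    counts = [[0, 0] for _ in uppers]
    for v, y in zip(values, labels):
        # bin i covers (uppers[i - 1], uppers[i]]
        counts[bisect.bisect_left(uppers, v)][y] += 1
    # Drop empty bins: each one's range is absorbed by the next non-empty bin above it,
    # and the top remaining bin is reset to be open-ended
    bins = [[u, c] for u, c in zip(uppers, counts) if c[0] + c[1] > 0]
    bins[-1][0] = math.inf
    while len(bins) > max_bins:
        chis = [chi2_adjacent(bins[i][1], bins[i + 1][1]) for i in range(len(bins) - 1)]
        i = min(range(len(chis)), key=chis.__getitem__)  # most similar neighbours
        upper, right = bins.pop(i + 1)
        left = bins[i][1]
        bins[i] = [upper, [left[0] + right[0], left[1] + right[1]]]
    return [-math.inf] + [u for u, _ in bins]


values = list(range(100))
labels = [0 if v < 50 else 1 for v in values]
print(chi_merge(values, labels, max_bins=2, init_bins=10))  # [-inf, 50, inf]
```

Pure bins of the same class have a chi-square of 0, so they merge first. The mixed bin around 50 then joins whichever neighbour it is statistically closer to.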
bidict = sc.getBiDict(df1, label)
bidict['Credit amount']
  Credit amount         total  good  bad  totalDist  goodDist  badDist  goodRate  badRate     woe     iv
0 (-inf, 1282.0]          211   144   67      0.211     0.223    0.206     0.682    0.318  -0.082  0.001
1 (1282.0, 3446.32]       469   352  117      0.469     0.390    0.503     0.751    0.249   0.254  0.029
2 (3446.32, 3913.26]       60    55    5      0.060     0.017    0.079     0.917    0.083   1.551  0.096
3 (3913.26, inf]          260   149  111      0.260     0.370    0.213     0.573    0.427  -0.553  0.087
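The woe and iv columns of the bivariate table can be reproduced from the good and bad counts alone. This assumes the usual Weight of Evidence definitions:

- woe = ln(share of all goods in the bin / share of all bads in the bin)
- iv = (share of goods - share of bads) * woe

The sketch below (hypothetical helper `woe_iv`) recomputes them for the four Credit amount bins. The feature's total IV is the sum of the per-bin values.

```python
import math


def woe_iv(good, bad):
    """Per-bin WoE and IV from good/bad counts.

    WoE_i = ln((good_i / G) / (bad_i / B)),  IV_i = (good_i / G - bad_i / B) * WoE_i
    """
    G, B = sum(good), sum(bad)
    woe, iv = [], []
    for g, b in zip(good, bad):
        g_share, b_share = g / G, b / B
        w = math.log(g_share / b_share)  # undefined if a bin has zero goods or zero bads
        woe.append(w)
        iv.append((g_share - b_share) * w)
    return woe, iv


# good/bad counts of the four Credit amount bins in the table above
woe, iv = woe_iv([144, 352, 55, 149], [67, 117, 5, 111])
print([round(w, 3) for w in woe])  # [-0.082, 0.254, 1.551, -0.553]
print(round(sum(iv), 3))           # 0.213
```

Every IV term is non-negative, because the share difference and the log ratio always have the same sign. A bin only contributes when its good/bad mix differs from the population's.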
# Modified Credit amount bins: re-bin manually at 1300, 3500 and 4000
sc.bivariate(pd.DataFrame({'y': df['y'],
                           'Credit amount': sc.manuallyBin(df, 'Credit amount', 'cont',
                                                           [-np.inf, 1300, 3500, 4000, np.inf])}),
             'Credit amount', 'y')[0]
df1['Credit amount'] = sc.manuallyBin(df, 'Credit amount', 'cont',
                                      [-np.inf, 1300, 3500, 4000, np.inf])
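Manual re-binning replaces the algorithm's cut points (1282.0, 3446.32, 3913.26) with rounder ones (1300, 3500, 4000). Assuming `sc.manuallyBin` uses right-closed `(a, b]` intervals like the table above, each value can be assigned to its bin with a binary search over the cut points. `manual_bin` is a hypothetical stdlib stand-in, not the package's code.

```python
import bisect
import math


def manual_bin(values, cuts):
    """Assign each value to the right-closed interval (cuts[i], cuts[i + 1]]."""
    labels = [f"({cuts[i]}, {cuts[i + 1]}]" for i in range(len(cuts) - 1)]
    # lo=1 skips cuts[0] (-inf), so the index is at least 0 even for v == -inf
    return [labels[bisect.bisect_left(cuts, v, lo=1) - 1] for v in values]


cuts = [-math.inf, 1300, 3500, 4000, math.inf]
print(manual_bin([1282, 1300, 1301, 3913, 5000], cuts))
# ['(-inf, 1300]', '(-inf, 1300]', '(1300, 3500]', '(3500, 4000]', '(4000, inf]']
```

`bisect_left` returns the first cut greater than or equal to the value. This is exactly what puts a boundary value such as 1300 into the bin that ends at 1300.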
bidict = sc.getBiDict(df1, label)
ivtable = sc.ivTable(bidict)
df1, vardict, bidict = sc.featureFilter(df1, vardict, bidict, ivtable)
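`sc.featureFilter` drops features based on the IV table, but its exact rule is not documented here. A common rule of thumb treats a total IV below 0.02 as unpredictive. This sketch (hypothetical `filter_by_iv`, assumed threshold 0.02) keeps features at or above that value, strongest first. The 0.213 for Credit amount is the sum of the iv column in the bivariate table above, taken before manual re-binning. `feature_a` and `feature_b` are made-up.

```python
def filter_by_iv(iv_by_feature, threshold=0.02):
    """Keep features whose total IV reaches the threshold, sorted by IV descending."""
    kept = [f for f, iv in iv_by_feature.items() if iv >= threshold]
    return sorted(kept, key=iv_by_feature.get, reverse=True)


ivs = {"Credit amount": 0.213, "feature_a": 0.005, "feature_b": 0.35}
print(filter_by_iv(ivs))  # ['feature_b', 'Credit amount']
```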
df = sc.mapWOE(df1, bidict, label)
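`sc.mapWOE` presumably replaces every bin label with that bin's WoE, so the logistic regression sees numeric inputs. This is an assumption based on its name and on the model being fitted to the resulting `df`. Below is a stdlib sketch with plain dicts (hypothetical `map_woe`). For illustration it uses the WoE values from the Credit amount table above. Columns without a WoE table, such as the label `y`, pass through unchanged.

```python
def map_woe(rows, woe_tables):
    """Replace each binned feature value with the WoE of its bin.

    rows: list of dicts, one per applicant. An unseen bin raises KeyError.
    """
    return [
        {col: (woe_tables[col][val] if col in woe_tables else val) for col, val in row.items()}
        for row in rows
    ]


woe_tables = {"Credit amount": {"(-inf, 1282.0]": -0.082, "(1282.0, 3446.32]": 0.254,
                                "(3446.32, 3913.26]": 1.551, "(3913.26, inf]": -0.553}}
rows = [{"y": 0, "Credit amount": "(3446.32, 3913.26]"},
        {"y": 1, "Credit amount": "(3913.26, inf]"}]
print(map_woe(rows, woe_tables))
# [{'y': 0, 'Credit amount': 1.551}, {'y': 1, 'Credit amount': -0.553}]
```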
### Modelling ####################
trainx, testx, trainy, testy = tts(df.iloc[:, 1:], df[label], test_size=0.3)
m = lr(penalty='l1', C=0.9, solver='saga', n_jobs=-1)
m.fit(trainx, trainy)
pred = m.predict(testx)
pred_prob = m.predict_proba(testx)[:, 1]
# Inspect the test results
cm = metrics.confusion_matrix(testy, pred)
# (TN + TP) / total is accuracy, not precision
print('**Accuracy is:', (cm[0][0] + cm[1][1]) / (sum(cm[0]) + sum(cm[1])))
print('\n**Confusion matrix is:\n', cm)
print('\n**Classification report is:\n', metrics.classification_report(testy, pred))
**Accuracy is: 0.7233333333333334

**Confusion matrix is:
 [[179  18]
 [ 65  38]]

**Classification report is:
               precision    recall  f1-score   support

           0       0.73      0.91      0.81       197
           1       0.68      0.37      0.48       103

   micro avg       0.72      0.72      0.72       300
   macro avg       0.71      0.64      0.64       300
weighted avg       0.71      0.72      0.70       300
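The 0.7233 figure is the share of correctly classified test cases, (179 + 38) / 300, which is accuracy. Precision and recall are per-class quantities. The sketch below recomputes them from the confusion matrix, using scikit-learn's layout (rows are true labels, columns are predictions). `binary_metrics` is a hypothetical helper, not part of `scoring` or scikit-learn.

```python
def binary_metrics(cm):
    """Accuracy plus per-class precision, recall and F1 from a 2x2 confusion matrix."""
    total = sum(map(sum, cm))
    report = {"accuracy": (cm[0][0] + cm[1][1]) / total}
    for k in (0, 1):
        predicted_k = cm[0][k] + cm[1][k]  # column sum: cases predicted as class k
        actual_k = sum(cm[k])              # row sum: cases truly in class k
        precision = cm[k][k] / predicted_k
        recall = cm[k][k] / actual_k
        report[k] = {"precision": precision, "recall": recall,
                     "f1": 2 * precision * recall / (precision + recall)}
    return report


r = binary_metrics([[179, 18], [65, 38]])
print(round(r["accuracy"], 4))                                 # 0.7233
print(round(r[1]["precision"], 2), round(r[1]["recall"], 2))   # 0.68 0.37
```

These match the class 1 row of the report. The model catches only 38 of the 103 bad applicants at the default 0.5 threshold. This is why the ranking metrics below, which use `pred_prob` rather than hard predictions, matter for a scorecard.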
### Evaluation #####################
sc.plotROC(testy, pred_prob)
sc.plotKS(testy, pred_prob)
sc.plotCM(metrics.confusion_matrix(testy, pred), classes=df[label].unique(),
          title='Confusion matrix, without normalization')

[Figure: ROC curve]

[Figure: KS curve]

Confusion matrix, without normalization
[[179  18]
 [ 65  38]]

[Figure: confusion matrix plot]
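`sc.plotROC` and `sc.plotKS` summarize how well `pred_prob` ranks bad applicants above good ones. The following is a stdlib sketch of the two underlying numbers, assuming the standard definitions:

- **AUC** is the probability that a randomly chosen bad (1) scores higher than a randomly chosen good (0), with ties counted as half.
- **KS** is the largest gap between the cumulative share of bads and the cumulative share of goods as the threshold moves down the scores.

The function names are hypothetical, and the pairwise AUC is O(n_bad * n_good), which is fine for a 300-row test set.

```python
def roc_auc(y_true, y_score):
    """AUC as P(score of a random bad > score of a random good), ties counted half."""
    pos = [s for s, y in zip(y_score, y_true) if y == 1]
    neg = [s for s, y in zip(y_score, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true, y_score):
    """Maximum gap between the cumulative bad rate and the cumulative good rate."""
    pairs = sorted(zip(y_score, y_true), reverse=True)
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    tp = fp = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # consume every sample sharing this score together, so ties cannot inflate KS
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        ks = max(ks, abs(tp / n_pos - fp / n_neg))
    return ks


y = [0, 0, 1, 1]
p = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y, p), ks_statistic(y, p))  # 0.75 0.5
```

With the notebook's variables, the same calls would be `roc_auc(list(testy), list(pred_prob))` and `ks_statistic(list(testy), list(pred_prob))`.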

### Scoring ##################
scored, basescore = sc.scoring(trainx.reset_index(drop=True), trainy.reset_index(drop=True),
                               'y', m, bidict)
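`sc.scoring` returns the scored data and a base score, but its scaling parameters are not shown. A widely used convention, assumed here, scales log-odds linearly:

- Score = A - B * ln(odds of bad), with B = PDO / ln 2.
- A is chosen so that a base odds ratio maps to a base score.

The model's log-odds are an intercept plus coefficient * WoE terms. The score therefore splits into a base score (A - B * intercept) plus points per bin (-B * coefficient * WoE). The defaults (600 points at 50:1 good:bad odds, PDO = 20), the helper names, and the model numbers in the usage line are all hypothetical. With y = 1 meaning bad and WoE defined as ln(good share / bad share), fitted coefficients are typically negative, so bins with higher WoE earn more points.

```python
import math


def scaling_constants(base_points=600.0, base_odds_bad=1 / 50, pdo=20.0):
    """Offset A and factor B for Score = A - B * ln(odds of bad)."""
    factor = pdo / math.log(2)  # halving the odds of bad adds `pdo` points
    offset = base_points + factor * math.log(base_odds_bad)
    return offset, factor


def scorecard(intercept, coefs, woe_tables, **scaling):
    """Split the scaled log-odds into a base score and per-bin points."""
    offset, factor = scaling_constants(**scaling)
    base_score = offset - factor * intercept
    points = {
        feat: {bin_: -factor * coefs[feat] * w for bin_, w in bins.items()}
        for feat, bins in woe_tables.items()
    }
    return base_score, points


def total_score(base_score, points, applicant):
    """applicant maps each feature to the bin it falls into."""
    return base_score + sum(points[feat][bin_] for feat, bin_ in applicant.items())


# Hypothetical model: intercept -0.85, one WoE feature with coefficient -0.9
base, pts = scorecard(-0.85, {"feature_a": -0.9}, {"feature_a": {"low": -0.5, "high": 0.6}})
print(round(base, 1), {k: round(v, 1) for k, v in pts["feature_a"].items()})
# 511.6 {'low': -13.0, 'high': 15.6}
```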
