Error when using the R scorecard package from Python ('data' must be of a vector type, was 'NULL')

Posted 2024-09-28 23:28:04


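I am using the Python library rpy2 to apply an R package to a pandas DataFrame.

I want to apply a function from the scorecard package to a pandas DataFrame, but I get an error and I don't know why.

Here is my code: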

# python
import pandas as pd
import numpy as np
import rpy2
from rpy2.robjects.packages import importr
from rpy2.robjects import pandas2ri
from rpy2.robjects.vectors import DataFrame

# R
base = importr('base')
score = importr("scorecard")

# Create pandas df
df = pd.DataFrame( np.random.randn(5,4), # 5 rows, 4 columns
               columns = ["A","B","C","D"], # name of columns
               index = ["Max", "Nathy", "Tom", "Joe", "Kathy"] )
df["C"] = [0,0,1,0,1] # "BGI"

pandas2ri.activate()
# Convert pandas to r
df_r = pandas2ri.py2ri(df)
df_r = base.as_data_frame(df_r)
print(type(df_r))
pandas2ri.deactivate()

bins = score.woebin(df_r, 
                    y = "C",
                    x = base.c("A","B") )

The last command fails with the error named in the title: 'data' must be of a vector type, was 'NULL'.

1 answer
User
#1 · Posted 2024-09-28 23:28:04

Here is an option that uses pyper instead:

import pandas as pd
import numpy as np
from pyper import R

df = pd.DataFrame( np.random.randn(5,4), # 5 rows, 4 columns
               columns = ["A","B","C","D"], # name of columns
               index = ["Max", "Nathy", "Tom", "Joe", "Kathy"] )
df["C"] = [0,0,1,0,1]


r=R(use_pandas=True)

r.assign("df_r", df)  
r("library(scorecard)")

r('bins <- woebin(df_r, y = "C", x = c("A", "B"))')

binsN = r.get('bins')

- Check the output by inspecting binsN.

This can also be done from the R side, using reticulate to pull the object in from Python. First create a Python script ('pytmp.py'):

#pytmp.py

import pandas as pd
import numpy as np


df = pd.DataFrame( np.random.randn(5,4), # 5 rows, 4 columns
               columns = ["A","B","C","D"], # name of columns
               index = ["Max", "Nathy", "Tom", "Joe", "Kathy"] )
df["C"] = [0,0,1,0,1] # "BGI"

df

- Then call it from R:

library(reticulate)
library(scorecard)
use_python("/usr/local/bin/python")
use_virtualenv("~/r-reticulate")

source_python("pytmp.py")
bins <- woebin(df, y = "C", x = c("A","B") )
bins
#$A
#   variable                bin count count_distr good bad   badprob        woe     bin_iv  total_iv      breaks is_special_values
#1:        A [-Inf,0.895928754)     3         0.6    2   1 0.3333333 -0.2876821 0.04794701 0.1155245 0.895928754             FALSE
#2:        A [0.895928754, Inf)     2         0.4    1   1 0.5000000  0.4054651 0.06757752 0.1155245         Inf             FALSE

#$B
#   variable                 bin count count_distr good bad   badprob        woe     bin_iv  total_iv       breaks is_special_values
#1:        B [-Inf,0.2356073663)     3         0.6    2   1 0.3333333 -0.2876821 0.04794701 0.1155245 0.2356073663             FALSE
#2:        B [0.2356073663, Inf)     2         0.4    1   1 0.5000000  0.4054651 0.06757752 0.1155245          Inf             FALSE
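
The woe, bin_iv and total_iv columns in the output above can be checked by hand from the good and bad counts. The following is a minimal sketch in plain Python, not the scorecard package's own code. It assumes the common convention woe = ln(bad share / good share) and bin_iv = (bad share - good share) * woe, which is consistent with the numbers printed above. The function name woe_table is hypothetical, and bins with a zero count are not handled.

```python
import math


def woe_table(bins):
    """Compute WoE and IV for a list of (good, bad) counts, one pair per bin.

    Assumed convention (matches the printed output above):
    woe = ln(bad_share / good_share), bin_iv = (bad_share - good_share) * woe.
    Bins with zero good or zero bad counts are not handled in this sketch.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)

    rows = []
    for good, bad in bins:
        good_share = good / total_good  # share of all "good" rows in this bin
        bad_share = bad / total_bad     # share of all "bad" rows in this bin
        woe = math.log(bad_share / good_share)
        rows.append({
            "good": good,
            "bad": bad,
            "badprob": bad / (good + bad),
            "woe": woe,
            "bin_iv": (bad_share - good_share) * woe,
        })

    # total_iv is the sum over bins and is repeated on every row
    total_iv = sum(row["bin_iv"] for row in rows)
    for row in rows:
        row["total_iv"] = total_iv
    return rows


# Counts taken from the output above: bin 1 has good=2, bad=1; bin 2 has good=1, bad=1
rows = woe_table([(2, 1), (1, 1)])
for row in rows:
    print(row)
```

With the counts from the output (2 good and 1 bad in the first bin, 1 good and 1 bad in the second), this gives the same woe and IV values for both A and B, because the two variables happen to split the five rows into bins with identical counts.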

Note: no seed was set, so the values will differ on every run.
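
If reproducible values are needed, the random generator can be seeded before the DataFrame is built. This is a small sketch along the lines of the code above. The helper name make_df and the seed value 42 are arbitrary assumptions, not part of the original answer.

```python
import numpy as np
import pandas as pd


def make_df(seed=42):
    """Build the example DataFrame with a fixed seed (42 is an arbitrary choice)."""
    np.random.seed(seed)  # fix NumPy's global random state before drawing
    df = pd.DataFrame(
        np.random.randn(5, 4),  # 5 rows, 4 columns
        columns=["A", "B", "C", "D"],
        index=["Max", "Nathy", "Tom", "Joe", "Kathy"],
    )
    df["C"] = [0, 0, 1, 0, 1]  # binary target column, as in the original code
    return df


df_first = make_df()
df_second = make_df()

# Both frames hold identical values because the seed is reset on each call
print(df_first.equals(df_second))
```

With the same seed, the break points and WoE values that woebin reports should stay the same across runs, as long as the package versions do not change.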
