Problem with ri2py in rpy2

Posted 2024-07-06 18:22:28

I am trying to call some R code from Python, but converting the data back into pandas objects does not handle NA values correctly.

Example R code:

dummy_call_method1 <- function(argument) {
    col_a <- c("A", "A", "B", "B")
    col_b <- c(1, NA, 11, 12)
    return(data.frame(col_a, col_b))
}

dummy_call_method2 <- function(argument) {
    col_a <- c("A", "A", "B", "B")
    col_b <- c("one", NA, "eleven", "twelve")
    return(data.frame(col_a, col_b))
}

Example Python code:

import os
import rpy2
from rpy2 import rinterface, robjects
from rpy2.robjects import pandas2ri

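def r_source(base_dir, filename):
    # Load an R script into the embedded R session via R's source().
    r_script = os.path.join(base_dir, filename)
    r_src = rpy2.robjects.r['source']
    r_src(r_script)

def r_call_function(func_name, *args):
    # Look up an R function by name and call it with the given arguments.
    func = rpy2.robjects.r[func_name]
    result = func(*args)
    return result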

r_source('~/workspace/', 'test.R')

dummy_results1 = r_call_function("dummy_call_method1", "")
dummy_results2 = r_call_function("dummy_call_method2", "")

print(dummy_results1)
print(rpy2.robjects.pandas2ri.ri2py(dummy_results1))
print(dummy_results2)
print(rpy2.robjects.pandas2ri.ri2py(dummy_results2))

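I would expect the two calls to ri2py to replace the NA values returned by the dummy functions with NaN (numeric column) and None (string column), respectively. The numeric case works as expected, but in the string case the NA is replaced with "eleven". I don't know whether it is reading an uninitialized pointer or something else.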

Here is the output, with the unexpected behavior marked:

  col_a    col_b
1     A        1
2     A       NA
3     B       11
4     B       12

  col_a     col_b
1     A       1.0
2     A       NaN
3     B      11.0
4     B      12.0

  col_a     col_b
1     A       one
2     A      <NA>
3     B    eleven
4     B    twelve

  col_a      col_b
1     A        one
2     A     eleven     #This is incorrect
3     B     eleven
4     B     twelve
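
To make the expected result concrete, here is a minimal sketch in plain Python (no rpy2 involved) of the conversion I would expect for the string column. It assumes col_b reaches Python as an R factor, which is the data.frame default before R 4.0 (stringsAsFactors = TRUE), with the NA stored as R's integer NA sentinel. The names R_NA_INTEGER and decode_factor are hypothetical and only illustrate the idea; this is not how rpy2 implements the conversion.

```python
# R represents NA_integer_ as the smallest 32-bit signed integer.
R_NA_INTEGER = -2147483648


def decode_factor(codes, levels):
    """Map 1-based R factor codes to their labels, turning NA codes into None."""
    return [None if code == R_NA_INTEGER else levels[code - 1] for code in codes]


# R sorts factor levels alphabetically, so "eleven" is level 1.
levels = ["eleven", "one", "twelve"]
# Codes for c("one", NA, "eleven", "twelve") under that assumption.
codes = [2, R_NA_INTEGER, 1, 3]

expected_col_b = decode_factor(codes, levels)
print(expected_col_b)  # ['one', None, 'eleven', 'twelve']
```

Under this assumption, the value I get in place of the NA ("eleven") is the first factor level, which may be a hint that the NA code is not being treated specially during the conversion.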
