Converting the R example of "binary encoding via hashing cardinality" to Python code

my_data <- c("Louise", "Gabriel", "Emma", "Adam", "Alice",
             "Raphael", "Chloe", "Louis", "Jeanne", "Arthur")

matrix(
  as.integer(intToBits(as.integer(as.factor(my_data)))),
  ncol = 32,
  nrow = length(my_data),
  byrow = TRUE
)[, 1:ceiling(log(length(unique(my_data)) + 1) / log(2))]

1 answer

User

#1 · Posted on 2024-10-02 22:31:53

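Categoricals are a pandas data type corresponding to categorical variables in statistics: a variable that can take only a limited, and usually fixed, number of possible values (categories; levels in R). You can refer to the pandas documentation; here is a small example from it: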

In [1]: s = pd.Series(["a","b","c","a"], dtype="category")

In [2]: s
Out[2]: 
0    a
1    b
2    c
3    a
dtype: category
Categories (3, object): [a, b, c]

Or, as you asked, in a DataFrame:

[Code example not preserved in the source.]
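A minimal sketch of a DataFrame version, assuming the goal is to reproduce the R snippet from the question. This is not the answer's original code; the column name `name` and the `bit_*` labels are illustrative. It relies on pandas sorting the categories lexicographically, which for these names should match the order R's `as.factor` produces.

```python
import math

import numpy as np
import pandas as pd

my_data = ["Louise", "Gabriel", "Emma", "Adam", "Alice",
           "Raphael", "Chloe", "Louis", "Jeanne", "Arthur"]

# Hypothetical column name; astype("category") plays the role of as.factor().
df = pd.DataFrame({"name": my_data})
df["name"] = df["name"].astype("category")

# R factor codes start at 1, pandas codes start at 0, so shift by one.
codes = df["name"].cat.codes.to_numpy().astype(np.int64) + 1

# Same column count as ceiling(log(n_unique + 1) / log(2)) in the R code.
n_bits = math.ceil(math.log2(df["name"].nunique() + 1))

# Extract bit j of every code, least significant bit first (like intToBits).
bits = (codes[:, None] >> np.arange(n_bits)) & 1

encoded = pd.DataFrame(
    bits,
    columns=[f"bit_{j}" for j in range(n_bits)],  # illustrative labels
    index=df.index,
)
print(encoded)
```

Each row of `encoded` holds the low-order bits of the 1-based category code, so ten distinct names fit in four columns instead of the ten that one-hot encoding would need.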

Differences from R's factor:

The following differences from R's factor function can be observed:

R’s levels are named categories
R’s levels are always of type string, while categories in pandas can be of any dtype.
It’s not possible to specify labels at creation time. Use s.cat.rename_categories(new_labels) afterwards.
In contrast to R’s factor function, using categorical data as the sole input to create a new categorical series will not remove unused categories but create a new categorical series which is equal to the passed in one!
R allows for missing values to be included in its levels (pandas’ categories). Pandas does not allow NaN categories, but missing values can still be in the values.
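The differences listed above can be checked with a short sketch. The variable names are illustrative, and the behavior shown is what I would expect from a reasonably recent pandas version:

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")

# Labels cannot be given at creation time; rename the categories afterwards.
renamed = s.cat.rename_categories(["x", "y", "z"])

# Categories may be of any dtype, not only strings.
ints = pd.Series([3, 1, 3], dtype="category")

# Re-creating a categorical from categorical data keeps unused categories...
subset = s[s != "c"]
kept = subset.astype("category")
# ...so they have to be dropped explicitly.
trimmed = subset.cat.remove_unused_categories()

# NaN is never a category, but it can still appear among the values.
with_nan = pd.Series(["a", np.nan, "b"], dtype="category")

print(list(renamed))
print(list(ints.cat.categories))
print(list(kept.cat.categories), list(trimmed.cat.categories))
print(list(with_nan.cat.categories), with_nan.cat.codes.tolist())
```

A missing value gets the code -1, which is worth keeping in mind before shifting codes by one to mimic R's 1-based factor codes.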
