pandas cut(): how to convert NaNs? Or convert the output to non-categorical?

import pandas as pd
import numpy as np

x = [np.nan, 4, 6]
intervals = [-np.inf, 4, np.inf]
out_nolabels = pd.cut(x, intervals)
out_labels = pd.cut(x, intervals, labels=['<=4', '>4'])

# add_categories returns a new Categorical; the results are discarded here
out_nolabels.add_categories(['missing'])
out_labels.add_categories(['missing'])
print(out_labels)
print(out_nolabels)

# these calls fail: 'missing' is not one of the categories
out_labels = out_labels.fillna('missing')
out_nolabels = out_nolabels.fillna('missing')

1 answer

User

#1 · Posted 2024-09-30 20:37:23

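As the documentation says, NA values and out-of-bounds values become NA in the resulting Categorical object. You cannot use fillna on categorical data when the value you are filling with is not one of its categories.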

Any NA values will be NA in the result. Out of bounds values will be NA in the resulting Categorical object

You cannot use x.fillna('missing') because 'missing' is not among the categories of x, but you can use x.fillna('>4') because '>4' is one of them.
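To make the difference concrete, here is a minimal sketch using the data and bin edges from the question. It wraps the result of pd.cut in a Series (an assumption made for convenience), and it catches both TypeError and ValueError because the exception type raised for a non-category fill value has differed between pandas versions.

```python
import numpy as np
import pandas as pd

# Same values and bin edges as in the question.
x = pd.Series(pd.cut([np.nan, 4, 6], [-np.inf, 4, np.inf], labels=['<=4', '>4']))

# Filling with an existing category is allowed.
filled_ok = x.fillna('>4')

# Filling with a value that is not a category raises.
# The exception type depends on the pandas version, so both are caught.
try:
    x.fillna('missing')
    raised = False
except (TypeError, ValueError):
    raised = True

print(filled_ok.tolist())
print(raised)
```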

We can use np.where to work around this:

# df is the DataFrame defined further below; intervals comes from the question
x = pd.cut(df['id'], intervals, labels=['<=4', '>4'])

np.where(x.isnull(), 'missing', x)
# array(['<=4', '<=4', '<=4', '<=4', 'missing', 'missing'], dtype=object)
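np.where returns a plain NumPy array, so the index is lost. If a non-categorical Series is wanted instead, one possible variant (a sketch, not part of the original answer) is to cast the result of pd.cut to object before filling. The df and intervals below repeat the values used in this answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 4, np.nan, np.nan]})
intervals = [-np.inf, 4, np.inf]
x = pd.cut(df['id'], intervals, labels=['<=4', '>4'])

# Cast away the categorical dtype first; after that any fill value is accepted.
out = x.astype(object).fillna('missing')

print(out.tolist())
```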

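Alternatively, add 'missing' to the categories with add_categories first; fillna('missing') then works because the value is a valid category.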
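A minimal sketch of the add_categories route, assuming the same df and intervals used elsewhere in this answer. It is not necessarily the exact code the answer originally showed. Note that add_categories returns a new object rather than modifying in place, so the result has to be assigned.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 4, np.nan, np.nan]})
intervals = [-np.inf, 4, np.inf]
x = pd.cut(df['id'], intervals, labels=['<=4', '>4'])

# Register the new label as a category; the result must be assigned back.
x = x.cat.add_categories(['missing'])

# Now 'missing' is a valid category, so fillna accepts it.
x = x.fillna('missing')

print(x.tolist())
print(list(x.cat.categories))
```

The result keeps its categorical dtype, with 'missing' appended as the last category.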

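If you also want to group the NaNs, one way is to cast the grouping key to str, which leaves the column's own dtype unchanged. For example, given a DataFrame: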

df = pd.DataFrame({'id':[1,1,1,4,np.nan,np.nan],'value':[4,5,6,7,8,1]})

df.groupby(df.id.astype(str)).mean()

Output:

     id  value
id             
1.0  1.0    5.0
4.0  4.0    7.0
nan  NaN    4.5
