Pandas: percentage of NaN for each value in a column

       id      type  ...  priority   Client
0  56 113  Incident  ...       Low  client1
1  56 267   Demande  ...      High  client1
2  56 294  Incident  ...       Nan      NaN
3  56 197   Demande  ...       Low  client3
4  56 143   Demande  ...       Nan  client4

df.notna().sum()/len(agg_global)*100
Out[29]:
id          97.053453
type        76.415869
priority    82.626625
client      84.596443

                   Client1     Client2     Client3         NaN
id              100.000000  100.000000  100.000000   66.990424
type             76.415869   66.990424   76.415869   43.761970
status          100.000000  100.000000   66.990424   76.415869
category         66.990424   43.761970   76.415869   43.761970
entity           43.761970  100.000000   76.415869   76.415869
source_demande   84.596443  100.000000   76.415869   43.761970

               id       type  ...  priority    Client
client                        ...
True    97.053453  76.415869  ...  29.98632  29.98632
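For the plain per-column figure, a minimal sketch on a small hypothetical frame (values made up, column names borrowed from the question): the mean of a boolean mask is the fraction of True values, so `isna().mean()` gives the missing share per column directly, and it matches the question's `sum()/len()` form once `isna` is used instead of `notna`.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data shaped like the question's frame (values are made up)
df = pd.DataFrame({
    "id": [56113, np.nan, 56294, 56197],
    "type": ["Incident", "Demande", np.nan, np.nan],
    "priority": ["Low", "High", np.nan, "Low"],
    "Client": ["client1", "client1", np.nan, "client3"],
})

# Percentage of NaN per column: mean of a boolean mask = fraction of True values
nan_pct = df.isna().mean() * 100

# Same result with the sum/len form used in the question, applied to isna instead of notna
nan_pct_alt = df.isna().sum() / len(df) * 100

print(nan_pct)
```

Here `id`, `priority` and `Client` come out at 25.0 and `type` at 50.0.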

2 Answers

Anonymous user

Answer #1 · edited 2024-10-06 13:08:11

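You can drop the Client column, since it is not itself tested for missing values, then test the remaining columns with isna, take the mean per Client (first replacing NaN in Client with the string 'NaN' so those rows are not dropped by groupby), and finally transpose with T. Because the mean of a boolean column is the fraction of True values, the result is the share of missing values between 0 and 1: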

print(df)
       id      type priority   Client
0     NaN  Incident      Low  client1
1     NaN       NaN     High  client1
2  56 294  Incident      Nan      NaN
3  56 197       NaN      Low  client3
4     NaN   Demande      NaN  client4


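df = (df.drop(columns='Client')              # exclude the grouping column from the test
        .isna()                              # True where a value is missing
        .groupby(df['Client'].fillna('NaN')) # label missing clients 'NaN' so they are kept
        .mean()                              # fraction of missing values per client
        .rename_axis(None)                   # drop the 'Client' index name
        .T)                                  # clients as columns, original columns as rows
print(df)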
          NaN  client1  client3  client4
id        0.0      1.0      0.0      1.0
type      0.0      0.5      1.0      0.0
priority  0.0      0.0      0.0      1.0
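The result above is a fraction between 0 and 1. If percentages are wanted, as in the question, one option is to multiply by 100; swapping `isna` for `notna` gives the share of present values instead, which is what the question's own `notna()` attempt measures. A minimal sketch that rebuilds the sample frame printed above (the strings `'56 294'` and `'Nan'` are kept as printed, so `'Nan'` counts as a value, not as missing):

```python
import numpy as np
import pandas as pd

# Rebuild the sample frame printed above; 'Nan' in priority is a string, not a missing value
df = pd.DataFrame({
    "id": [np.nan, np.nan, "56 294", "56 197", np.nan],
    "type": ["Incident", np.nan, "Incident", np.nan, "Demande"],
    "priority": ["Low", "High", "Nan", "Low", np.nan],
    "Client": ["client1", "client1", np.nan, "client3", "client4"],
})

# Label missing clients 'NaN' so groupby does not drop those rows
clients = df["Client"].fillna("NaN")
values = df.drop(columns="Client")

# Percentage of missing values per column, per client
nan_pct = values.isna().groupby(clients).mean().mul(100).rename_axis(None).T

# Percentage of present values per column, per client (complement of nan_pct)
present_pct = values.notna().groupby(clients).mean().mul(100).rename_axis(None).T

print(nan_pct)
```

As a design alternative, pandas 1.1 and later also accept `groupby(..., dropna=False)`, which keeps rows with a missing key as a real NaN group rather than relabelling them with the string 'NaN'.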

Anonymous user

Answer #2 · edited 2024-10-06 13:08:11

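In my opinion, a brute-force approach works. I would use the isna function together with sum to count the NaNs in each row or column, and then turn those counts into percentages.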
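A minimal sketch of that counting approach on a small hypothetical frame (data made up for illustration): `isna()` builds a boolean mask, `sum(axis=0)` counts NaNs per column and `sum(axis=1)` per row, and dividing by the column length or the row width gives the percentage.

```python
import numpy as np
import pandas as pd

# Hypothetical small frame with some missing values
df = pd.DataFrame({
    "id": [1.0, np.nan, 3.0, np.nan],
    "type": ["Incident", np.nan, "Demande", "Demande"],
    "priority": [np.nan, np.nan, "Low", "High"],
})

mask = df.isna()

# Count NaNs per column (axis=0) and per row (axis=1)
nan_per_col = mask.sum(axis=0)
nan_per_row = mask.sum(axis=1)

# Turn the counts into percentages of the column length / row width
pct_per_col = nan_per_col / len(df) * 100
pct_per_row = nan_per_row / df.shape[1] * 100

print(pct_per_col)
print(pct_per_row)
```

The per-column version is the same quantity as `df.isna().mean() * 100`; the explicit counts are only useful if the raw numbers are wanted too.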
