Function to identify which rows in a DataFrame exceed the column mean

import os
import pandas as pd
import numpy as np

relativePath = os.getcwd()
dataFilePath = relativePath + "/Resources/crimeData.csv"
data = pd.read_csv(dataFilePath)
df = pd.DataFrame(data)
df.mean(axis=0)
df.style.apply(lambda x: np.where(x > df.mean(), 'background-color: green', ''), axis=1)
df

                     Address  ARSON  ASSAULT  BAD CHECKS  BRIBERY  \
0         OAK ST / LAGUNA ST      0        0           0        0
1  VANNESS AV / GREENWICH ST      0        1           0        0
2   1500 Block of LOMBARD ST      0        8           0        0
3  100 Block of BRODERICK ST      0        2           1        0
4        0 Block of TEDDY AV      0        9           0        0

1 Answer

User

#1 · Posted 2024-09-28 20:45:43

Move the Address column into the index:

df = df.set_index('Address')

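You should then be able to use the pandas where method to set every value that is not greater than its column mean to NaN, and then use dropna to drop the rows that contain any NaN: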

df.where(df > df.mean()).dropna()
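Applied to a frame shaped like the one in the question, the two steps can be sketched as follows. The addresses and counts here are invented for illustration and are not the contents of crimeData.csv; the names crime, col_means, masked, and above_mean are likewise hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for crimeData.csv; all values are made up.
crime = pd.DataFrame(
    {
        "Address": ["A ST", "B ST", "C ST", "D ST"],
        "ARSON": [0, 2, 0, 3],
        "ASSAULT": [1, 8, 2, 9],
    }
).set_index("Address")

# One mean per numeric column.
col_means = crime.mean()

# Cells that are not above their column mean become NaN.
masked = crime.where(crime > col_means)

# dropna() removes every row that contains at least one NaN,
# leaving only rows where all columns exceed their means.
above_mean = masked.dropna()
print(above_mean)
```

Setting Address as the index first matters because the comparison against the mean only makes sense for the numeric columns.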

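Here is an example. I create a DataFrame with 10 rows and 2 columns of random numbers between 0 and 1, and keep only the rows where both columns are greater than their column means.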

np.random.seed(1)
df = pd.DataFrame(np.random.rand(10,2))
df.where(df > df.mean()).dropna()

          0         1
0  0.417022  0.720324
4  0.396767  0.538817
5  0.419195  0.685220
8  0.417305  0.558690

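To double-check against the original DataFrame, we can highlight the cells that are greater than the column mean. The rows with two green cells are the ones we want.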

^{4}$
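One way such highlighting could be written is sketched below. This is an assumption about a workable approach, not necessarily the answerer's exact code; the helper names highlight_above_mean and style_above_mean are hypothetical. It relies on the pandas Styler, which needs jinja2 installed in order to render.

```python
import numpy as np
import pandas as pd


def highlight_above_mean(column):
    """Return one CSS string per cell: green where the cell exceeds its column mean."""
    return np.where(column > column.mean(), "background-color: green", "")


def style_above_mean(frame):
    """Build a Styler that colors cells above their column mean (hypothetical helper)."""
    # axis=0 hands the function one column at a time,
    # so column.mean() is that column's own mean.
    return frame.style.apply(highlight_above_mean, axis=0)
```

In a Jupyter notebook, evaluating style_above_mean(df) as the last expression of a cell displays the colored table. Note that the Styler is a separate object: the DataFrame itself is not modified, so evaluating df afterwards shows the plain table.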

You should also be able to use a boolean mask:

df[(df > df.mean()).all(axis=1)]
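As a quick check, the sketch below compares the boolean-mask approach with the where/dropna approach on the same random data used in the example above. The variable names row_mask, by_mask, and by_where are only for illustration.

```python
import numpy as np
import pandas as pd

np.random.seed(1)
df = pd.DataFrame(np.random.rand(10, 2))

# True for rows in which every column exceeds its column mean.
row_mask = (df > df.mean()).all(axis=1)

# Approach 1: select rows directly with the boolean mask.
by_mask = df[row_mask]

# Approach 2: blank out cells at or below the mean, then drop incomplete rows.
by_where = df.where(df > df.mean()).dropna()

# Both approaches keep the same rows with the same values.
print(by_mask.equals(by_where))
```

One practical difference: with integer columns, where introduces NaN and so converts the result to float, while the boolean mask keeps the original dtype.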
