Infinite while loop in Python when calculating the standard deviation

We are trying to remove outliers, but we end up in an infinite loop

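For a school project, we (a friend and I) decided to build a data-science-based tool. To do so, we started by cleaning the database (I won't import it here because it is too large (xlsx file, csv file)). We are now trying to remove outliers from the "duration_minutes" column using the "mean + 3 * standard deviation" rule.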

Here is the code we use to calculate the standard deviation and the mean:

def calculateSD(database, column):
    column = database[[column]]
    SD = column.std(axis=None, skipna=None, level=None, ddof=1, numeric_only=None)
    return SD

def calculateMean(database, column):
    column = database[[column]]
    mean = column.mean()
    return mean

This is what we want to do:

#Now we have to remove the outliers using the code from the SD.py and SDfunction.py files
minutes = trainsData['duration_minutes'].tolist() #takes the column duration_minutes and puts it in a list
SD = int(calculateSD(trainsData, 'duration_minutes')) #calculates the SD of the column
mean = int(calculateMean(trainsData, 'duration_minutes'))
SDhigh = mean+3*SD

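The code above computes the starting values. We then start a while loop that removes the outliers. After each removal, we recalculate the standard deviation, the mean, and SDhigh. This is the while loop: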

while np.any(i >= SDhigh for i in minutes): #used to be >=, it doesnt matter for the outcome
    trainsData = trainsData[trainsData['duration_minutes'] < SDhigh] #used to be >=, this caused an infinite loop so I changed it to <=. Then to <
    minutes = trainsData['duration_minutes'].tolist()
    SD = int(calculateSD(trainsData, 'duration_minutes')) #calculates the SD of the column
    mean = int(calculateMean(trainsData, 'duration_minutes'))
    SDhigh = mean+3*SD
    print(SDhigh) #to see how the values changed and to confirm it is an infinite loop

The output looks like this:

611
652
428
354
322
308
300
296
296
296
296

It keeps printing 296. After several hours of trying, we have concluded that we are not as smart as we had hoped.

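TL;DR: We are trying to remove every value above mean + 3 * standard deviation until none remain (we recalculate each time to check whether there are still outliers). However, we get an infinite loop.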

1 Answer

User

#1 · Posted on 2024-09-27 23:26:12

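You are making this harder than it needs to be. Computing the standard deviation to remove outliers, then recomputing it and repeating, is overly complicated (and statistically unsound). You would be better off using percentiles instead of the standard deviation.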

import numpy as np
import pandas as pd

# create data
nums = np.random.normal(50, 8, 200)
df = pd.DataFrame(nums, columns=['duration'])

# set threshold based on percentiles
threshold = df['duration'].quantile(.95) * 2

# now only keep rows that are below the threshold
df = df[df['duration']<threshold]
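
As for why the loop in the question never stops: as far as I can tell, the likely cause is the condition itself. `np.any(i >= SDhigh for i in minutes)` passes a generator object to NumPy, which does not iterate over it, and the result is always truthy, even once no value is above the limit (the built-in `any` would iterate over it). If you do want to keep the iterative mean + 3 * SD rule, here is a minimal sketch of a version that terminates. The names `upper_limit`, `remove_outliers_iteratively`, `max_rounds` and the sample data are made up for illustration and are not from the question's code.

```python
import numpy as np
import pandas as pd


def upper_limit(series):
    # mean + 3 * sample standard deviation (ddof=1 is the pandas default)
    return series.mean() + 3 * series.std()


def remove_outliers_iteratively(df, column, max_rounds=100):
    """Drop rows at or above mean + 3*SD, recomputing until nothing is dropped."""
    for _ in range(max_rounds):  # hypothetical safety cap on the number of passes
        too_high = df[column] >= upper_limit(df[column])
        if not too_high.any():  # boolean Series: a real element-wise check
            break
        df = df[~too_high]
    return df


# Made-up data: 200 ordinary durations plus two extreme values
rng = np.random.default_rng(0)
values = np.append(rng.normal(50, 8, 200), [500.0, 900.0])
trains = pd.DataFrame({"duration_minutes": values})

# The generator form is truthy even though no value comes close to 10_000
gen_check = bool(np.any(v >= 10_000 for v in values))
# The vectorised form actually inspects the values
real_check = bool((values >= 10_000).any())

cleaned = remove_outliers_iteratively(trains, "duration_minutes")
```

Comparing the column directly (`df[column] >= limit`) also avoids the `tolist()` round trip. The statistical objection above still applies: each pass shrinks the standard deviation, so repeated trimming can keep removing values that are not really outliers.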
