Using GridSearchCV with IsolationForest to find outliers

Posted 2024-06-25 22:32:08


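I want to use IsolationForest to find outliers, and I want to use GridSearchCV to find the best parameters for the model. The problem is that I always get the same error: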

TypeError: If no scoring is specified, the estimator passed should have a 'score' method. The estimator IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
                max_features=1.0, max_samples='auto', n_estimators=100,
                n_jobs=None, random_state=None, verbose=0, warm_start=False) does not.

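This seems to happen because IsolationForest has no score method. Is there a way to fix this? And is there a way to score an isolation forest at all? Here is my code: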

^{pr2}$

2 Answers

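You need to create your own scoring function, because IsolationForest has no built-in score method. Instead, you can use the score_samples function that IsolationForest provides (it can be treated as a proxy for score), wrap it in your own scorer, and pass that scorer to GridSearchCV. A scorer here is simply a callable that takes the fitted estimator and the held-out data and returns a single number, where higher is better. I modified your code to do the following: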

import pandas as pd
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import GridSearchCV

df = pd.DataFrame({'first': [-112,0,1,28,5,6,3,5,4,2,7,5,1,3,2,2,5,2,42,84,13,43,13],
                   'second': [42,1,2,85,2,4,6,8,3,5,7,3,64,1,4,1,2,4,13,1,0,40,9],
                   'third': [3,4,7,74,3,8,2,4,7,1,53,6,5,5,59,0,5,12,65,4,3,4,11],
                   'result': [5,2,3,0.04,3,4,3,125,6,6,0.8,9,1,4,59,12,1,4,0,8,5,4,1]})

x = df.iloc[:,:-1]

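# Note: contamination='legacy' and behaviour='old' match the scikit-learn version
# used in the question; newer releases no longer accept them, so drop those two
# entries when running on a current version.
tuned = {'n_estimators': [70, 80], 'max_samples': ['auto'],
         'contamination': ['legacy'], 'max_features': [1],
         'bootstrap': [True], 'n_jobs': [None, 1, 2], 'behaviour': ['old'],
         'random_state': [None, 1], 'verbose': [0, 1, 2], 'warm_start': [True]}

def scorer_f(estimator, X):  # your own scorer
    # Mean anomaly score over the held-out fold (lower scores mean more abnormal samples)
    return np.mean(estimator.score_samples(X))

# or you could use a lambda expression as shown below
# scorer = lambda est, data: np.mean(est.score_samples(data))

isolation_forest = GridSearchCV(IsolationForest(), tuned, scoring=scorer_f)
model = isolation_forest.fit(x)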

SAMPLE OUTPUT

^{pr2}$
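For readers on a current scikit-learn release, here is a minimal self-contained sketch of the same idea. It reuses the three feature columns from the code above; the smaller parameter grid, `cv=3`, the fixed `random_state`, and the name `mean_score_samples` are assumptions made for illustration, not part of the original answer. Keep in mind that the mean of score_samples is only a proxy: it ranks parameter settings by average anomaly score and does not measure how well outliers are detected.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import IsolationForest
from sklearn.model_selection import GridSearchCV

# Feature columns taken from the answer's DataFrame
X = pd.DataFrame({
    'first': [-112, 0, 1, 28, 5, 6, 3, 5, 4, 2, 7, 5, 1, 3, 2, 2, 5, 2, 42, 84, 13, 43, 13],
    'second': [42, 1, 2, 85, 2, 4, 6, 8, 3, 5, 7, 3, 64, 1, 4, 1, 2, 4, 13, 1, 0, 40, 9],
    'third': [3, 4, 7, 74, 3, 8, 2, 4, 7, 1, 53, 6, 5, 5, 59, 0, 5, 12, 65, 4, 3, 4, 11],
})

def mean_score_samples(estimator, X, y=None):
    # Custom scorer: average anomaly score on the held-out fold
    return float(np.mean(estimator.score_samples(X)))

# Illustrative grid (an assumption); only parameters that affect the model are searched
param_grid = {'n_estimators': [70, 80], 'max_features': [1, 2, 3]}

search = GridSearchCV(IsolationForest(random_state=0), param_grid,
                      scoring=mean_score_samples, cv=3)
search.fit(X)  # no y: the search is fully unsupervised

print(search.best_params_)

# The refitted best model labels each row: 1 = inlier, -1 = outlier
labels = search.best_estimator_.predict(X)
print(X[labels == -1])
```

Searching over n_jobs, verbose, or warm_start does not change the fitted model, so the sketch leaves them out of the grid.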

Hope this helps!

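I believe the scoring refers to the GridSearchCV object, not to IsolationForest.

If it is None (the default), GridSearchCV tries to use the estimator's own score method, which, as you said, does not exist. Try passing one of the available scoring metrics that suits your problem to the GridSearchCV object.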
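As a rough sketch of that suggestion: a built-in metric such as F1 needs ground-truth labels, so the example below assumes you have some labelled data, coded 1 for inliers and -1 for outliers to match what IsolationForest.predict returns. The synthetic data, the parameter grid, and the name `outlier_f1` are all hypothetical and chosen only for illustration.

```python
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.metrics import f1_score, make_scorer
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# With scoring=None, GridSearchCV falls back to estimator.score, which is missing here
print(hasattr(IsolationForest(), "score"))  # False

# Hypothetical labelled data: 200 inliers near the origin, 20 far-away outliers
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, size=(200, 2)),
               rng.uniform(6, 10, size=(20, 2))])
y = np.r_[np.ones(200, dtype=int), -np.ones(20, dtype=int)]  # same 1 / -1 coding as predict()

# F1 on the outlier class (-1); zero_division=0 returns 0 if nothing is flagged
outlier_f1 = make_scorer(f1_score, pos_label=-1, zero_division=0)

param_grid = {"n_estimators": [50, 100], "contamination": [0.05, 0.1]}  # illustrative grid

# Stratify on y so every fold contains some outliers
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)

search = GridSearchCV(IsolationForest(random_state=0), param_grid,
                      scoring=outlier_f1, cv=cv)
search.fit(X, y)  # y is ignored when fitting the forest and used only for scoring

print(search.best_params_, search.best_score_)
```

If no labels are available, a label-based metric cannot be computed, and a custom scorer built on score_samples is the usual fallback.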
