sklearn custom transformer

Posted 2024-09-30 16:19:31

I am trying to build a pipeline and add my custom transformers, as follows:

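import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline

class DataFrameSelector(BaseEstimator, TransformerMixin):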
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names
    def fit(self, X, y=None):
        return self
    def transform(self, X):
        return X[list(self.attribute_names)]

and

class DummyTransform(BaseEstimator, TransformerMixin):

    def __init__(self):
        return None

    def transform(self, X):
        return pd.get_dummies(X).values

    def fit(self, X, y=None):
        return self

But when I do this:

RF=RandomForestClassifier(n_estimators=100,oob_score=True,random_state=3)

pipe= Pipeline(steps=[
    ('Selector', DataFrameSelector(attribute_names=('lat','long','type'))),   # selects the second and 4th column      
    ('Encoder', DummyTransform() ) 
    ('clf',RF)
    ])
rforest=pipe.fit(X_train,Y_train)

I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-168-108f5c7552a0> in <module>()
      4     ('Selector', DataFrameSelector(attribute_names=('lat','long','type'))),   # selects the second and 4th column
      5     ('Encoder', DummyTransform() )
----> 6     ('clf',RF)
      7     ])
      8 rforest=pipe.fit(X_train,Y_train)

TypeError: 'tuple' object is not callable

Why???

PS: Strangely, this one works:

RF=RandomForestClassifier(n_estimators=100,oob_score=True,random_state=3)

pipe= Pipeline(steps=[
    ('Selector', DataFrameSelector(attribute_names=('lat','long','type'))),   # selects the second and 4th column      
    ('Encoder', DummyTransform() ) 
    #('clf',DecisionTreeClassifier())
    ])
X=pipe.fit_transform(X_train,Y_train)
RF.fit(X,Y_train)

Edit: RF stands for this line of code:
RF=RandomForestClassifier(n_estimators=100,oob_score=True,random_state=3)
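
Note on the TypeError: the traceback arrow points at `('clf',RF)`, and the line before it, `('Encoder', DummyTransform() )`, has no trailing comma. Python most likely reads the two adjacent parenthesised groups as a call on the first tuple, which would give exactly `'tuple' object is not callable`. The following is a minimal sketch of that behaviour in plain Python; the placeholder strings are assumptions standing in for the real transformer and classifier objects and are not part of the original post.

```python
# Placeholder strings stand in for the real transformer and classifier
# objects; they are illustrative only and not part of the original post.
encoder_step = ('Encoder', 'DummyTransform placeholder')

# Without a comma between two parenthesised tuples, Python parses
#     ('Encoder', ...) ('clf', ...)
# as a call on the first tuple, which is equivalent to this:
try:
    encoder_step('clf', 'RF placeholder')
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# With the comma restored, each step is a separate list element.
steps = [
    ('Selector', 'DataFrameSelector placeholder'),
    ('Encoder', 'DummyTransform placeholder'),  # comma added here
    ('clf', 'RF placeholder'),
]
print(len(steps))
```

In the pipeline from the question, this would correspond to adding a comma after `('Encoder', DummyTransform() )`. The second snippet presumably works because the `('clf', ...)` line is commented out, so nothing follows the Encoder tuple that could be parsed as a call.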

