Answers to: sklearn transform pipeline and FeatureUnion

sklearn transform pipeline and FeatureUnion


I'm having trouble running the following code. It's a machine-learning problem on housing prices.

<pre><code>import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import FeatureUnion, Pipeline
# Imputer lives in sklearn.preprocessing in the (pre-0.22) version used here
from sklearn.preprocessing import Imputer, LabelBinarizer, StandardScaler

num_attributes = list(housing_num)  # housing_num is defined earlier (not shown)
cat_attributes = ['ocean_proximity']
rooms_ix, bedrooms_ix, population_ix, household_ix = 3, 4, 5, 6


class DataFrameSelector(BaseEstimator, TransformerMixin):
    # Select the given DataFrame columns and return them as a NumPy array
    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        return X[self.attribute_names].values


class CombinedAttributesAdder(BaseEstimator, TransformerMixin):
    def __init__(self, add_bedrooms_per_room=True):  # no *args or **kwargs
        self.add_bedrooms_per_room = add_bedrooms_per_room

    def fit(self, X, y=None):
        return self  # nothing else to do

    def transform(self, X, y=None):
        rooms_per_household = X[:, rooms_ix] / X[:, household_ix]
        population_per_household = X[:, population_ix] / X[:, household_ix]
        if self.add_bedrooms_per_room:
            bedrooms_per_room = X[:, bedrooms_ix] / X[:, rooms_ix]
            return np.c_[X, rooms_per_household, population_per_household,
                         bedrooms_per_room]
        else:
            return np.c_[X, rooms_per_household, population_per_household]


num_pipeline = Pipeline([
    ('selector', DataFrameSelector(num_attributes)),
    ('imputer', Imputer(strategy="median")),
    ('attribs_adder', CombinedAttributesAdder()),
    ('std_scalar', StandardScaler()),
])

cat_pipeline = Pipeline([
    ('selector', DataFrameSelector(cat_attributes)),
    ('label_binarizer', LabelBinarizer()),
])

full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])
</code></pre>

When I try to run

<pre><code>housing_prepared = full_pipeline.fit_transform(housing)</code></pre>

I get the following error:

<pre><code>---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-141-acd0fd68117b> in <module>()
----> 1 housing_prepared = full_pipeline.fit_transform(housing)

/Users/nieguangtao/ml/env_1/lib/python2.7/site-packages/sklearn/pipeline.pyc in fit_transform(self, X, y, **fit_params)
    744             delayed(_fit_transform_one)(trans, weight, X, y,
    745                                         **fit_params)
--> 746             for name, trans, weight in self._iter())
    747 
    748         if not result:

/Users/nieguangtao/ml/env_1/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self, iterable)
    777             # was dispatched. In particular this covers the edge
    778             # case of Parallel used with an exhausted iterator.
--> 779             while self.dispatch_one_batch(iterator):
    780                 self._iterating = True
    781             else:

/Users/nieguangtao/ml/env_1/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in dispatch_one_batch(self, iterator)
    623                 return False
    624             else:
--> 625                 self._dispatch(tasks)
    626                 return True
    627 

/Users/nieguangtao/ml/env_1/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in _dispatch(self, batch)
    586         dispatch_timestamp = time.time()
    587         cb = BatchCompletionCallBack(dispatch_timestamp, len(batch), self)
--> 588         job = self._backend.apply_async(batch, callback=cb)
    589         self._jobs.append(job)
    590 

/Users/nieguangtao/ml/env_1/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.pyc in apply_async(self, func, callback)
    109     def apply_async(self, func, callback=None):
    110         """Schedule a func to be run"""
--> 111         result = ImmediateResult(func)
    112         if callback:
    113             callback(result)

/Users/nieguangtao/ml/env_1/lib/python2.7/site-packages/sklearn/externals/joblib/_parallel_backends.pyc in __init__(self, batch)
    330         # Don't delay the application, to avoid keeping the input
    331         # arguments in memory
--> 332         self.results = batch()
    333 
    334     def get(self):

/Users/nieguangtao/ml/env_1/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.pyc in __call__(self)
    129 
    130     def __call__(self):
--> 131         return [func(*args, **kwargs) for func, args, kwargs in self.items]
    132 
    133     def __len__(self):

/Users/nieguangtao/ml/env_1/lib/python2.7/site-packages/sklearn/pipeline.pyc in _fit_transform_one(transformer, weight, X, y, **fit_params)
    587                        **fit_params):
    588     if hasattr(transformer, 'fit_transform'):
--> 589         res = transformer.fit_transform(X, y, **fit_params)
    590     else:
    591         res = transformer.fit(X, y, **fit_params).transform(X)

/Users/nieguangtao/ml/env_1/lib/python2.7/site-packages/sklearn/pipeline.pyc in fit_transform(self, X, y, **fit_params)
    290         Xt, fit_params = self._fit(X, y, **fit_params)
    291         if hasattr(last_step, 'fit_transform'):
--> 292             return last_step.fit_transform(Xt, y, **fit_params)
    293         elif last_step is None:
    294             return Xt

TypeError: fit_transform() takes exactly 2 arguments (3 given)
</code></pre>

My first question: what causes this error?

After getting this bug, I tried to find out why by running the transformers above one at a time:

<pre><code># Numeric branch, step by step
DFS = DataFrameSelector(num_attributes)
a1 = DFS.fit_transform(housing)
imputer = Imputer(strategy='median')
a2 = imputer.fit_transform(a1)
CAA = CombinedAttributesAdder()
a3 = CAA.fit_transform(a2)
SS = StandardScaler()
a4 = SS.fit_transform(a3)

# Categorical branch, step by step
DFS2 = DataFrameSelector(cat_attributes)
b1 = DFS2.fit_transform(housing)
LB = LabelBinarizer()
b2 = LB.fit_transform(b1)

# Join the two branches column-wise
result = np.concatenate((a4, b2), axis=1)
</code></pre>

This runs, except that the result I get is a numpy.ndarray, whereas the expected result of <code>housing_prepared = full_pipeline.fit_transform(housing)</code> should be a numpy.ndarray of size (16512, 17). That is my second question: what causes this difference?

<code>housing</code> is a DataFrame of size (16512, 9), with 1 categorical feature and 8 numerical features.

Thanks in advance.
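The last frame of the traceback shows `Pipeline.fit_transform` calling `last_step.fit_transform(Xt, y, **fit_params)`. In `cat_pipeline` that last step is `LabelBinarizer`, whose `fit_transform(self, y)` accepts only the labels, so it receives one positional argument too many (Python 2 counts `self`, hence "takes exactly 2 arguments (3 given)"). `LabelBinarizer` is designed for target labels rather than feature columns. The sketch below is a minimal illustration under stated assumptions, not a definitive fix: it assumes a recent scikit-learn, uses `SimpleImputer` in place of `Imputer` (removed in newer releases), drops `CombinedAttributesAdder` for brevity, runs on a made-up four-row DataFrame, and introduces a hypothetical wrapper named `FeatureLabelBinarizer`. It also compares the `FeatureUnion` output with a hand-built `np.concatenate` of the same steps; because `FeatureUnion` stacks its transformers' dense outputs side by side, both routes should give the same `numpy.ndarray`.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.impute import SimpleImputer  # stands in for the old Imputer
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import LabelBinarizer, StandardScaler


class DataFrameSelector(BaseEstimator, TransformerMixin):
    """Select DataFrame columns and return them as a NumPy array (as in the question)."""

    def __init__(self, attribute_names):
        self.attribute_names = attribute_names

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X[self.attribute_names].values


class FeatureLabelBinarizer(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper giving LabelBinarizer the fit(X, y=None) signature
    that Pipeline and FeatureUnion call; y is accepted and ignored."""

    def fit(self, X, y=None):
        # LabelBinarizer only takes the labels, so flatten the (n, 1) column.
        self.binarizer_ = LabelBinarizer().fit(np.asarray(X).ravel())
        return self

    def transform(self, X):
        return self.binarizer_.transform(np.asarray(X).ravel())


# Made-up toy data standing in for the real housing DataFrame.
housing = pd.DataFrame({
    "total_rooms": [880.0, 7099.0, np.nan, 1274.0],
    "households": [126.0, 1138.0, 177.0, 219.0],
    "ocean_proximity": ["NEAR BAY", "INLAND", "NEAR BAY", "<1H OCEAN"],
})
num_attributes = ["total_rooms", "households"]
cat_attributes = ["ocean_proximity"]

# 1) Reproduce the error: this is the call Pipeline makes on its last step,
#    fit_transform(Xt, y), but LabelBinarizer.fit_transform accepts only y.
raised_type_error = False
try:
    LabelBinarizer().fit_transform(housing[cat_attributes].to_numpy(), None)
except TypeError as exc:
    raised_type_error = True
    print("TypeError:", exc)

# 2) With the wrapper, the question's FeatureUnion layout runs.
num_pipeline = Pipeline([
    ("selector", DataFrameSelector(num_attributes)),
    ("imputer", SimpleImputer(strategy="median")),
    ("std_scaler", StandardScaler()),
])
cat_pipeline = Pipeline([
    ("selector", DataFrameSelector(cat_attributes)),
    ("label_binarizer", FeatureLabelBinarizer()),
])
full_pipeline = FeatureUnion(transformer_list=[
    ("num_pipeline", num_pipeline),
    ("cat_pipeline", cat_pipeline),
])
housing_prepared = full_pipeline.fit_transform(housing)
print(type(housing_prepared).__name__, housing_prepared.shape)  # ndarray (4, 5)

# 3) The step-by-step route from the question, concatenated by hand.
a = StandardScaler().fit_transform(
    SimpleImputer(strategy="median").fit_transform(housing[num_attributes].to_numpy())
)
b = LabelBinarizer().fit_transform(housing["ocean_proximity"].to_numpy())
manual = np.concatenate((a, b), axis=1)
print(np.allclose(housing_prepared, manual))  # True
```

On scikit-learn 0.20 or later, `OneHotEncoder` accepts string categories directly and can replace `LabelBinarizer` in `cat_pipeline` without a wrapper (note that its output is sparse by default), and `ColumnTransformer` can take over the job of `DataFrameSelector` plus `FeatureUnion`.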


1 Answer

Anonymous, 1 day ago

Expertise: python, mysql, java
