'ValueError: Input array dimensions not right for CountVectorizer()'

import pandas as pd
features = pd.DataFrame([["T. Rowe Price sells most of its Tesla shares", 0.002152], ["Gannett to retain all seats in MNG proxy fight", 0.002152]], columns=["desc-title", "SPchangeHigh"])

ValueError                                Traceback (most recent call last)
<ipython-input-71-d77f136b9586> in <module>()
      3     ( CountVectorizer(tokenizer=tokenize),['desc-title'])
      4 )
----> 5 preprocess.fit_transform(features.head(2))

C:\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in fit_transform(self, X, y)
    488         self._validate_output(Xs)
    489
--> 490         return self._hstack(list(Xs))
    491
    492     def transform(self, X):

C:\anaconda3\lib\site-packages\sklearn\compose\_column_transformer.py in _hstack(self, Xs)
    545         else:
    546             Xs = [f.toarray() if sparse.issparse(f) else f for f in Xs]
--> 547             return np.hstack(Xs)
    548
    549

C:\anaconda3\lib\site-packages\numpy\core\shape_base.py in hstack(tup)
    338         return _nx.concatenate(arrs, 0)
    339     else:
--> 340         return _nx.concatenate(arrs, 1)
    341
    342

ValueError: all the input array dimensions except for the concatenation axis must match exactly

1 answer

User

#1 · Posted 2024-10-02 08:22:29

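Remove the brackets around 'desc-title'. You want a one-dimensional array, not a column vector. With ['desc-title'], CountVectorizer receives a one-column DataFrame, and iterating over a DataFrame yields its column names rather than its rows, so the text output has a single row and cannot be stacked next to the two rows produced by StandardScaler.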

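from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

preprocess = make_column_transformer(
    (StandardScaler(), ['SPchangeHigh']),  # list selector: 2-D input for the scaler
    (CountVectorizer(), 'desc-title')      # string selector: 1-D input for the vectorizer
)
preprocess.fit_transform(features.head(2))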

The scikit-learn documentation describes this distinction:

The difference between specifying the column selector as 'column' (as a simple string) and ['column'] (as a list with one element) is the shape of the array that is passed to the transformer. In the first case, a one dimensional array will be passed, while in the second case it will be a 2-dimensional array with one column, i.e. a column vector
...
Be aware that some transformers expect a 1-dimensional input (the label-oriented ones) while some others, like OneHotEncoder or Imputer, expect 2-dimensional input, with the shape [n_samples, n_features].
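
To make the quoted distinction concrete, here is a minimal sketch that calls CountVectorizer directly on both selections and then runs the corrected transformer. It reuses the two rows from the question; the variable names `as_frame`, `as_series`, and `result` are only illustrative, and exact behavior may vary between scikit-learn versions.

```python
import pandas as pd
from sklearn.compose import make_column_transformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.preprocessing import StandardScaler

features = pd.DataFrame(
    [
        ["T. Rowe Price sells most of its Tesla shares", 0.002152],
        ["Gannett to retain all seats in MNG proxy fight", 0.002152],
    ],
    columns=["desc-title", "SPchangeHigh"],
)

# List selector: a one-column DataFrame. Iterating over it yields the
# column name, so the vectorizer sees one "document" instead of two.
as_frame = CountVectorizer().fit_transform(features[["desc-title"]])

# String selector: a 1-D Series, one document per row.
as_series = CountVectorizer().fit_transform(features["desc-title"])

print(as_frame.shape[0], as_series.shape[0])  # row counts: 1 and 2

# With the string selector, both transformers return two rows,
# so the column transformer can stack them side by side.
preprocess = make_column_transformer(
    (StandardScaler(), ["SPchangeHigh"]),
    (CountVectorizer(), "desc-title"),
)
result = preprocess.fit_transform(features)
print(result.shape[0])  # 2 rows
```

The mismatch in row counts (1 versus 2) is what np.hstack rejects in the traceback above.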
