"setting an array element with a sequence" exception when calling fit()

    data = {'xMessage': ['There was a farmer who had a dog',
                         'The mouse ran up the clock',
                         'Mary had a little lamb',
                         'The itsy bitsy spider',
                         'Brother John, Brother John! Morning bells are ringing!',
                         'My dame has lost her shoe',
                         'All the kings horses and all the Kings men',
                         'Im a little teapot',
                         'Jack and Jill went up the hill',
                         'How does your garden grow?'],
            'x01': [20, 21, 19, 18, 34, 22, 33, 22, 11, 32],
            'x02': [0, 10, 10, 12, 34, 43, 12, 0, 0, 54],
            'y': [0, 1, 0, 1, 0, 0, 1, 1, 0, 0]
            }

    self.df = pd.DataFrame(data)
    self.train, self.test = train_test_split(self.df, test_size=0.3, shuffle=True)

    vec = TfidfVectorizer()
    vec.fit(self.df.xMessage)
    transformTrain = vec.transform(self.train.xMessage)
    self.train['messageVect'] = list(transformTrain)
    transformTest = vec.transform(self.test.xMessage)
    self.test['messageVect'] = list(transformTest)

    self.X_train = self.train[['messageVect', 'x01', 'x02']]
    self.X_test = self.test[['messageVect', 'x01', 'x02']]
    self.y_train = self.train['y']
    self.y_test = self.test['y']

    model = GaussianNB()
    model.fit(self.X_train,self.y_train)
    predicted= model.predict(self.X_test, self.y_test)
    y_true, y_pred = self.y_test, model.predict(self.X_test)
    print(classification_report(y_true, y_pred))

1 Answer

User

#1 · Posted on 2024-09-30 02:36:25

So, I was able to solve this (or at least I hope I did). The working code is below. Let me know if it can be improved further!

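    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics import classification_report
    from sklearn.model_selection import train_test_split
    from sklearn.naive_bayes import GaussianNB

    data = {'xMessage': ['There was a farmer who had a dog',
                         'The mouse ran up the clock',
                         'Mary had a little lamb',
                         'The itsy bitsy spider',
                         'Brother John, Brother John! Morning bells are ringing!',
                         'My dame has lost her shoe',
                         'All the kings horses and all the Kings men',
                         'Im a little teapot',
                         'Jack and Jill went up the hill',
                         'How does your garden grow?'],
            'x01': [20, 21, 19, 18, 34, 22, 33, 22, 11, 32],
            'x02': [0, 10, 10, 12, 34, 43, 12, 0, 0, 54],
            'y': [1, 1, 0, 1, 0, 0, 1, 1, 1, 1]
            }

    df = pd.DataFrame(data)

    # Expand the TF-IDF matrix into one plain numeric column per term
    vec = TfidfVectorizer()
    df_text = pd.DataFrame(vec.fit_transform(df['xMessage']).toarray())

    # Put the numeric columns next to the TF-IDF columns, then split
    X = pd.concat([df[['x01', 'x02']], df_text], axis=1)
    X.columns = X.columns.astype(str)  # recent scikit-learn rejects mixed str/int column names
    self.X_train, self.X_test, self.y_train, self.y_test = train_test_split(
        X, df['y'], test_size=0.3, shuffle=True)  # pass y as a 1-D Series

    model = GaussianNB()
    model.fit(self.X_train, self.y_train)
    y_true, y_pred = self.y_test, model.predict(self.X_test)
    print(classification_report(y_true, y_pred))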

Note: This post was a great help.
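For anyone wondering why the original attempt fails: storing one sparse row per cell in a `messageVect` column most likely leaves that column with object dtype, so scikit-learn cannot convert the frame into a single 2-D float array. Below is a minimal sketch of an alternative that stacks the TF-IDF matrix and the numeric columns with `scipy.sparse.hstack` instead of going through a DataFrame. The helper name `build_features`, the fixed `random_state`, and fitting the vectorizer on the training split only are my own assumptions, not part of the answer above.

```python
import pandas as pd
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB


def build_features(frame, vec, fit=False):
    """Hypothetical helper: TF-IDF columns plus x01/x02 as one dense 2-D array."""
    if fit:
        text = vec.fit_transform(frame["xMessage"])
    else:
        text = vec.transform(frame["xMessage"])
    numeric = csr_matrix(frame[["x01", "x02"]].to_numpy(dtype=float))
    # GaussianNB does not accept sparse input, so densify after stacking
    return hstack([text, numeric]).toarray()


df = pd.DataFrame({
    "xMessage": ["There was a farmer who had a dog",
                 "The mouse ran up the clock",
                 "Mary had a little lamb",
                 "The itsy bitsy spider",
                 "Brother John, Brother John! Morning bells are ringing!",
                 "My dame has lost her shoe",
                 "All the kings horses and all the Kings men",
                 "Im a little teapot",
                 "Jack and Jill went up the hill",
                 "How does your garden grow?"],
    "x01": [20, 21, 19, 18, 34, 22, 33, 22, 11, 32],
    "x02": [0, 10, 10, 12, 34, 43, 12, 0, 0, 54],
    "y": [1, 1, 0, 1, 0, 0, 1, 1, 1, 1],
})

# random_state is fixed here only to make the sketch repeatable
train, test = train_test_split(df, test_size=0.3, shuffle=True, random_state=0)

vec = TfidfVectorizer()
X_train = build_features(train, vec, fit=True)  # vocabulary learned from train only
X_test = build_features(test, vec)              # same vocabulary reused for test

model = GaussianNB()
model.fit(X_train, train["y"])
y_pred = model.predict(X_test)
```

With only ten rows the predictions themselves mean little; the point is that both feature matrices end up as ordinary 2-D float arrays with the same number of columns, which is the shape `fit()` expects.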
