I am using NLTK in an Azure ML implementation of text analysis, and executing the code below throws:
AssertionError: 1 columns passed, passed data had 2 columns\r\nProcess returned with non-zero exit code 1
Here is the code:
# The script MUST include the following function,
# which is the entry point for this module:
# Param<dataframe1>: a pandas.DataFrame
# Param<dataframe2>: a pandas.DataFrame
def azureml_main(dataframe1 = None, dataframe2 = None):
    # import required packages
    import pandas as pd
    import nltk
    import numpy as np

    # tokenize the review text and store the word corpus
    word_dict = {}
    token_list = []
    nltk.download(info_or_id='punkt', download_dir='C:/users/client/nltk_data')
    nltk.download(info_or_id='maxent_treebank_pos_tagger', download_dir='C:/users/client/nltk_data')

    for text in dataframe1["tweet_text"]:
        tokens = nltk.word_tokenize(text.decode('utf8'))
        tagged = nltk.pos_tag(tokens)

    # convert feature vector to dataframe object
    dataframe_output = pd.DataFrame(tagged, columns=['Output'])
    return [dataframe_output]
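The error is thrown on this line:
    dataframe_output = pd.DataFrame(tagged, columns=['Output'])
I suspect the problem is the type of the tagged data being passed to the DataFrame. Can anyone tell me the correct way to add it to a DataFrame?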
Try this:
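The answer's original code is not preserved here, so what follows is only a minimal sketch of one likely fix, not the answerer's exact solution. nltk.pos_tag returns a list of (token, tag) tuples, so every row carries two values, and passing a single column name triggers the AssertionError. The loop in the question also overwrites tagged on every iteration, so only the last tweet would survive. The helper name tagged_to_dataframe, the column names Token and Tag, and the hand-written sample data (standing in for real nltk.pos_tag output) are all assumptions made for illustration.

```python
import pandas as pd


def tagged_to_dataframe(tagged_per_text):
    """Flatten per-text lists of (token, tag) pairs into a two-column DataFrame.

    Hypothetical helper: each element of tagged_per_text is expected to look
    like the output of nltk.pos_tag for one text.
    """
    rows = []
    for tagged in tagged_per_text:
        # Accumulate the pairs from every text instead of keeping only the last one
        rows.extend(tagged)
    # Two values per row, so two column names are required
    return pd.DataFrame(rows, columns=["Token", "Tag"])


# Hand-written stand-in for nltk.pos_tag results, one list per tweet
sample = [
    [("good", "JJ"), ("movie", "NN")],
    [("I", "PRP"), ("agree", "VBP")],
]

dataframe_output = tagged_to_dataframe(sample)
print(dataframe_output)
```

Inside azureml_main, the same idea would mean appending each nltk.pos_tag(tokens) result to a list within the loop, building the DataFrame once after the loop, and returning [dataframe_output] as before.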