I'm trying out the code from this site: https://datanice.wordpress.com/2015/09/09/sentiment-analysis-for-youtube-channels-with-nltk/
The code that gives me an error is:
import nltk
from nltk.probability import *
from nltk.corpus import stopwords
import pandas as pd
all = pd.read_csv("comments.csv")
stop_eng = stopwords.words('english')
customstopwords =[]
tokens = []
sentences = []
tokenizedSentences =[]
for txt in all.text:
    sentences.append(txt.lower())
    tokenized = [t.lower().encode('utf-8').strip(":,.!?") for t in txt.split()]
    tokens.extend(tokenized)
    tokenizedSentences.append(tokenized)
hashtags = [w for w in tokens if w.startswith('#')]
ghashtags = [w for w in tokens if w.startswith('+')]
mentions = [w for w in tokens if w.startswith('@')]
links = [w for w in tokens if w.startswith('http') or w.startswith('www')]
filtered_tokens = [w for w in tokens if not w in stop_eng and not w in customstopwords and w.isalpha() and not len(w)<3 and not w in hashtags and not w in ghashtags and not w in links and not w in mentions]
fd = FreqDist(filtered_tokens)
Running it raises an error (the traceback is not reproduced here).
I generate the CSV with the following code:
commentDataCsv = pd.DataFrame.from_dict(callFunction).to_csv("comments4.csv", encoding='utf-8')
I have replaced every pd.read_json("comments.csv") with read_csv.
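In Py3, the default string type is unicode (str). encode converts it to a bytestring (bytes). To apply strip to a bytestring, you have to supply the characters to strip as a bytestring too, for example strip(b":,.!?"). If you don't encode first, you can strip with ordinary unicode characters, as in txt.strip(":,.!?").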
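To make the difference concrete, here is a small sketch of the three cases in Python 3. The sample token "great:" is made up for illustration and is not taken from the question's data.

```python
token = "great:".encode("utf-8")   # bytes: b'great:'

# bytes.strip() with a str argument raises TypeError in Python 3
try:
    token.strip(":,.!?")
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Option 1: keep bytes and pass the strip characters as bytes
stripped_bytes = token.strip(b":,.!?")

# Option 2: skip encode() and stay with str throughout
stripped_text = "great:".strip(":,.!?")

print(mixed_ok, stripped_bytes, stripped_text)
```

Mixing bytes with a str argument is what fails; either of the other two forms works as long as both sides have the same type.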
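I recommend not encoding at all; otherwise the rest of the code will need b prefixes on its string literals (the arguments to startswith, the stop words, and so on).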
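Applied to the tokenizing loop from the question, dropping encode might look roughly like the sketch below. To keep it self-contained, the comments list and the stop_eng set are hypothetical stand-ins for all.text and stopwords.words('english'), the tokenize helper is a name introduced here, and collections.Counter is used in place of nltk's FreqDist.

```python
from collections import Counter

# Hypothetical stand-ins: in the question these come from
# pd.read_csv("comments.csv").text and stopwords.words('english').
comments = ["Great video, thanks!", "Great tips: see http://example.com #python"]
stop_eng = {"the", "and", "see"}

def tokenize(text):
    """Lowercase, split on whitespace, strip edge punctuation; no encode()."""
    return [t.lower().strip(":,.!?") for t in text.split()]

tokens = []
for txt in comments:
    tokens.extend(tokenize(txt))

# Every token is a str, so the filters compare str with str.
# isalpha() already rejects '#python' and 'http://example.com'
# because they contain non-letter characters.
filtered_tokens = [
    w for w in tokens
    if w not in stop_eng and w.isalpha() and len(w) >= 3
]

# Counter stands in for nltk's FreqDist in this sketch
fd = Counter(filtered_tokens)
print(fd.most_common(3))
```

Because the tokens stay as str, the startswith('#'), startswith('http') and stop-word checks from the original code would also keep working without any b prefixes.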