TypeError: a bytes-like object is required, not 'str' (pd.read_csv)

```python
import nltk
from nltk.probability import *
from nltk.corpus import stopwords
import pandas as pd

all = pd.read_csv("comments.csv")
stop_eng = stopwords.words('english')
customstopwords = []

tokens = []
sentences = []
tokenizedSentences = []
for txt in all.text:
    sentences.append(txt.lower())
    # In Python 3 this line raises the TypeError: encode() returns bytes,
    # but strip() is given a str argument.
    tokenized = [t.lower().encode('utf-8').strip(":,.!?") for t in txt.split()]
    tokens.extend(tokenized)
    tokenizedSentences.append(tokenized)

hashtags = [w for w in tokens if w.startswith('#')]
ghashtags = [w for w in tokens if w.startswith('+')]
mentions = [w for w in tokens if w.startswith('@')]
links = [w for w in tokens if w.startswith('http') or w.startswith('www')]
filtered_tokens = [w for w in tokens
                   if not w in stop_eng and not w in customstopwords
                   and w.isalpha() and not len(w) < 3
                   and not w in hashtags and not w in ghashtags
                   and not w in links and not w in mentions]
fd = FreqDist(filtered_tokens)
```

1 answer

User

#1 · posted 2024-04-19 10:27:03

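In Python 3 the default string type is unicode (str), and encode converts it to a bytestring (bytes). To call strip on a bytestring, you have to pass the characters to strip as bytes too: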

```
In [378]: u'one'.encode('utf-8')
Out[378]: b'one'

In [379]: 'one'.encode('utf-8').strip(':')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-379-98728e474af8> in <module>
----> 1 'one'.encode('utf-8').strip(':')

TypeError: a bytes-like object is required, not 'str'

In [381]: 'one:'.encode('utf-8').strip(b':')
Out[381]: b'one'
```
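Applied to the tokenizer line from the question, the bytes-based fix is just a b prefix on the strip argument. A minimal sketch (the sample text and the helper name tokenize_bytes are hypothetical) that also shows the cost of this route: every later comparison must use bytes as well, and a stopword list of str quietly stops matching instead of raising an error:

```python
# Sketch: keep the question's .encode('utf-8') call, but pass the strip
# characters as bytes (b":,.!?") so the argument type matches.
def tokenize_bytes(txt):
    """Split on whitespace, lowercase, encode to UTF-8, strip punctuation."""
    return [t.lower().encode('utf-8').strip(b":,.!?") for t in txt.split()]

tokens = tokenize_bytes("Hello, #nlp world! @bob")
print(tokens)  # [b'hello', b'#nlp', b'world', b'@bob']

# Later prefix checks need bytes too: b'#' instead of '#'.
hashtags = [w for w in tokens if w.startswith(b'#')]
print(hashtags)  # [b'#nlp']

# A str stopword list never matches bytes tokens (no error, just False).
print(b'the' in ['the', 'a'])  # False

# Mixing types in startswith() fails the same way strip() did.
try:
    tokens[0].startswith('#')
except TypeError as exc:
    print("TypeError:", exc)
```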

If you don't encode first, you can strip with ordinary (unicode) str characters.
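A minimal sketch of the str-only route: the token is never encoded, so the characters passed to strip stay plain str. The sample strings are illustrative; the last lines show that mixing types fails in the other direction too.

```python
# Without .encode() the token is str, so the strip characters are str as well.
plain = 'one:'.strip(':')
word = 'World!'.lower().strip(":,.!?")
print(plain, word)  # one world

# The mismatch fails in the other direction too: str.strip() rejects bytes.
try:
    'one:'.strip(b':')
except TypeError as exc:
    print("TypeError:", exc)
```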


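I'd recommend this approach; otherwise the rest of your code will need the b prefix as well (for example, w.startswith(b'#') instead of w.startswith('#')).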
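Putting that recommendation together, here is a sketch of the question's pipeline with the .encode() call dropped. To stay self-contained it assumes hypothetical inputs: a short list of texts in place of the all.text column from comments.csv, a tiny stopword set in place of stopwords.words('english'), and collections.Counter in place of nltk's FreqDist (which subclasses Counter). The filter conditions mirror the original.

```python
from collections import Counter

# Hypothetical stand-ins for pd.read_csv("comments.csv").text and
# nltk's English stopword list.
texts = [
    "Loving the new #python release! http://example.com",
    "@alice thanks, the docs were really helpful.",
]
stop_eng = {"the", "were", "a", "an", "and"}
custom_stopwords = set()

tokens = []
tokenized_sentences = []
for txt in texts:
    # No .encode(): tokens stay str, so strip(), startswith() and the
    # stopword membership tests all compare str with str.
    tokenized = [t.lower().strip(":,.!?") for t in txt.split()]
    tokens.extend(tokenized)
    tokenized_sentences.append(tokenized)

hashtags = {w for w in tokens if w.startswith('#')}
ghashtags = {w for w in tokens if w.startswith('+')}
mentions = {w for w in tokens if w.startswith('@')}
links = {w for w in tokens if w.startswith(('http', 'www'))}
excluded = hashtags | ghashtags | mentions | links

filtered_tokens = [
    w for w in tokens
    if w not in stop_eng
    and w not in custom_stopwords
    and w.isalpha()
    and len(w) >= 3
    and w not in excluded
]

# Counter stands in for nltk.FreqDist here (FreqDist is a Counter subclass).
fd = Counter(filtered_tokens)
print(filtered_tokens)
print(fd.most_common(3))
```

Collecting hashtags, mentions and links into sets keeps each membership test constant-time; with the original lists, every "not w in ..." check scans the whole list.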
