<pre><code>import re

import nltk
from nltk.corpus import stopwords
import contractions  # assumed: per the error message, this name refers to a module

def clean_text(text, remove_stopwords=True):
    '''Text preprocessing'''
    # Convert words to lower case
    text = text.lower()

    # Expand contractions
    if True:
        text = text.split()
        new_text = []
        for word in text:
            if word in contractions:
                new_text.append(contractions[word])
            else:
                new_text.append(word)
        text = " ".join(new_text)

    # Format words and remove unwanted characters
    text = re.sub(r'https?:\/\/[\r\n],"[\r\n]"', '', text, flags=re.MULTILINE)
    text = re.sub(r'\<a href', ' ', text)
    text = re.sub(r'&amp;', '', text)
    text = re.sub(r'[_"\-;%()|+&=*%.,!?:#$@\[\]/]', ' ', text)
    text = re.sub(r'<br />', ' ', text)
    text = re.sub(r'\'', ' ', text)

    # Remove stopwords
    if remove_stopwords:
        text = text.split()
        stops = set(stopwords.words("english"))
        text = [w for w in text if w not in stops]
        text = " ".join(text)

    # Tokenize the cleaned string into words
    # (each tokenizer expects a string, so tokenize only once)
    text = nltk.WordPunctTokenizer().tokenize(text)

    # Lemmatize each token (not each character of each token)
    lemm = nltk.stem.WordNetLemmatizer()
    text = [lemm.lemmatize(word) for word in text]
    return text
</code></pre>
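<p>When I run the code above (which only defines the function), it runs without any problem.
But when I apply the function with the line below, I get <code>TypeError: argument of type 'module' is not iterable</code>.</p>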
<pre><code>sentences_train = list(map(clean_text, sentences_train))
</code></pre>
<p>I have attached a screenshot of the error for reference.
<a href="https://i.stack.imgur.com/R3Qvu.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/R3Qvu.png" alt="Screenshot of the error traceback"/></a></p>
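<p>A minimal sketch that may help narrow this down, assuming (as the error message suggests) that <code>contractions</code> is an imported module rather than a dictionary. Python's <code>in</code> operator needs a container such as a <code>dict</code> or <code>set</code>, so <code>word in contractions</code> raises this <code>TypeError</code> when <code>contractions</code> is a module. The helper <code>expand</code>, the stand-in module, and the two-entry <code>contraction_map</code> below are hypothetical and only for illustration.</p>
<pre><code>import types

# Hypothetical stand-in for an imported `contractions` module:
# like any imported module, it is a ModuleType object, not a dict.
contractions_module = types.ModuleType("contractions")


def expand(text, mapping):
    """Replace each whitespace-separated word that appears in `mapping`."""
    new_text = []
    for word in text.split():
        if word in mapping:  # `in` needs a container (dict, set, ...)
            new_text.append(mapping[word])
        else:
            new_text.append(word)
    return " ".join(new_text)


# Passing the module reproduces the error from the question
# (the exact wording of the message may vary between Python versions).
try:
    expand("i can't go", contractions_module)
except TypeError as err:
    print(err)

# Passing a plain dict (hypothetical, tiny mapping) works as intended.
contraction_map = {"can't": "cannot", "won't": "will not"}
print(expand("i can't go", contraction_map))  # i cannot go
</code></pre>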
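<p>I have tried different ways to fix this, but they only made the error worse. It would be great if someone could help me and explain why this happens. Thank you very much.
Any suggestions are welcome.</p>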