Why does "if word not in (x, y)" not work at all in Python?

Posted 2024-10-02 14:24:42


For each row of my column, I want to keep a word only if it is in neither stopwords nor string.punctuation.

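This is my data after tokenizing and removing stopwords. I also want to remove punctuation at the same time as the stopwords. In row 2 there is a comma right after usf. I thought of if word not in (stopwords,string.punctuation), because I assumed it would be equivalent to not in stopwords and not in string.punctuation. I saw this here, but the result is that neither the stopwords nor the punctuation are removed. How can I fix this?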

data['text'].head(5)
Out[38]: 
0    ['ve, searching, right, words, thank, breather...
1    [free, entry, 2, wkly, comp, win, fa, cup, fin...
2    [nah, n't, think, goes, usf, ,, lives, around,...
3    [even, brother, like, speak, ., treat, like, a...
4                                 [date, sunday, !, !]
Name: text, dtype: object
import pandas as pd
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize
import string

data = pd.read_csv(r"D:/python projects/read_files/SMSSpamCollection.tsv",
                    sep='\t', header=None)

data.columns = ['label','text']

stopwords = set(stopwords.words('english'))

def process(df):
    data = word_tokenize(df.lower())
    data = [word for word in data if word not in (stopwords,string.punctuation)]
    return data

data['text'] = data['text'].apply(process)

3 Answers

Inside the function process, you need to convert both collections to pandas.core.series.Series and combine them with pd.concat.

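The function would be:

def process(df):
    data = word_tokenize(df.lower())
    # pd.concat needs Series objects, and `in` on a Series tests the index,
    # so both collections are wrapped in a Series and the result turned into a set
    excluded = set(pd.concat([pd.Series(list(stopwords)),
                              pd.Series(list(string.punctuation))]))
    data = [word for word in data if word not in excluded]
    return data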

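If you still want to do this in a single if condition, you can convert string.punctuation to a set and combine it with stopwords using the OR operator (|, which is set union). It looks like this: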

data = [word for word in data if word not in (stopwords|set(string.punctuation))]
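The union approach can be tried without downloading any NLTK corpus. The following is only a minimal sketch: the small stopwords set is a hypothetical stand-in for set(stopwords.words('english')), and str.split() is used as a simplified replacement for word_tokenize.

```python
import string

# Hypothetical stand-in for set(stopwords.words('english'))
stopwords = {"i", "the", "a", "do", "not"}

# set(string.punctuation) holds one entry per punctuation character,
# so the union is a single set of everything that should be dropped
excluded = stopwords | set(string.punctuation)

def process(text):
    # str.split() is a simplification of word_tokenize, for illustration only
    tokens = text.lower().split()
    return [word for word in tokens if word not in excluded]

print(process("Nah I do not think the goes usf , lives around here !"))
```

Building the set once, outside the function, avoids recomputing the union for every row when the function is passed to apply.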

You need to change

data = [word for word in data if word not in (stopwords,string.punctuation)]

to

data = [word for word in data if word not in stopwords and word not in string.punctuation]
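The reason the tuple form filters nothing is that membership in a tuple compares the word with each element as a whole: a string never equals a set, and it equals string.punctuation only if it is the entire punctuation string. A small sketch, again assuming a hypothetical two-word stopword set in place of the NLTK list:

```python
import string

# Hypothetical stand-in for set(stopwords.words('english'))
stopwords = {"the", "a"}

pair = (stopwords, string.punctuation)

# `x in pair` asks whether x equals one of the two tuple elements
print("the" in pair)               # False: "the" is not equal to the set itself
print("," in pair)                 # False: "," is not the whole punctuation string
print(string.punctuation in pair)  # True: only the full string matches

def is_kept(word):
    # Two separate membership tests, one against each collection
    return word not in stopwords and word not in string.punctuation

print([w for w in ["the", "usf", ",", "lives"] if is_kept(w)])
```

Note that word not in string.punctuation is a substring test on a str, so it also rejects the empty string and any multi-character token that happens to appear contiguously in string.punctuation; testing against set(string.punctuation) avoids that.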
