Possibly a brain fart here. I'm building a list from a generator, and I'm struggling to remove duplicates from that list with set()
import spacy
import textacy
nlp = spacy.load("en_core_web_lg")
text = ('''The Western compulsion to hedonism has made us lust for money just to show that we have it. Possessions do not make a man—man makes possessions. The Western compulsion to hedonism has made us lust for money just to show that we have it. Possessions do not make a man—man makes possessions.''')
doc = nlp(text)
keywords = (
    list(textacy.extract.ngrams(doc, 1, filter_stops=True, filter_punct=True, filter_nums=False))
    + list(textacy.extract.ngrams(doc, 2, filter_stops=True, filter_punct=True, filter_nums=False))
)
print(list(set(keywords)))
The result still contains duplicates:
[man, lust, makes possessions, man, Possessions, makes possessions, man makes, hedonism, man, money, compulsion, Western compulsion, man, possessions, man makes, compulsion, Possessions, Western compulsion, possessions, Western, makes, makes, lust, hedonism, Western, money]
That's because the items in the list aren't strings but spaCy Span objects, each pointing at a different position in the document, so to set() they really aren't duplicates.
To keep only the unique words, you can use a dict comprehension with each item's string representation as the unique key, and then take just the
.values()
to get the spaCy objects back out:
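A minimal sketch of that dict-comprehension trick. To keep it runnable without downloading the spaCy model, it uses a hypothetical `Span` stand-in class (distinct objects sharing the same `.text`, like the real spans over repeated words); with the actual `keywords` list above, the same one-liner applies: `list({kw.text: kw for kw in keywords}.values())`.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a spaCy Span: eq=False means instances
# compare and hash by object identity, so two spans over the same
# word are still distinct objects -- just like real Spans at
# different positions in the text.
@dataclass(frozen=True, eq=False)
class Span:
    text: str

keywords = [Span("man"), Span("lust"), Span("man"), Span("lust")]

# set() does NOT deduplicate: all four objects are distinct.
print(len(set(keywords)))  # 4

# Dict comprehension keyed on .text keeps one object per unique
# string; .values() hands the (span) objects back.
unique = list({kw.text: kw for kw in keywords}.values())
print([s.text for s in unique])  # ['man', 'lust']
```

Dict keys are unique and (since Python 3.7) insertion-ordered, so unlike set() this also preserves the order in which the keywords first appeared.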