Find a keyword in a text file, then grab the n words that follow it

User
Reply #1 · edited 2024-10-03 02:47:16

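There are two ways to solve this.
1. Use jieba
jieba.cut
It cuts your sentence into words.
Then just find "populations" and take the next three words.
2. Use split
raw = 'YOUR_TEXT_CONTENT'
raw_list = raw.split(' ')
start = raw_list.index('populations')
# start + 1 skips the keyword itself, so only the three following words are printed
print(raw_list[start + 1:start + 4])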
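Two things can trip up the split approach: `split(' ')` only breaks on single spaces, so newlines and double spaces leave odd tokens behind, and `list.index` raises `ValueError` when the keyword is not there. Here is a minimal sketch that works around both. The helper name `next_words` and the sample string are made up for illustration.

```python
def next_words(text, keyword, n):
    """Return the n words after the first occurrence of keyword, or [] if absent."""
    # split() with no argument breaks on any run of whitespace, including newlines
    words = text.split()
    try:
        start = words.index(keyword)
    except ValueError:
        # keyword does not appear as a standalone word
        return []
    return words[start + 1:start + 1 + n]


raw = "geographies with  populations of\n20,000 people or more"
print(next_words(raw, "populations", 3))  # ['of', '20,000', 'people']
print(next_words(raw, "missing", 3))      # []
```

If fewer than n words follow the keyword, the slice simply returns what is available.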

User
Reply #2 · edited 2024-10-03 02:47:16

Split the text into words, find the index of the keyword, and grab the words at the indices that follow:
text = 'The Supplemental Tables consist of 59 detailed tables tabulated on the 2016 1-year microdata for geographies with populations of 20,000 people or more. These Supplemental Estimates are available through American FactFinder and the Census Bureau’s application programming interface at the same geographic summary levels as those in the American Community Survey.'
keyword = 'populations'
words = text.split()
index = words.index(keyword)
wanted_words = words[index + 1:index + 4]
If you want to turn the three-word list wanted_words back into a single string, use
wanted_text = ' '.join(wanted_words)
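Since the question is about a text file, the same steps can be wrapped in a function that reads the file first. This is only a sketch: the function name `words_after_in_file`, the UTF-8 encoding, and the temporary sample file are assumptions for the demo, not part of the answer above.

```python
import tempfile
from pathlib import Path


def words_after_in_file(path, keyword, n, encoding="utf-8"):
    """Read a text file and return the n words after the first keyword hit as one string."""
    text = Path(path).read_text(encoding=encoding)
    words = text.split()
    if keyword not in words:
        return ""  # keyword not found: return an empty string instead of raising
    index = words.index(keyword)
    return " ".join(words[index + 1:index + 1 + n])


# Demo on a throwaway file so the example is self-contained
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.txt"
    sample_path.write_text(
        "geographies with populations of 20,000\npeople or more.",
        encoding="utf-8",
    )
    result = words_after_in_file(sample_path, "populations", 3)
    print(result)  # of 20,000 people
```

Reading the whole file at once is fine for small inputs. For very large files you would probably want to process it line by line, which needs extra care when the keyword sits at the end of a line.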

User
Reply #3 · edited 2024-10-03 02:47:16

You can use the nltk library.

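from nltk.tokenize import word_tokenize

def sample(string, keyword, n):
    # Collect the n tokens that follow every occurrence of keyword
    output = []
    # The text is lowercased, so lowercase the keyword too or it may never match
    word_list = word_tokenize(string.lower())
    keyword = keyword.lower()
    indices = [i for i, x in enumerate(word_list) if x == keyword]
    for index in indices:
        output.append(word_list[index+1:index+n+1])
    return output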


>>> print(sample(string, 'populations', 3))
[['of', '20,000', 'people']]
>>> print(sample(string, 'tables', 3))
[['consist', 'of', '59'], ['tabulated', 'on', 'the']]
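If you would rather not install nltk, a regular expression can stand in for the tokenizer. The sketch below is a swapped-in technique, not what `word_tokenize` does internally: the pattern and the name `sample_regex` are my own assumptions, and punctuation such as a trailing period is dropped rather than returned as its own token.

```python
import re

# A word, optionally continued by , . ' ’ or - when another word character follows,
# so "20,000" and "1-year" stay in one piece while a trailing "." is dropped.
TOKEN_PATTERN = re.compile(r"\w+(?:[,.'’-]\w+)*")


def sample_regex(string, keyword, n):
    """Return the n words after each case-insensitive occurrence of keyword."""
    words = TOKEN_PATTERN.findall(string.lower())
    keyword = keyword.lower()
    return [words[i + 1:i + 1 + n] for i, w in enumerate(words) if w == keyword]


text = (
    "The Supplemental Tables consist of 59 detailed tables tabulated on the "
    "2016 1-year microdata for geographies with populations of 20,000 people or more."
)
print(sample_regex(text, "Tables", 3))
# [['consist', 'of', '59'], ['tabulated', 'on', 'the']]
print(sample_regex(text, "populations", 3))
# [['of', '20,000', 'people']]
```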
