I am using Python 3.6.8.
I have a text file that looks like this:
###
books 22 feb 2017 21 april 2018
books 22 feb 2017 21
22 feb 2017 21 april
feb 2017 21 april 2018
$$$
###
risk true stories people never thought they d dare share
risk true stories people never
true stories people never thought
stories people never thought they
people never thought they d
never thought they d dare
thought they d dare share
$$$
###
everyone hanging out without me mindy kaling non fiction
everyone hanging out without me
hanging out without me mindy
out without me mindy kaling
without me mindy kaling non
me mindy kaling non fiction
$$$
The file is generated with:
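from nltk.util import ngrams  # assumption: ngrams is NLTK's helper; the original does not show the import

booksWithNGrams = []  # output lines; books is assumed to be a list of title strings
for line_no, line in enumerate(books):
    tokens = line.split(" ")
    output = list(ngrams(tokens, 5))  # all 5-word windows of the line
    booksWithNGrams.append("###")  # Adding start of block
    booksWithNGrams.append(books[line_no])  # Adding original line
    for x in output:  # Adding n-grams
        booksWithNGrams.append(' '.join(x))
    booksWithNGrams.append("$$$")  # Adding end of block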
As you can see, the n-grams for each sentence start with ### and end with $$$, so the start and end of every block are clearly defined.
Given a sentence, I want to delete the block that contains it. For example, if I input 22 feb 2017 21 april, I want to delete:
###
books 22 feb 2017 21 april 2018
books 22 feb 2017 21
22 feb 2017 21 april
feb 2017 21 april 2018
$$$
How can I do this?
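As you said, each block is delimited by # and $. We can treat the text as a sequence of spans between these markers and use finditer (from Python's re module) to locate where each block starts and ends.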
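A minimal sketch of the finditer idea, assuming the whole file has been read into a single string and that ### and $$$ always sit on lines of their own. The names BLOCK_RE and remove_block are hypothetical, chosen for illustration only; this is one possible way to do it, not necessarily what the answer had in mind.

```python
import re

# Hypothetical pattern: one block runs from a "###" line to the next "$$$" line.
# MULTILINE lets ^ match at every line start; DOTALL lets . span newlines.
BLOCK_RE = re.compile(r"^###\n(.*?)^\$\$\$(?:\n|\Z)", re.DOTALL | re.MULTILINE)


def remove_block(text, sentence):
    """Return text without any ###...$$$ block that has `sentence` as a full line."""
    pieces = []
    last_end = 0
    for match in BLOCK_RE.finditer(text):
        block_lines = match.group(1).splitlines()
        if sentence in block_lines:
            # Keep everything before this block, then skip past the block itself.
            pieces.append(text[last_end:match.start()])
            last_end = match.end()
    pieces.append(text[last_end:])
    return "".join(pieces)


# Sample data copied from the question (two of the three blocks).
sample = (
    "###\n"
    "books 22 feb 2017 21 april 2018\n"
    "books 22 feb 2017 21\n"
    "22 feb 2017 21 april\n"
    "feb 2017 21 april 2018\n"
    "$$$\n"
    "###\n"
    "everyone hanging out without me mindy kaling non fiction\n"
    "everyone hanging out without me\n"
    "hanging out without me mindy\n"
    "out without me mindy kaling\n"
    "without me mindy kaling non\n"
    "me mindy kaling non fiction\n"
    "$$$\n"
)

cleaned = remove_block(sample, "22 feb 2017 21 april")
print(cleaned)
```

The comparison is against whole lines rather than substrings, so a short input such as feb 2017 does not accidentally delete a block. If substring matching is what you want, test `sentence in match.group(1)` instead.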