Deleting a block of sentences whose start and end are clearly defined

    ###
    books 22 feb 2017 21 april 2018
    books 22 feb 2017 21
    22 feb 2017 21 april
    feb 2017 21 april 2018
    $$$
    ###
    risk true stories people never thought they d dare share
    risk true stories people never
    true stories people never thought
    stories people never thought they
    people never thought they d
    never thought they d dare
    thought they d dare share
    $$$
    ###
    everyone hanging out without me mindy kaling non fiction
    everyone hanging out without me
    hanging out without me mindy
    out without me mindy kaling
    without me mindy kaling non
    me mindy kaling non fiction
    $$$

    for line_no, line in enumerate(books):
        tokens = line.split(" ")
        output = list(ngrams(tokens, 5))
        booksWithNGrams.append("###")           # Adding start of block
        booksWithNGrams.append(books[line_no])  # Adding original line
        for x in output:                        # Adding n-grams
            booksWithNGrams.append(' '.join(x))
        booksWithNGrams.append("$$$")           # Adding end of block

1 answer

User

#1 · Posted on 2024-10-03 02:48:19

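As you said, each block is bounded by ### and $$$. We can treat the text as a sequence of character offsets between these markers and use finditer to locate the block boundaries.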

    import re

    # Assumption: `text` is the question's list joined with newlines;
    # the offsets shown in the comments below are consistent with that.
    text = "\n".join(booksWithNGrams)

    # start offset of every '###' marker
    starts = [s.start() for s in re.finditer('###', text)]
    # [0, 105, 349]

    # end offset of every '$$$' marker ('$' is a regex metacharacter, so escape it)
    ends = [e.end() for e in re.finditer(re.escape('$$$'), text)]
    # [104, 348, 558]

    # pair each block start with its end
    nBlocks = list(zip(starts, ends))
    # [(0, 104), (105, 348), (349, 558)]

    # find where the input text first occurs
    find = '22 feb 2017 21 april'
    where = text.index(find)
    # 10

    # remove the block that contains the match
    cleanText = text
    for start, end in nBlocks:
        if start <= where < end:
            # keep the text before the block and the text after it;
            # end + 1 also skips the newline that follows '$$$'
            cleanText = text[:start] + text[end + 1:]
            break

    print(cleanText)

    ###
    risk true stories people never thought they d dare share
    risk true stories people never
    true stories people never thought
    stories people never thought they
    people never thought they d
    never thought they d dare
    thought they d dare share
    $$$
    ###
    everyone hanging out without me mindy kaling non fiction
    everyone hanging out without me
    hanging out without me mindy
    out without me mindy kaling
    without me mindy kaling non
    me mindy kaling non fiction
    $$$
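
The same idea can be wrapped in a reusable function. The following is only a sketch under stated assumptions: `remove_block` and `sample` are hypothetical names that do not appear in the thread, the text is assumed to be newline-joined, and blocks are assumed not to nest. Instead of collecting offsets first, it matches each whole block with a single non-greedy regular expression and drops the first block that contains the search string.

```python
import re

START, END = "###", "$$$"  # block markers used in the question


def remove_block(text, needle):
    """Return text without the first block that contains needle.

    Hypothetical helper: a block runs from a START marker to the
    next END marker, plus the newline after it if there is one.
    """
    pattern = re.compile(
        re.escape(START) + r".*?" + re.escape(END) + r"\n?",
        re.DOTALL,  # let '.' match newlines so a block can span lines
    )
    for match in pattern.finditer(text):
        if needle in match.group():
            return text[:match.start()] + text[match.end():]
    return text  # no block contains needle: leave text unchanged


# Shortened sample in the question's format (assumed newline-joined)
sample = "\n".join([
    "###",
    "books 22 feb 2017 21 april 2018",
    "books 22 feb 2017 21",
    "22 feb 2017 21 april",
    "feb 2017 21 april 2018",
    "$$$",
    "###",
    "risk true stories people never thought they d dare share",
    "risk true stories people never",
    "$$$",
])

cleaned = remove_block(sample, "22 feb 2017 21 april")
print(cleaned)
```

Unlike the offset-based version, this returns the text unchanged when the search string is not found, rather than raising `ValueError` from `text.index`.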
