Getting a substring from a string based on substring matching and string index

Published 2024-10-05 14:27:01


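I have various strings that are guaranteed to contain myWord (in some cases more than once, and only the first occurrence should be handled), but the strings differ in length. Some contain hundreds of words, others only a few.

I'm looking for a way to extract a snippet from the text. The rule is: the snippet should contain myWord plus X words before and after it.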

For example:

rawText = "This is an example lorem ipsum sentence for a Stackoverflow question."

myWord = "sentence"

Suppose I want the word "sentence" plus 3 words on either side of it, like this:

"example lorem ipsum sentence for a Stackoverflow"

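I have a working solution, but it cuts the snippet by character count rather than by the number of words before and after myWord. So my question is: is there a more suitable solution, perhaps a built-in Python function, that achieves this?

The solution I'm currently using is: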

myWord = "mollis"
rawText = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse sit amet arcu vulputate, sodales arcu non, finibus odio. Aliquam sed tincidunt nisi, eu scelerisque lectus. Curabitur in nibh enim. Duis arcu ante, mollis sed iaculis non, hendrerit ut odio. Curabitur gravida condimentum posuere. Sed et arcu finibus felis auctor mollis et id risus. Nam urna tellus, ultricies a aliquam at, euismod et erat. Cras pretium venenatis ornare. Donec pulvinar dui eu dui facilisis commodo. Vivamus eget ultrices turpis, vel egestas lacus."

# Character index of the first occurrence of myWord (case-insensitive)
wordIndexNumber = rawText.lower().find(myWord.lower())

# Take up to 80 characters on each side, clamped to the bounds of the text
textIndex1 = max(0, wordIndexNumber - 80)
textIndex2 = min(len(rawText), wordIndexNumber + 80)

snippet = rawText[textIndex1:textIndex2]

print(snippet)
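If you want to count words instead of characters but still get back an untouched piece of the original text (original spacing, line breaks and punctuation), one option is to let re.finditer locate each whitespace-separated word together with its character offsets, then slice rawText between those offsets. This is a minimal sketch, not a standard-library function: snippet_by_words is a hypothetical name, and it assumes a "word" is any run of non-whitespace characters, compared case-insensitively with surrounding ASCII punctuation ignored.

```python
import re
import string


def snippet_by_words(text, word, n):
    """Hypothetical helper: return the slice of `text` that spans the first
    occurrence of `word` plus up to `n` words on each side, or None if absent.

    Slicing the original string by character offsets keeps its spacing and
    punctuation intact, unlike split() followed by join().
    """
    # One match object per whitespace-separated word, with .start()/.end() offsets
    matches = list(re.finditer(r"\S+", text))
    target = word.lower()
    for i, m in enumerate(matches):
        # Assumption: "ante," or "Mollis" should still count as a match
        if m.group().strip(string.punctuation).lower() == target:
            first = matches[max(0, i - n)]
            last = matches[min(len(matches) - 1, i + n)]
            return text[first.start():last.end()]
    return None  # word not found


rawText = "Curabitur in nibh enim. Duis arcu ante, mollis sed iaculis non, hendrerit ut odio."
print(snippet_by_words(rawText, "mollis", 3))
```

With the full rawText from the question and myWord = "mollis", this returns "Duis arcu ante, mollis sed iaculis non," because only the first occurrence is used.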

2 Answers

Here is one approach using split() and list slicing.

Demo:

rawText = "This is an example lorem ipsum sentence for a Stackoverflow question."
myWord = "sentence"
rawTextList = rawText.split()
idx = rawTextList.index(myWord)  # position of the first occurrence
# max(0, ...) stops a negative start from wrapping around to the end of the list
frontVal = " ".join(rawTextList[max(0, idx - 3):idx])
backVal = " ".join(rawTextList[idx:idx + 4])

print("{} {}".format(frontVal, backVal))

Output:

example lorem ipsum sentence for a Stackoverflow
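Clamping the start index matters when myWord is among the first few words. With a naive words[idx - 3:idx], the start can become negative, and Python reads a negative slice index as counting from the end of the list, so the "before" part silently comes back empty. A small illustration (the word list is made up for the example):

```python
words = "alpha beta gamma delta epsilon".split()
idx = words.index("beta")  # idx == 1

naive = words[idx - 3:idx]             # words[-2:1] is words[3:1], an empty slice
clamped = words[max(0, idx - 3):idx]   # words[0:1]

print(naive)    # []
print(clamped)  # ['alpha']
```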

Here is a solution using list slicing:

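def get_context_around(text, word, accuracy):
    words = text.split()
    first_hit = words.index(word)  # index of the first occurrence only

    # Clamp the start so a word near the beginning does not produce a negative index
    return ' '.join(words[max(0, first_hit - accuracy):first_hit + accuracy + 1])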


raw_text = "This is an example lorem ipsum sentence for a Stackoverflow question."
my_word = "sentence"
print(get_context_around(raw_text, my_word, accuracy=3)) # example lorem ipsum sentence for a Stackoverflow
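Note that words.index(word) needs an exact token match and raises ValueError when there is none. For example, get_context_around(raw_text, "question", accuracy=2) fails because the token in the text is "question." with the trailing period. A minimal sketch of a more forgiving variant, assuming you want a case-insensitive match that ignores surrounding ASCII punctuation and a None result when the word is missing (get_context_around_safe is a hypothetical name, not part of either answer):

```python
import string


def get_context_around_safe(text, word, accuracy):
    """Hypothetical variant of get_context_around: returns None instead of
    raising ValueError, clamps the start index, and ignores case and
    surrounding punctuation when matching."""
    words = text.split()
    target = word.lower()
    for i, token in enumerate(words):
        # "sentence." and "Sentence" both count as a match for "sentence"
        if token.strip(string.punctuation).lower() == target:
            start = max(0, i - accuracy)  # avoid a negative start wrapping around
            return " ".join(words[start:i + accuracy + 1])
    return None  # word does not occur in text


raw_text = "This is an example lorem ipsum sentence for a Stackoverflow question."
print(get_context_around_safe(raw_text, "question", 2))  # a Stackoverflow question.
```

The loop stops at the first match, so only the first occurrence is handled, as the question requires.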
