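I have a number of different strings, each of which definitely contains myWord (in some cases several times; only the first occurrence should be handled), but the strings differ in length. Some of them contain hundreds of substrings, others only a few.
I would like to find a way to extract a snippet from the text. The rule is: the snippet should contain myWord plus the X words before and after it.
Something like this: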
rawText= "This is an example lorem ipsum sentence for a Stackoverflow question."
myWord = "sentence"
Suppose I want the word "sentence" plus/minus 3 words, which should give:
"example lorem ipsum sentence for a Stackoverflow"
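I managed to write a working solution, but it cuts the snippet by a number of characters rather than by the number of words before/after myWord. So my question is: is there a more suitable solution, perhaps a built-in Python function, to achieve this?
The solution I currently use: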
myWord = "mollis"
rawText = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Suspendisse sit amet arcu vulputate, sodales arcu non, finibus odio. Aliquam sed tincidunt nisi, eu scelerisque lectus. Curabitur in nibh enim. Duis arcu ante, mollis sed iaculis non, hendrerit ut odio. Curabitur gravida condimentum posuere. Sed et arcu finibus felis auctor mollis et id risus. Nam urna tellus, ultricies a aliquam at, euismod et erat. Cras pretium venenatis ornare. Donec pulvinar dui eu dui facilisis commodo. Vivamus eget ultrices turpis, vel egestas lacus."
# The index where the word is located
wordIndexNumber = rawText.lower().find(myWord.lower())
# The total length of the text (in chars)
textLength = len(rawText)
textPart2 = len(rawText)-wordIndexNumber
if wordIndexNumber < 80:
    textIndex1 = 0
else:
    textIndex1 = wordIndexNumber - 80
if textPart2 < 80:
    textIndex2 = textLength
else:
    textIndex2 = wordIndexNumber + 80
snippet = rawText[textIndex1:textIndex2]
print(snippet)
Here is one approach using string slicing:
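The answer's own code is not shown here, so the following is only a minimal sketch of how a string-based approach might look. It uses `str.partition` (a swapped-in technique, not necessarily what the answerer wrote) to cut the text at the first occurrence of the word, then keeps the last X words before it and the first X words after it. The function name `snippet_by_partition` and the parameter `x` are hypothetical. Note that this matches `my_word` as a plain, case-sensitive substring, so it would also match inside a longer word such as "sentences".

```python
def snippet_by_partition(raw_text, my_word, x=3):
    """Return my_word with up to x words on each side (first occurrence only)."""
    # partition() splits at the FIRST occurrence; `found` is "" if there is no match.
    before, found, after = raw_text.partition(my_word)
    if not found:
        return ""
    # Guard x == 0, because some_list[-0:] would return the whole list.
    left = before.split()[-x:] if x > 0 else []
    right = after.split()[:x]
    return " ".join(left + [found] + right)


rawText = "This is an example lorem ipsum sentence for a Stackoverflow question."
print(snippet_by_partition(rawText, "sentence", 3))
# example lorem ipsum sentence for a Stackoverflow
```

If the word sits near the start or end of the text, the snippet simply contains fewer than X words on that side.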
Here is a solution using list slicing:
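The code for this answer is likewise missing, so what follows is a sketch under stated assumptions rather than the answerer's solution. It splits the text on whitespace, finds the index of the first word equal to `my_word`, and slices the word list around that index. Two choices here are assumptions that the question does not specify: the comparison ignores case, and surrounding punctuation is stripped before comparing (so "mollis," would still match "mollis"). The function name `snippet_by_words` is hypothetical.

```python
import string


def snippet_by_words(raw_text, my_word, x=3):
    """Return the first whole-word match of my_word with up to x words on each side."""
    words = raw_text.split()
    target = my_word.lower()
    for i, word in enumerate(words):
        # Assumption: ignore case and leading/trailing punctuation when comparing.
        if word.strip(string.punctuation).lower() == target:
            start = max(0, i - x)  # clamp so a negative index does not wrap around
            return " ".join(words[start:i + x + 1])
    return ""  # my_word was not found as a whole word


rawText = "This is an example lorem ipsum sentence for a Stackoverflow question."
print(snippet_by_words(rawText, "sentence", 3))
# example lorem ipsum sentence for a Stackoverflow
```

Unlike a substring search, this only matches whole words, and because the loop returns at the first hit, later occurrences of the word are ignored, as the question requires.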