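<p>You can use <a href="https://docs.python.org/2/library/re.html#re.finditer" rel="nofollow"><code>re.finditer</code></a> to find every match in the string. Each match object has a <a href="https://docs.python.org/2/library/re.html#re.MatchObject.start" rel="nofollow"><code>start()</code></a> method, which gives the position of the match in the string. You also don't need to check whether the keyword occurs in the string first, because <code>finditer</code> simply returns an empty iterator when there is no match:</p>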
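<pre><code>import re

keywords = ("banana", "apple", "orange")  # ... plus any other keywords
before = 50   # characters of context to keep before each match
after = 100   # characters to keep, counted from the start of each match
TEXT = "a big text string, i.e., a page of a book"

for k in keywords:
    # re.escape treats the keyword as literal text, not as a regex
    for match in re.finditer(re.escape(k), TEXT):
        position = match.start()
        # max() is needed because a negative start index would count from the end
        cut = TEXT[max(position - before, 0):position + after]
        # re.DOTALL lets "." match newlines; re.MULTILINE only affects ^ and $
        trimmed_match = re.match(r"\w*?\W+(.*)\W+\w*", cut, re.MULTILINE | re.DOTALL)
        if trimmed_match is None:  # excerpt too short to trim
            continue
        finalcut = trimmed_match.group(1)
</code></pre>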
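<p>The regex trims everything up to and including the first run of non-word characters, and everything from the last run of non-word characters onward, so the excerpt neither starts nor ends with a partial word. (I added <code>re.MULTILINE</code> in case the text contains newlines. Note that <code>re.MULTILINE</code> only changes how <code>^</code> and <code>$</code> behave; it is <code>re.DOTALL</code> that lets <code>.</code> match a newline, so use that flag if an excerpt can span several lines.)</p>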
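<p>As a rough sketch, the same steps could be wrapped in a reusable function that collects the excerpts instead of overwriting a single variable. The function name <code>keyword_excerpts</code>, the returned tuple layout, the sample sentence, and the fallback to the untrimmed cut when the trimming regex does not match are all assumptions made for illustration; it also uses <code>re.DOTALL</code> rather than <code>re.MULTILINE</code> so that excerpts containing newlines are still trimmed:</p>

```python
import re


def keyword_excerpts(text, keywords, before=50, after=100):
    """Return a (keyword, position, excerpt) tuple for every keyword occurrence.

    Hypothetical helper: the name, signature and return shape are illustrative.
    """
    results = []
    for keyword in keywords:
        # finditer yields nothing when the keyword is absent, so no
        # "keyword in text" check is needed beforehand.
        for match in re.finditer(re.escape(keyword), text):
            position = match.start()
            # Clamp the start index so it never becomes negative.
            cut = text[max(position - before, 0):position + after]
            # Drop the partial word (and separators) at each end of the cut.
            trimmed = re.match(r"\w*?\W+(.*)\W+\w*", cut, re.DOTALL)
            # Assumption: keep the raw cut when there is nothing to trim.
            excerpt = trimmed.group(1) if trimmed else cut
            results.append((keyword, position, excerpt))
    return results


sample = "I ate a banana and then an apple pie before lunch today"
excerpts = keyword_excerpts(sample, ["banana", "apple", "kiwi"], before=6, after=12)
for keyword, position, excerpt in excerpts:
    print(keyword, position, repr(excerpt))
# banana 8 'a banana and'
# apple 27 'an apple pie'
```

<p>The keyword <code>"kiwi"</code> produces no entries at all, which is the empty-iterator behaviour of <code>finditer</code> described above.</p>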