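<pre><code>import re
import string

# TEXT, keywords, before and after are assumed to be defined by the caller.
alphabet = string.ascii_letters  # string.lowercase/uppercase exist only in Python 2
pattern = "|".join(re.escape(k) for k in keywords)
regex1 = re.compile("(%s)" % pattern)   # keyword anywhere in the text
regex2 = re.compile("^(%s)" % pattern)  # keyword at the start of the cut
regex3 = re.compile("(%s)$" % pattern)  # keyword at the end of the cut

for match in regex1.finditer(TEXT):
    # Take `before` characters ahead of the match and `after` characters behind it.
    cut = TEXT[max(match.start() - before, 0):match.end() + after]
    finalcut = cut
    # Strip the partial word at each edge unless that edge is the keyword itself.
    if not regex2.search(cut):
        finalcut = finalcut.lstrip(alphabet)
    if not regex3.search(cut):
        finalcut = finalcut.rstrip(alphabet)
    print(cut, finalcut)
</code></pre>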
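<p>This can be improved further: there are only two cases in which a keyword can sit at the very start or end of the text, and in those cases it should not be stripped.</p>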
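<p>One way to read the improvement described above, as a rough sketch rather than the original author's code: a keyword sits at an edge of the cut only when the cut begins or ends exactly at the match, so the two anchored regex searches can be replaced by index comparisons. The function name <code>trimmed_snippets</code> and its parameters are hypothetical, chosen only for illustration.</p>

```python
import re
import string


def trimmed_snippets(text, keywords, before, after):
    """Return (cut, finalcut) pairs for every keyword match in text.

    Hypothetical helper: the name and signature are assumptions,
    not part of the original answer.
    """
    alphabet = string.ascii_letters
    regex = re.compile("|".join(re.escape(k) for k in keywords))
    results = []
    for match in regex.finditer(text):
        start = max(match.start() - before, 0)
        end = min(match.end() + after, len(text))
        cut = text[start:end]
        finalcut = cut
        # Case 1: the cut starts at the match, so the left edge is the keyword.
        if start != match.start():
            finalcut = finalcut.lstrip(alphabet)
        # Case 2: the cut ends at the match, so the right edge is the keyword.
        if end != match.end():
            finalcut = finalcut.rstrip(alphabet)
        results.append((cut, finalcut))
    return results


for cut, finalcut in trimmed_snippets("the quick brown fox jumps", ["brown"], 3, 3):
    print(repr(cut), repr(finalcut))
```

<p>Comparing indices avoids compiling two extra patterns, though it may behave differently from the regex version when a second keyword occurrence happens to fall on the edge of a cut.</p>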