<blockquote>
<p><strong>Question</strong>: a keyword like "Sog" it also finds the Sogan ... I only want tokens between whitespaces. ... how can i add that regex to this code. </p>
</blockquote>
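<p>Build a <code>regex</code> from your <code>keywords</code>, using the <code>|</code> (or) separator to combine multiple <code>keywords</code>. Because the pattern requires a literal <code>+</code> directly after the keyword, <code>Sog</code> does not match <code>Sogan+Noun</code>.</p>
<p>For example:</p>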
<pre><code>import re

def index(lines, keyword):
    # keyword may hold several alternatives separated by "|", e.g. 'Sog|elma'.
    # The literal "+" right after the keyword keeps 'Sog' from matching 'Sogan+Noun'.
    rc = re.compile(r".*?(({})\+.+?\s)".format(keyword))
    for i, line in enumerate(lines):
        match = rc.match(line)
        if match:
            print("lines[{}] match:{}\n{}".format(i, match.groups(), line))

if __name__ == "__main__":
    lines = [
        'Sogan+Noun ,+Punc domates+Noun ,+Punc patates+Noun ,+Punc elmaro+Noun ve+Conj ... (omitted for brevity)',
        'Sog+Noun ,+Punc domates+Noun ,+Punc patates+Noun ,+Punc elma+Noun ve+Conj ... (omitted for brevity)',
    ]
    index(lines, 'elma')
    index(lines, 'Sog|elma')
</code></pre>
<blockquote>
<p><strong>Output</strong>:</p>
<pre><code>lines[1] match:('elma+Noun ', 'elma')
Sog+Noun ,+Punc domates+Noun ,+Punc patates+Noun ,+Punc elma+Noun ve+Conj ... (omitted for brevity)
lines[1] match:('Sog+Noun ', 'Sog')
Sog+Noun ,+Punc domates+Noun ,+Punc patates+Noun ,+Punc elma+Noun ve+Conj ... (omitted for brevity)
</code></pre>
</blockquote>
<p>Tested with Python 3.5.</p>
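<p>The pattern above only checks what follows the keyword, and <code>rc.match</code> reports at most one hit per line. If you also want the keyword to start right after whitespace (or at the start of the line) and want every hit, something like the following might work. It is only a sketch: <code>find_tokens</code> and the shortened sample lines (including the made-up token <code>xSog+Noun</code>) are my own illustration, not part of your code, and it assumes tokens are separated by whitespace and have the form <code>word+Tag</code>.</p>

```python
import re

def find_tokens(lines, keywords):
    """Return (line_index, token, keyword) for each token whose word part is exactly a keyword."""
    # re.escape keeps characters such as '+' or '.' inside a keyword literal.
    alternation = "|".join(re.escape(k) for k in keywords)
    # (?:^|\s) : the keyword must start the line or follow whitespace
    # \+\S+    : a literal '+' and then the tag, up to the next whitespace
    rc = re.compile(r"(?:^|\s)(({})\+\S+)".format(alternation))
    hits = []
    for i, line in enumerate(lines):
        for m in rc.finditer(line):
            hits.append((i, m.group(1), m.group(2)))
    return hits

# Hypothetical, shortened sample data; 'xSog+Noun' is invented to show the left boundary.
lines = [
    "Sogan+Noun ,+Punc elmaro+Noun ve+Conj",
    "Sog+Noun ,+Punc elma+Noun ve+Conj xSog+Noun",
]
hits = find_tokens(lines, ["Sog", "elma"])
print(hits)
```

<p>Here <code>hits</code> should contain only <code>Sog+Noun</code> and <code>elma+Noun</code> from the second line. Passing the keywords as a list and escaping each one avoids surprises if a keyword ever contains a regex metacharacter.</p>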