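<p>The function below replaces any number of matches (found with spaCy), preserves the whitespace of the original text, and handles edge cases properly (such as a match at the very start of the text):</p>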
<pre class="lang-py prettyprint-override"><code>import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_lg")
matcher = Matcher(nlp.vocab)
# spaCy v2 signature; on spaCy v3+ use: matcher.add("dog", [[{"LOWER": "dog"}]])
matcher.add("dog", None, [{"LOWER": "dog"}])

def replace_word(orig_text, replacement):
    tok = nlp(orig_text)
    text = ''
    buffer_start = 0  # Index of the first token not yet copied into `text`
    for _, match_start, _ in matcher(tok):
        # If we've skipped over some tokens, add those in (with trailing whitespace if available)
        if match_start > buffer_start:
            text += tok[buffer_start: match_start].text + tok[match_start - 1].whitespace_
        # Replace the matched token, keeping its trailing whitespace if available
        text += replacement + tok[match_start].whitespace_
        buffer_start = match_start + 1
    # Append whatever follows the last match
    text += tok[buffer_start:].text
    return text

>>> replace_word("Hi this is my dog.", "Simba")
'Hi this is my Simba.'
>>> replace_word("Hi this dog is my dog.", "Simba")
'Hi this Simba is my Simba.'
</code></pre>
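<p>To see why carrying each token's trailing whitespace along is enough to keep the original spacing, here is a dependency-free sketch of the same idea. It is not the spaCy implementation: the regex tokenizer and the names <code>tokenize</code> and <code>replace_word_sketch</code> are hypothetical stand-ins chosen for illustration, and the regex only roughly approximates how spaCy splits text.</p>

```python
import re

def tokenize(text):
    """Split text into (token, trailing_whitespace) pairs.

    Hypothetical stand-in for spaCy tokens: a token is a run of word
    characters or a single punctuation character, followed by whatever
    whitespace comes after it.
    """
    return [(m.group(1), m.group(2))
            for m in re.finditer(r"(\w+|[^\w\s])(\s*)", text)]

def replace_word_sketch(orig_text, target, replacement):
    """Replace whole-token, case-insensitive matches of `target`."""
    # Leading whitespace belongs to no token, so keep it separately.
    leading = re.match(r"\s*", orig_text).group(0)
    pieces = [leading]
    for token, whitespace in tokenize(orig_text):
        word = replacement if token.lower() == target.lower() else token
        # Re-attach the token's own trailing whitespace unchanged.
        pieces.append(word + whitespace)
    return "".join(pieces)

print(replace_word_sketch("Hi this dog is my dog.", "dog", "Simba"))
```

<p>Because every token is written back together with its own trailing whitespace, double spaces and the absence of a space before punctuation survive the replacement, and a word such as "hotdog" is left alone since it is a single token.</p>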