spaCy: replacing a token

Posted 2024-09-27 07:20:12


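I am trying to replace a word without breaking the whitespace structure of the sentence. Suppose I have the sentence text = "Hi this is my dog." and I want to replace dog with Simba. Following the answer at https://stackoverflow.com/a/57206316/2530674, I did: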

import spacy
nlp = spacy.load("en_core_web_lg")
from spacy.tokens import Doc

doc1 = nlp("Hi this is my dog.")
new_words = [token.text if token.text!="dog" else "Simba" for token in doc1]
Doc(doc1.vocab, words=new_words)
# Hi this is my Simba . 

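Note the extra space at the end, before the period (it should be Hi this is my Simba.). Is there a way to get rid of this behavior? I would also be happy with a general Python string-processing answer.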


3 Answers

It looks like you just need an ordinary string replacement. I would do:

string = "Hi this is my dog."
string = string.replace("dog","Simba")
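
One caveat with a plain str.replace is that it also rewrites substrings inside longer words (for example, "dogma" becomes "Simbama"). A minimal sketch of a whole-word variant using the standard re module follows; the helper name replace_whole_word is hypothetical and not part of the original answer, and it assumes that a regex word boundary (\b) is an acceptable definition of "word" for your text.

```python
import re

def replace_whole_word(text, target, replacement):
    # Hypothetical helper: replace `target` only where it stands alone as a word.
    # re.escape guards against regex metacharacters in `target`.
    pattern = r"\b" + re.escape(target) + r"\b"
    # A function replacement avoids backslash interpretation in `replacement`.
    return re.sub(pattern, lambda match: replacement, text)

print(replace_whole_word("Hi this is my dog.", "dog", "Simba"))
print(replace_whole_word("My dog likes dogma.", "dog", "Simba"))
```

Because only the matched characters are rewritten, all surrounding whitespace and punctuation are left exactly as they were in the original string.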

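The function below replaces any number of matches (found with spaCy), keeps the same whitespace as the original text, and handles edge cases properly (such as a match at the very start of the text):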

import spacy
from spacy.matcher import Matcher

nlp = spacy.load("en_core_web_lg")

matcher = Matcher(nlp.vocab)
# spaCy v3 signature: a list of patterns. (spaCy v2 used matcher.add("dog", None, pattern).)
matcher.add("dog", [[{"LOWER": "dog"}]])

def replace_word(orig_text, replacement):
    tok = nlp(orig_text)
    text = ''
    buffer_start = 0
    for _, match_start, _ in matcher(tok):
        if match_start > buffer_start:  # If we've skipped over some tokens, let's add those in (with trailing whitespace if available)
            text += tok[buffer_start: match_start].text + tok[match_start - 1].whitespace_
        text += replacement + tok[match_start].whitespace_  # Replace token, with trailing whitespace if available
        buffer_start = match_start + 1
    text += tok[buffer_start:].text
    return text

>>> replace_word("Hi this is my dog.", "Simba")
Hi this is my Simba.

>>> replace_word("Hi this dog is my dog.", "Simba")
Hi this Simba is my Simba.

text = 'Hello, this is my dog'
print(text.replace('dog', 'simba'))
