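Detecting quoted text in a sentence

2 answers

User

Answer 1 · edited 2024-06-26 12:40:11

Another approach is to use shlex, a technique entirely different from regex. From the Python documentation: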

The shlex class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell. This will often be useful for writing minilanguages, (for example, in run control files for Python applications) or for parsing quoted strings.

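shlex.split takes quotes into account when splitting a string into words, and passing the optional argument posix=False keeps the quote characters in the result. From its output you can build a string like the one you describe: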

import shlex

lines = [
    'Why did the author use three sentences in a row that start with the words, "it spun"?',
    'Why did the queen most likely say  “I would have tea instead.”',
    'Why did the fdsfdsf repeat the phrase "he waited" so many times?',
    'Why were "the lights of his town growing smaller below them"?',
    'What is a fdsfdsf for the word "adjust"?',
    'Reread this: "If anybody had asked trial of answered at once, \'My nose.\'" What is the correct definition of the word "trial" as it is used here?',
    'Reread these sentences: "This was his courtship, and it lasted all through the summer." What does the word "courtship" mean?',
]
for line in lines:
    print(
        " ".join(
            # A token that both starts and ends with a double quote is a quoted span.
            '"<quote>"' if word[0] == '"' and word[-1] == '"' else word
            for word in shlex.split(line, posix=False)
        )
    )

Output:

Why did the author use three sentences in a row that start with the words, "<quote>" ?
Why did the queen most likely say “I would have tea instead.”
Why did the fdsfdsf repeat the phrase "<quote>" so many times?
Why were "<quote>" ?
What is a fdsfdsf for the word "<quote>" ?
Reread this: "<quote>" What is the correct definition of the word "<quote>" as it is used here?
Reread these sentences: "<quote>" What does the word "<quote>" mean?
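
The detached "?" in the output above comes from how non-POSIX mode tokenizes: a closing quote ends the token. As a small illustration (the sample sentence is made up for this sketch), compare the two modes of shlex.split:

```python
import shlex

sentence = 'What is a synonym for the word "adjust"?'

# Default (POSIX) mode: the quote characters are consumed, and the trailing "?"
# stays attached to the quoted word because a closing quote does not end the token.
posix_tokens = shlex.split(sentence)

# Non-POSIX mode: the quote characters stay in the token and the closing quote
# ends it, so the "?" becomes a separate token.
non_posix_tokens = shlex.split(sentence, posix=False)

print(posix_tokens[-1])       # adjust?
print(non_posix_tokens[-2:])  # ['"adjust"', '?']
```

This is why the check for a leading and trailing double quote only works with posix=False.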

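Note 1: shlex does not treat curly quotes as quotation marks (see line 2 of the output), so if your text contains curly quotes, convert them to straight quotes with .replace() before passing each line in.
Note 2: This replaces every quoted occurrence. If you only need to replace the first occurrence and keep the rest, you can do the following (I am fairly sure it could be written better, but take it as a proof of concept):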

for line in lines:
    new_line = []
    quote_count = 0
    for word in shlex.split(line, posix=False):
        if word[0] == '"' and word[-1] == '"':
            if quote_count < 1:
                quote_count += 1
                new_line.append('"<quote>"')
            else:
                new_line.append(word)
        else:
            new_line.append(word)
    print(' '.join(new_line))

Output:

Why did the author use three sentences in a row that start with the words, "<quote>" ?
Why did the queen most likely say “I would have tea instead.”
Why did the fdsfdsf repeat the phrase "<quote>" so many times?
Why were "<quote>" ?
What is a fdsfdsf for the word "<quote>" ?
Reread this: "<quote>" What is the correct definition of the word "trial" as it is used here?
Reread these sentences: "<quote>" What does the word "courtship" mean?
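
Note 1 can be sketched as follows. This is a minimal sketch, not part of the original answer: the helper name mask_quotes and the translation table are hypothetical, and it uses str.translate rather than chained .replace() calls to convert both curly double quotes in one pass. Curly single quotes are deliberately left alone.

```python
import shlex

# Hypothetical table: map curly double quotes to the straight quote shlex recognizes.
CURLY_TO_STRAIGHT = str.maketrans({"\u201c": '"', "\u201d": '"'})


def mask_quotes(line):
    """Replace every double-quoted span in `line` with "<quote>"."""
    normalized = line.translate(CURLY_TO_STRAIGHT)
    return " ".join(
        '"<quote>"' if word.startswith('"') and word.endswith('"') else word
        for word in shlex.split(normalized, posix=False)
    )


print(mask_quotes('Why did the queen most likely say  “I would have tea instead.”'))
# Why did the queen most likely say "<quote>"
```

Note that shlex.split raises ValueError when a line contains an unbalanced quote, so real input may need that case handled.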

User

Answer 2 · edited 2024-06-26 12:40:11

For these examples, use the following, which replaces only quotations of 20 or more characters:

import re
txt = """Why did the author use three sentences in a row that start with the words, "it spun"?
Why did the queen most likely say  “I would have tea instead.”
Why did the fdsfdsf repeat the phrase "he waited" so many times?
Why were "the lights of his town growing smaller below them"?
What is a fdsfdsf for the word "adjust"?
Reread these sentences: "This was his courtship, and it lasted all through the summer." What does the word "courtship" mean?"""
txt = re.sub(r'''"([^"]*)"''', lambda m: '<quote>' if len(m.group(1))>19 else m.group(), txt)
txt = re.sub(r'“[^“”]{20,}”', '<quote>', txt)
print(txt)

See the Python proof. Use a separate substitution for each kind of quotation mark; that makes each one easier to control.

Result:

Why did the author use three sentences in a row that start with the words, "it spun"?
Why did the queen most likely say  <quote>
Why did the fdsfdsf repeat the phrase "he waited" so many times?
Why were <quote>?
What is a fdsfdsf for the word "adjust"?
Reread these sentences: <quote> What does the word "courtship" mean?
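
If the threshold needs to vary, the two substitutions above could be wrapped in a helper. This is only a sketch under assumptions: the function name replace_long_quotes and the min_len parameter are hypothetical, and the curly-quote pattern is written with a capture group plus a length check rather than the {20,} quantifier used in the answer.

```python
import re

# One pattern per quote style, following the answer's advice to keep them separate.
STRAIGHT = re.compile(r'"([^"]*)"')
CURLY = re.compile(r'“([^“”]*)”')


def replace_long_quotes(text, min_len=20):
    """Replace quoted spans whose content has at least `min_len` characters."""

    def mask(match):
        # Keep short quotations (such as single words) untouched.
        return '<quote>' if len(match.group(1)) >= min_len else match.group()

    text = STRAIGHT.sub(mask, text)
    return CURLY.sub(mask, text)


print(replace_long_quotes('Why were "the lights of his town growing smaller below them"?'))
# Why were <quote>?
print(replace_long_quotes('What is a fdsfdsf for the word "adjust"?'))
# What is a fdsfdsf for the word "adjust"?
```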
