Chunking a sentence at the word "but" using RegEx

Posted 2024-09-28 01:32:34


I'm trying to use RegEx to chunk a sentence at the word "but" (or any other coordinating conjunction). It isn't working...

sentence = nltk.pos_tag(word_tokenize("There are no large collections present but there is spinal canal stenosis."))
result = nltk.RegexpParser(grammar).parse(sentence)
DigDug = nltk.RegexpParser(r'CHUNK: {.*<CC>.*}')
for subtree in DigDug.parse(sentence).subtrees(): 
    if subtree.label() == 'CHUNK': print(subtree.node())

I need to split "There are no large collections present but there is spinal canal stenosis." into two sentences, one on each side of "but".

I'd also like to use the same code to split sentences at "and" and other coordinating conjunctions (CC). But my code doesn't work. Please help.


2 answers

I think you can do this:

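import re

# `sentence` must be the raw string here, not the POS-tagged list from the question
sentence = "There are no large collections present but there is spinal canal stenosis."
result = re.split(r"\s+(?:but|and)\s+", sentence)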

where:

`\s`        Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.)
`+`         Between one and unlimited times, as many times as possible, giving back as needed (greedy)
`(?:`       Match the regular expression below, do not capture
            Match either the regular expression below (attempting the next alternative only if this one fails)
  `but`     Match the characters "but" literally
  `|`       Or match regular expression number 2 below (the entire group fails if this one fails to match)
  `and`     Match the characters "and" literally
`)`
`\s`        Match a single character that is a "whitespace character" (spaces, tabs, line breaks, etc.)
`+`         Between one and unlimited times, as many times as possible, giving back as needed (greedy)

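You can add more conjunctions, separated by the pipe character |. Make sure the words do not contain characters that have a special meaning in regex. If in doubt, escape them first with re.escape(word).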
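As a rough sketch of that advice, the pattern can be built from a list of words, escaping each one before joining them with |. The helper name `build_splitter` and the word list are made up for illustration; only `re.escape`, `re.compile` and `split` come from the standard library.

```python
import re

def build_splitter(words):
    """Compile a pattern that splits on any of `words` surrounded by whitespace.

    Hypothetical helper: each word is passed through re.escape so that
    characters such as '.' or '+' are matched literally.
    """
    escaped = [re.escape(word) for word in words]
    return re.compile(r"\s+(?:" + "|".join(escaped) + r")\s+")

# Illustrative list of conjunctions; extend it as needed.
splitter = build_splitter(["but", "and", "or"])

sentence = "There are no large collections present but there is spinal canal stenosis."
parts = splitter.split(sentence)
print(parts)
# ['There are no large collections present', 'there is spinal canal stenosis.']
```

Because the pattern requires whitespace on both sides, a conjunction at the very start or end of the string, or one followed directly by punctuation, is not treated as a split point.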

If you want to avoid hard-coding conjunctions such as "but" and "and", try chinking along with chunking:


import nltk
Digdug = nltk.RegexpParser(r""" 
CHUNK_AND_CHINK:
{<.*>+}          # Chunk everything
}<CC>+{      # Chink sequences of CC
""")
sentence = nltk.pos_tag(nltk.word_tokenize("There are no large collections present but there is spinal canal stenosis."))

result = Digdug.parse(sentence)

for subtree in result.subtrees(filter=lambda t: t.label() == 'CHUNK_AND_CHINK'):
    print(subtree)

Chinking basically excludes what we don't need from a chunk, in this case "but". For details, see: http://www.nltk.org/book/ch07.html
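To see what the chink rule does without running the tagger, here is a small pure-Python sketch of the same idea: keep runs of tokens, and cut wherever a CC tag appears. This is not NLTK's implementation. The function name `split_at_cc` is made up, and the (word, tag) pairs are written by hand as a stand-in for `nltk.pos_tag` output, so the tags are an assumption.

```python
from itertools import groupby

# Hand-written (word, tag) pairs standing in for nltk.pos_tag output;
# the tags are illustrative and may differ from what the tagger returns.
tagged = [
    ("There", "EX"), ("are", "VBP"), ("no", "DT"), ("large", "JJ"),
    ("collections", "NNS"), ("present", "JJ"), ("but", "CC"),
    ("there", "EX"), ("is", "VBZ"), ("spinal", "JJ"), ("canal", "NN"),
    ("stenosis", "NN"), (".", "."),
]

def split_at_cc(tagged_tokens):
    """Join each run of non-CC tokens into a string; runs of CC tokens are dropped."""
    parts = []
    for is_cc, group in groupby(tagged_tokens, key=lambda pair: pair[1] == "CC"):
        if not is_cc:
            parts.append(" ".join(word for word, _ in group))
    return parts

clauses = split_at_cc(tagged)
print(clauses)
# ['There are no large collections present', 'there is spinal canal stenosis .']
```

The same joining step, `" ".join(word for word, tag in subtree.leaves())`, can be applied to each `CHUNK_AND_CHINK` subtree printed by the NLTK code to get plain strings instead of trees.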
