Python: Regex to extract sentences from a line based on criteria



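I'm fairly new to Python and to programming in general. I'm trying to come up with a regex that extracts the sentences from a line of a text file and appends them to a list. The code:

<pre><code>import re

txt_list = []

with open('sample.txt', 'r') as txt:
    patt = r'.*}[.!?]\s?\n?|.*}.+[.!?]\s?\n?'
    read_txt = txt.readlines()
    for line in read_txt:
        if line == "\n":
            # keep blank lines as markers
            txt_list.append("\n")
        else:
            found = re.findall(patt, line)
            for f in found:
                txt_list.append(f)

for line in txt_list:
    if line == "\n":
        print("newline")
    else:
        print(line)
</code></pre>

Printed output from the last 5 lines of the code above: ^{pr2}$

Contents of 'sample.txt': ^{3}$

I have been playing with the regex for a few hours and can't seem to crack it. As it stands, the regex does not match at the end of <code>for lunch?</code>. As a result, the two sentences <code>What {will|shall|should} we {eat|have} for lunch? Peas by the {thousand|hundred|1000} said Dr. Munchauson; {that|is} what he said.</code> are not separated, and separating them is what I want.

Some important details about the regex:

<ul>
<li>Every sentence ends with a period, an exclamation mark, or a question mark.</li>
<li>Every sentence contains at least one pair of curly braces {} with some words inside. There will also never be a misleading "." after the last brace of a sentence, so <code>Dr.</code> always comes before the last pair of curly braces in each sentence. That is why I tried to build my regex around '}'. This way I can avoid the exception approach, that is, creating exceptions for tokens such as <code>Dr.</code>, <code>Jr.</code>, <code>approx.</code> and so on. For every file I run this code on, I personally make sure that there is no "misleading period" after the last '}' of any sentence.</li>
</ul>

My desired output is:

<pre><code>{Hello there|Hello|Howdy} Dr. Munchauson you {gentleman|fine fellow}!
What {will|shall|should} we {eat|have} for lunch?
Peas by the {thousand|hundred|1000} said Dr. Munchauson; {that|is} what he said.
newline
I am the {very last|last} sentence for this {instance|example}.
</code></pre>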
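A possible direction, offered only as a sketch and not as the accepted answer: both alternatives in the pattern start with a greedy `.*`, so each one runs to the last `}` on the line and the whole line comes back as a single match. Switching to lazy quantifiers helps, but the brace rule alone does not seem to be enough. `} said Dr.` and `} for lunch?` both look like a closing brace, then brace-free text, then a terminator. The sketch below therefore assumes a small abbreviation list for periods that fall between brace groups. The names `ABBREVIATIONS`, `SENTENCE` and `split_sentences` are hypothetical and do not appear in the question.

```python
import re

# The pattern from the question: both alternatives begin with a greedy `.*`.
ORIGINAL_PATTERN = r'.*}[.!?]\s?\n?|.*}.+[.!?]\s?\n?'

# Hypothetical abbreviation list (an assumption, not from the question).
# It is only needed for periods that sit between two brace groups.
ABBREVIATIONS = ("Dr", "Jr", "approx")

# One negative lookbehind per abbreviation, so "Dr." is not a sentence end.
_NOT_ABBREVIATION = "".join(r"(?<!\b%s)" % re.escape(a) for a in ABBREVIATIONS)

SENTENCE = re.compile(
    r"\s*"          # skip whitespace between sentences
    r"(.*?\}"       # lazily reach a closing brace
    r"[^{}]*?"      # brace-free text after that brace
    + _NOT_ABBREVIATION +
    r"[.!?])"       # terminator that does not follow an abbreviation
)


def split_sentences(line):
    """Return the sentences found in one line, without surrounding whitespace."""
    return SENTENCE.findall(line)
```

Calling `split_sentences(line)` in place of `re.findall(patt, line)` in the loop from the question should then append one list entry per sentence. Periods that come before the first brace group of a sentence are already skipped by the brace rule, so the abbreviation list only has to cover tokens that can appear after a brace group.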
