<p>Setup: two sample texts that represent the cases of interest:</p>
<pre><code>text = "He lives in Nidarvoll and tonight i must reach a train to Oslo at 6 oclock. The system, called BusTUC is built upon the classical system CHAT-80 (Warren and Pereira, 1982). CHAT-80 was a state of the art natural language system that was impressive on its own merits."
t2 = "He lives in Nidarvoll and tonight i must reach a train to Oslo at 6 oclock. The system, called BusTUC is built upon the classical system CHAT-80 (Warren and Pereira, 1982) fgbhdr was a state of the art natural. CHAT-80 was a state of the art natural language system that was impressive on its own merits."
</code></pre>
<p>First, match the case where the citation is at the end of the sentence:</p>
<pre><code>p1 = r"\. (.*\([A-Za-z]+ .* [0-9]+\)\.+?)"
</code></pre>
<p>Then match the case where the citation is not at the end of the sentence:</p>
<pre><code>p2 = r"\. (.*\([A-Za-z]+ .* [0-9]+\)[^\.]+\.+?)"
</code></pre>
<p>Combine the two cases with the "|" regex operator:</p>
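<pre><code>import re

p_main = re.compile(
    # Case 1: the citation closes the sentence.
    r"\. (.*\([A-Za-z]+ .* [0-9]+\)\.+?)"
    # Case 2: more text follows the citation before the period.
    r"|\. (.*\([A-Za-z]+ .* [0-9]+\)[^\.]+\.+?)")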
</code></pre>
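<p>As a possible simplification (an assumption on my part, not something tested beyond the short samples here), the two alternatives differ only in whether text follows the closing parenthesis, so a single group with <code>[^.]*</code> (zero or more non-period characters) may cover both cases. The name <code>p_single</code> and the sample strings are made up for illustration:</p>

```python
import re

# Assumed alternative to the two-branch pattern: one capture group,
# where [^.]* allows zero or more characters after the citation.
p_single = re.compile(r"\. (.*\([A-Za-z]+ .* [0-9]+\)[^.]*\.)")

# Hypothetical sample texts: citation at the end vs. in the middle of a sentence.
end_case = "Intro. It builds on CHAT-80 (Warren and Pereira, 1982). More text."
mid_case = "Intro. CHAT-80 (Warren and Pereira, 1982) was impressive. More text."

print(p_single.findall(end_case))
print(p_single.findall(mid_case))
```

<p>With only one group, <code>re.findall</code> returns plain strings rather than tuples.</p>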
<p>Running it:</p>
<pre><code>>>> print(re.findall(p_main, text))
[('The system, called BusTUC is built upon the classical system CHAT-80 (Warren and Pereira, 1982).', '')]
>>> print(re.findall(p_main, t2))
[('', 'The system, called BusTUC is built upon the classical system CHAT-80 (Warren and Pereira, 1982) fgbhdr was a state of the art natural.')]
</code></pre>
<p>In both cases you get the sentence that contains the citation.</p>
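<p>Because the pattern has two capture groups, <code>re.findall</code> returns 2-tuples in which one slot is empty. If you only want the sentences, a small helper along these lines should work; this is a minimal sketch, and <code>citation_sentences</code> and the sample strings are hypothetical names of mine:</p>

```python
import re

# The same two alternatives as in the combined pattern.
P_MAIN = re.compile(
    r"\. (.*\([A-Za-z]+ .* [0-9]+\)\.+?)"
    r"|\. (.*\([A-Za-z]+ .* [0-9]+\)[^\.]+\.+?)"
)

def citation_sentences(s):
    """Return matched sentences as plain strings (hypothetical helper)."""
    # Each findall result is a 2-tuple; keep whichever group is non-empty.
    return [first or second for first, second in P_MAIN.findall(s)]

# Made-up samples: citation at the end, then citation mid-sentence.
sample_end = "Intro here. It builds on CHAT-80 (Warren and Pereira, 1982). More text follows."
sample_mid = "A. B (Smith and Jones, 2001) was nice. C."

print(citation_sentences(sample_end))
print(citation_sentences(sample_mid))
```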
<p>A good resource is the Python regular expression <a href="https://docs.python.org/3/library/re.html" rel="nofollow noreferrer">documentation</a> and the accompanying regex <a href="https://docs.python.org/3/howto/regex.html#regex-howto" rel="nofollow noreferrer">howto</a> page.</p>
<p>Cheers</p>