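<p>In my <code>Python</code> code, I have a string and want to check whether it contains a specific pattern (a name followed by a number). To do this, I use <code>re.match</code> and then call <code>groups()</code> on the result to get the parts I need, like this:</p>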
<pre><code>authors_and_year = re.match(r'((.*)\. (\d{4})\.)', line)
texts, authors, year = authors_and_year.groups()
</code></pre>
<p>If I have a line like this:</p>
<blockquote>
<p>Regina Barzilay and Lillian Lee. 2004. Catching the drift: Probabilistic content models, with applications to generation and summarization. In Proceedings of NAACL-HLT.</p>
</blockquote>
<p>it returns this (<strong>as expected</strong>):</p>
<pre><code>('Regina Barzilay and Lillian Lee. 2004.', 'Regina Barzilay and Lillian Lee', '2004')
</code></pre>
<p>But in some cases, I have strings like this:</p>
<blockquote>
<p>J. Cohen. 1968a. Weighted kappa: Nominal scale agreement with provision for scaled disagreement or partial credit. volume 70, pages 213–220</p>
</blockquote>
<p>or this:</p>
<blockquote>
<p>Ralph Weischedel, Jinxi Xu, and Ana Licuanan. 1968b. A hybrid approach to answering biographical questions. In Mark Maybury, editor, New Directions In Question Answering, chapter 5. AAAI Press</p>
</blockquote>
<p>Here the year is followed by a letter, <strong>so the regex above fails</strong>. To handle this case, I tried a new regex, as follows:</p>
<pre><code>authors_and_year = re.match('((.*)\. (\d{4})\.|(.*)\. (\d{4})(a-z){1}\.)', line)
texts, authors, year = authors_and_year.groups()
</code></pre>
<p>But it gives me this error:</p>
<blockquote>
<p>ValueError: too many values to unpack (expected 3)</p>
</blockquote>
<p>When I inspect the groups of <code>authors_and_year</code> for the first line, I get this:</p>
<pre><code>('Regina Barzilay and Lillian Lee. 2004.', 'Regina Barzilay and Lillian Lee', '2004', None, None, None)
</code></pre>
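<p>For reference, here is a minimal self-contained reproduction, assuming <code>line</code> holds the first citation quoted above. The names <code>pattern</code>, <code>groups</code>, and <code>error_message</code> are only for illustration and are not part of my real code.</p>

```python
import re

# Assumption: the first citation from the question is used as the input line.
line = ("Regina Barzilay and Lillian Lee. 2004. Catching the drift: "
        "Probabilistic content models, with applications to generation "
        "and summarization. In Proceedings of NAACL-HLT.")

# Same pattern as in my second attempt, written as a raw string.
pattern = r'((.*)\. (\d{4})\.|(.*)\. (\d{4})(a-z){1}\.)'
match = re.match(pattern, line)

# groups() returns one entry per pair of capturing parentheses in the pattern.
groups = match.groups()
print(len(groups))
print(groups)

# Unpacking into three names fails because there are more than three groups.
try:
    texts, authors, year = groups
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

<p>Counting the opening parentheses in the pattern gives the same number as <code>len(groups)</code>, so every parenthesized part seems to produce an entry, whether or not it took part in the match.</p>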
<p>I don't know where the last three <code>None</code> values come from. Can anyone tell me what I am doing wrong?</p>