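<p>You misunderstand what a character class does; your pattern matches <em>any</em> string that contains a <code>T</code>, an <code>o</code>, a <code>:</code> or a whitespace character.</p>
<p>That is because <code>[To:\s]</code> is a <em>character class</em>: any single character from the set will match. This is why the <code>From:</code> line matches; the space between the <code>:</code> and the <code>d</code> is enough here.</p>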
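<p>A minimal sketch of that behaviour, using a made-up sample string rather than the full message: each hit is one character from the set, never the literal text <code>To:</code>.</p>

```python
import re

# Hypothetical sample, modelled on the start of the From: header line.
line = "From: d"

# A character class matches exactly one character drawn from the set,
# so every hit below is a single character.
hits = re.findall(r"[To:\s]", line)
print(hits)  # ['o', ':', ' ']
```

<p>Note that the <code>o</code> inside <code>From</code> is matched too; the class knows nothing about the order of the characters listed in it.</p>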
<p>If you need to validate the whole header name, anchor the match to the start of a line with <code>^</code>, and drop the character class:</p>
<pre><code>r'^To:\s+[\w\.-]+@[\w\.-]+'
</code></pre>
<p>Now the <code>To:</code> part only matches at the start of a line, provided you use the <code>re.MULTILINE</code> flag:</p>
<pre><code>>>> import re
>>> text = '''\
... Date: Wed, 6 Dec 2000 02:03:00 -0800 (PST)
... From: donald.herrick@enron.com
... To: brianherrick@email.msn.com, herriceu2@tdprs.state.tx.us, 
...  robertherrick@bankunited.com, kristi.demaiolo@enron.com, 
...  suresh.raghavan@enron.com, harry.arora@enron.com
... Subject: FW: If Santa Answered his mail...
... Mime-Version: 1.0
... Content-Type: text/plain; charset=us-ascii
... Content-Transfer-Encoding: 7bit
... X-From: Donald W Herrick
... X-To: brianherrick@email.msn.com, HERRICEU2@tdprs.state.tx.us, RobertHerrick@bankunited.com, Kristi Demaiolo, Suresh Raghavan, Harry Arora
... X-cc:
... X-bcc:
... '''
>>> re.findall(r'^To:\s+[\w\.-]+@[\w\.-]+', text)
[]
>>> re.findall(r'^To:\s+[\w\.-]+@[\w\.-]+', text, flags=re.M)
['To: brianherrick@email.msn.com']
</code></pre>
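<p>This matches only the first email address, and only if it doesn't include anything like a full name (e.g. <code>Brian Herrick &lt;brianherrick@email.msn.com&gt;</code>).</p>
<p>You'd have to match the whole <em>header</em> instead:</p>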
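<p>To illustrate the display-name limitation, here is a small sketch with two made-up single-line headers (the variable names are only for illustration):</p>

```python
import re

pattern = r'^To:\s+[\w\.-]+@[\w\.-]+'

# Hypothetical headers: one with a display name, one with a bare address.
named = 'To: Brian Herrick <brianherrick@email.msn.com>'
bare = 'To: brianherrick@email.msn.com'

named_matches = re.findall(pattern, named, flags=re.M)
bare_matches = re.findall(pattern, bare, flags=re.M)

print(named_matches)  # []
print(bare_matches)   # ['To: brianherrick@email.msn.com']
```

<p>The first header fails because <code>[\w\.-]+</code> consumes <code>Brian</code> and then finds a space where the pattern requires <code>@</code>.</p>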
<pre><code>re.findall(r'^To:\s+((?:.*(?:\n[ \t]+)?)*)', text, flags=re.M)
</code></pre>
<p>This matches the <code>To:</code> header followed by any number of header continuation lines (lines that start with whitespace):</p>
<pre><code>>>> re.findall(r'^To:\s+((?:.*(?:\n[ \t]+)?)*)', text, flags=re.M)
['brianherrick@email.msn.com, herriceu2@tdprs.state.tx.us, \n robertherrick@bankunited.com, kristi.demaiolo@enron.com, \n suresh.raghavan@enron.com, harry.arora@enron.com']
</code></pre>
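<p>You'd still have to split the individual email addresses out of that string.</p>
<p>Personally, I'd look into the <a href="https://docs.python.org/2/library/email.html" rel="nofollow"><code>email</code> package</a> instead; it makes extracting headers much easier:</p>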
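<p>One naive way to do that splitting, shown only as a sketch: split the captured value on commas and strip the surrounding whitespace. This assumes bare addresses only; a display name that itself contains a comma would break it. The <code>header_value</code> string is a shortened, hand-written stand-in for the captured group.</p>

```python
# Shortened stand-in for the text captured by the regular expression,
# including a folded continuation line.
header_value = (
    "brianherrick@email.msn.com, herriceu2@tdprs.state.tx.us, \n"
    " robertherrick@bankunited.com, kristi.demaiolo@enron.com"
)

# Naive split: assumes no commas inside display names or quoted strings.
addresses = [part.strip() for part in header_value.split(",") if part.strip()]
print(addresses)
```

<p>Because <code>str.strip()</code> also removes newlines, the folded continuation lines need no special handling here.</p>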
<pre><code>import email
import email.utils  # load the utils submodule explicitly
message = email.message_from_string(text)
to_headers = message.get_all('to')
addresses = email.utils.getaddresses(to_headers)
</code></pre>
<p>Demo:</p>
<pre><code>>>> import email
>>> m = email.message_from_string(text)
>>> m.get_all('to')
['brianherrick@email.msn.com, herriceu2@tdprs.state.tx.us, \n robertherrick@bankunited.com, kristi.demaiolo@enron.com, \n suresh.raghavan@enron.com, harry.arora@enron.com']
>>> email.utils.getaddresses(m.get_all('to'))
[('', 'brianherrick@email.msn.com'), ('', 'herriceu2@tdprs.state.tx.us'), ('', 'robertherrick@bankunited.com'), ('', 'kristi.demaiolo@enron.com'), ('', 'suresh.raghavan@enron.com'), ('', 'harry.arora@enron.com')]
</code></pre>
<p>Now you have all the email addresses.</p>
<p>You can also apply the <a href="https://docs.python.org/2/library/email.util.html#email.utils.getaddresses" rel="nofollow"><code>email.utils.getaddresses()</code> function</a> to the result of the regular expression:</p>
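<p>The same function also copes with display names, which the simple regular expression could not. A short sketch, using a made-up header value that is not part of the message above:</p>

```python
from email.utils import getaddresses

# Hypothetical header value mixing a display name and a bare address.
headers = ['Brian Herrick <brianherrick@email.msn.com>, kristi.demaiolo@enron.com']

# getaddresses() returns (display name, address) pairs.
pairs = getaddresses(headers)
addresses = [addr for _name, addr in pairs]

print(pairs)
print(addresses)  # ['brianherrick@email.msn.com', 'kristi.demaiolo@enron.com']
```

<p>Each tuple holds the display name (empty when there is none) and the address, so keeping only the second element gives a plain list of addresses.</p>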
<pre><code>>>> email.utils.getaddresses(re.findall(r'^To:\s+((?:.*(?:\n[ \t]+)?)*)', text, flags=re.M))
[('', 'brianherrick@email.msn.com'), ('', 'herriceu2@tdprs.state.tx.us'), ('', 'robertherrick@bankunited.com'), ('', 'kristi.demaiolo@enron.com'), ('', 'suresh.raghavan@enron.com'), ('', 'harry.arora@enron.com')]
</code></pre>