<p>I'd recommend against using <code>urllib</code> in any modern Python context. Use <a href="http://docs.python-requests.org/en/latest/" rel="nofollow">Requests</a> ("HTTP for Humans") instead.</p>
<p>But before that: as @Skyler said, the result is a redirect, so your first stop should be to see what <code>curl</code> reports:</p>
<pre><code>$ curl -I 'https://tools.usps.com/go/TrackConfirmAction.action?tRef=fullpage&tLc=1&text28777=&tLabels=LN594080445CN\]'
HTTP/1.1 301 Moved Permanently
Server: AkamaiGHost
Content-Length: 0
Location: https://www.usps.com/root/global/server_responses/webtools-msg.htm
Date: Wed, 31 Dec 2014 10:43:14 GMT
Connection: keep-alive
</code></pre>
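<p>The same check can be done from Python. This is a minimal sketch, assuming the <code>requests</code> library is installed; <code>redirect_target</code> and <code>peek</code> are hypothetical helper names, not part of <code>requests</code>. It relies only on <code>Response.is_redirect</code>, <code>Response.headers</code> and the <code>allow_redirects</code> argument:</p>
<pre><code>import requests

def redirect_target(resp):
    """Hypothetical helper: return the Location header if resp is a redirect, else None."""
    # Response.is_redirect is True only for redirect status codes
    # (301, 302, 303, 307, 308) that also carry a Location header.
    return resp.headers.get("Location") if resp.is_redirect else None

def peek(url):
    # Roughly what `curl -I` does: a HEAD request that does NOT follow
    # redirects, so the 301 and its Location header stay visible.
    resp = requests.head(url, allow_redirects=False, timeout=10)
    return resp.status_code, redirect_target(resp)
</code></pre>
<p>Calling <code>peek()</code> with the tracking URL returns the status code together with the redirect target (or <code>None</code> if the server answered directly).</p>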
<p>Nothing dramatic, but you can see that the <a href="https://www.usps.com/root/global/server_responses/webtools-msg.htm" rel="nofollow">URL it redirects to</a> states:</p>
<blockquote>
<p>To learn about integrating the free Postal Service® Address and
Tracking API's into your application, please visit
www.usps.com/webtools.</p>
</blockquote>
<p>Fair enough. I'd suggest going there and signing up: if there is a proper API, there's no need to scrape the HTML.</p>
<p>But if you <em>really</em> want to fetch the raw HTML from code, get it working with curl first.</p>
<p>Open Chrome DevTools and reload the page. Right-click the request and look for "Copy as cURL". You can then edit the copied command from there.</p>
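<p>Copied commands tend to be long. A small helper can pull the <code>-H</code> headers into a dict for <code>requests</code>, optionally keeping only the ones you name. This is a hypothetical sketch (<code>headers_from_curl</code> is not part of any library) that assumes the command uses bash-style shell quoting, so the standard-library <code>shlex.split</code> can tokenize it:</p>
<pre><code>import shlex

def headers_from_curl(command, keep=None):
    """Hypothetical helper: collect -H/--header values from a curl command.

    keep: optional list of header names (case-insensitive) to retain,
    handy for trimming a copied command down to the headers that matter.
    If omitted, every header is kept.
    """
    tokens = shlex.split(command)  # assumes POSIX/bash-style quoting
    wanted = {name.lower() for name in keep} if keep else None
    headers = {}
    # Walk adjacent token pairs: a header flag is followed by its value.
    for flag, value in zip(tokens, tokens[1:]):
        if flag in ("-H", "--header"):
            name, _, val = value.partition(":")
            name, val = name.strip(), val.strip()
            if wanted is None or name.lower() in wanted:
                headers[name] = val
    return headers
</code></pre>
<p>For example, <code>headers_from_curl(copied, keep=["Accept", "Accept-Language", "User-Agent"])</code> would reduce a pasted command to just those three headers, which you can then pass as <code>headers=</code> to <code>requests.get</code>.</p>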
<p>The command can be trimmed down considerably. The following code, using the nice <code>requests</code> module, works:</p>
<pre><code>import requests

# Browser-like headers, trimmed down from the "Copy as cURL" command
headers = {
'Accept-Language': 'en-US,en;q=0.8',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.71 Safari/537.36',
'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
}
r = requests.get('https://tools.usps.com/go/TrackConfirmAction.action?tRef=fullpage&tLc=1&text28777=&tLabels=LN594080445CN]', headers=headers)
print("Status: %s" % r.status_code)
print("Content-type: %s" % r.headers['content-type'])
print("Content length: %d" % len(r.text))
</code></pre>
<p>Running it:</p>
<pre><code>$ python demo.py
Status: 200
Content-type: text/html
Content length: 55142
</code></pre>
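<p>Keep in mind that <code>requests</code> follows redirects for GET by default, so a 200 on its own does not prove you received the tracking page rather than the notice page. A minimal sketch (<code>reached_target</code> is a hypothetical helper name) that also checks <code>Response.history</code>, which lists any redirect responses that were followed:</p>
<pre><code>import requests

def reached_target(resp):
    """Hypothetical helper: True if resp is a 200 that was NOT reached via a redirect."""
    # resp.history holds the intermediate redirect responses that
    # requests followed; it is an empty list when there were none.
    return resp.status_code == 200 and not resp.history
</code></pre>
<p>After the <code>requests.get</code> call above, something like <code>if not reached_target(r): print("Redirected to", r.url)</code> makes a silent redirect visible.</p>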
<p>Cleaner, letting <code>requests</code> build the query string:</p>
<pre><code># Same request; requests URL-encodes params (headers is the dict defined above)
params = {
'tRef': 'fullpage',
'tLc': '1',
'text28777': '',
'tLabels': 'LN594080445CN]',
}
r = requests.get('https://tools.usps.com/go/TrackConfirmAction.action',
params=params,
headers=headers)
</code></pre>
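<p>To see exactly which URL this produces without contacting the server, you can prepare the request instead of sending it. A sketch using <code>requests.Request(...).prepare()</code>; the printed URL should end in <code>tLabels=LN594080445CN%5D</code>, because the stray <code>]</code> in the label gets percent-encoded when it is passed through <code>params</code>:</p>
<pre><code>import requests

params = {
    'tRef': 'fullpage',
    'tLc': '1',
    'text28777': '',
    'tLabels': 'LN594080445CN]',
}

# Build and prepare the request without sending it; no network
# traffic happens here, we only inspect the final encoded URL.
prepared = requests.Request(
    'GET',
    'https://tools.usps.com/go/TrackConfirmAction.action',
    params=params,
).prepare()

print(prepared.url)
</code></pre>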
<p>As I said, I don't think scraping is the right choice here. Use the USPS API.</p>