<pre><code>import urllib2
from bs4 import BeautifulSoup

url = 'http://www.millercenter.org/president/speeches'
conn = urllib2.urlopen(url)
html = conn.read()

miller_center_soup = BeautifulSoup(html, 'html.parser')
links = miller_center_soup.find_all('a')

# Print the href of every anchor tag that has one
for tag in links:
    link = tag.get('href', None)
    if link is not None:
        print link
</code></pre>
<p>Here is some of my output:</p>
<pre><code>/president/washington/speeches/speech-3939
/president/washington/speeches/speech-3939
/president/washington/speeches/speech-3461
https://www.facebook.com/millercenter
https://twitter.com/miller_center
https://www.flickr.com/photos/miller_center
https://www.youtube.com/user/MCamericanpresident
http://forms.hoosonline.virginia.edu/s/1535/16-uva/index.aspx?sid=1535&gid=16&pgid=9982&cid=17637
mailto:mcpa-webmaster@virginia.edu
</code></pre>
<p>I'm trying to scrape all of the presidential speeches from <code>millercenter.org/president/speeches</code>, but I'm having trouble saving the corresponding speech links so that I can later scrape the speech data from them. To be more concrete: say I want George Washington's speeches, which are accessible at URLs like <code>http://www.millercenter.org/president/washington/speeches/speech-3461</code>; I just need a way to get at those URLs. I'm thinking of storing all the speech URLs in a list and then writing a <code>for</code> loop to scrape the data from each one. Any ideas?</p>
<p>If you're not comfortable with list comprehensions or don't want to use one, you can create a list and append to it:</p>
<pre><code>all_links = []
for tag in links:
    link = tag.get('href', None)
    if link is not None:
        all_links.append(link)
</code></pre>
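<p>Since you only want the speech pages and not the social-media or <code>mailto:</code> links, you can also filter as you collect. Here is a sketch of that idea: the <code>hrefs</code> list below is stubbed with sample values from the output above rather than fetched live, and in practice it would come from the <code>tag.get('href')</code> loop over <code>miller_center_soup.find_all('a')</code>.</p>
<pre><code>```python
# Sample hrefs like those printed above; in practice these come from
# tag.get('href') while iterating over miller_center_soup.find_all('a')
hrefs = [
    '/president/washington/speeches/speech-3939',
    '/president/washington/speeches/speech-3939',
    '/president/washington/speeches/speech-3461',
    'https://www.facebook.com/millercenter',
    'mailto:mcpa-webmaster@virginia.edu',
]

base = 'http://www.millercenter.org'

# Keep only the speech paths, prepend the site root to make them
# absolute URLs, and use set() to drop the duplicates the page contains
speech_urls = sorted(set(base + h for h in hrefs if '/speeches/speech-' in h))

for url in speech_urls:
    print(url)
```</code></pre>
<p>Then your scraping <code>for</code> loop can iterate directly over <code>speech_urls</code>, opening each absolute URL in turn.</p>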