<p><a href="https://stackoverflow.com/users/1222951/rawing">Rawing</a> is correct, but when I am faced with an <a href="https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem">XY problem</a>, I prefer to show the best way to accomplish <code>X</code> rather than a way to fix <code>Y</code>. You should use an HTML parser such as <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/" rel="nofollow noreferrer"><code>BeautifulSoup</code></a> to parse web pages:</p>
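<pre><code>from bs4 import BeautifulSoup
import urllib2  # Python 2; on Python 3 use urllib.request.urlopen instead

def print_all_links(page):
    # Download the page and parse it into a navigable tree.
    html = urllib2.urlopen(page).read()
    soup = BeautifulSoup(html, 'html.parser')
    # Keep only anchors with the target class that also carry an href.
    for a in soup.find_all('a', 'title may-blank ', href=True):
        print(a['href'])
</code></pre>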
<p>If you are really allergic to HTML parsers, at least use a regex (even though you should stick with HTML parsing):</p>
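<pre><code>import urllib2  # Python 2; on Python 3 use urllib.request.urlopen and decode the bytes first
import re

def print_all_links(page):
    html = urllib2.urlopen(page).read()
    # Brittle: matches only when class comes right before href, spelled exactly like this.
    for href in re.findall(r'&lt;a class="title may-blank " href="(.*?)"', html):
        print(href)
</code></pre>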
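<p>To make the trade-off concrete, here is a minimal Python 3 sketch that runs both approaches on the same markup. It uses the standard library's <code>html.parser</code> in place of BeautifulSoup so that it runs without extra installs. The sample HTML, the <code>TitleLinkCollector</code> class, and both helper functions are made up for illustration and are not from the page in the question.</p>

```python
import re
from html.parser import HTMLParser

# Hypothetical sample markup (an assumption, not the real page): the second
# link lists href before class, which the regex above does not anticipate.
SAMPLE_HTML = '''
<a class="title may-blank " href="http://example.com/first">First</a>
<a href="http://example.com/second" class="title may-blank ">Second</a>
<a class="other" href="http://example.com/ignored">Ignored</a>
'''


class TitleLinkCollector(HTMLParser):
    """Collect href values of <a> tags that carry both target classes."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attributes = dict(attrs)
        # Split the class attribute so order and extra whitespace do not matter.
        classes = (attributes.get('class') or '').split()
        href = attributes.get('href')
        if href and 'title' in classes and 'may-blank' in classes:
            self.links.append(href)


def links_via_parser(html):
    collector = TitleLinkCollector()
    collector.feed(html)
    return collector.links


def links_via_regex(html):
    # Same pattern as the regex version above.
    return re.findall(r'<a class="title may-blank " href="(.*?)"', html)


print(links_via_parser(SAMPLE_HTML))
print(links_via_regex(SAMPLE_HTML))
```

<p>On this sample the parser version returns both matching links, while the regex returns only the first, because the second tag puts <code>href</code> before <code>class</code>. That kind of harmless markup change is the main reason to prefer a parser.</p>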