<p>There's more than one way to skin a cat. Here is a <code>requests</code>/<code>lxml</code> solution that contains no (explicit) <code>for</code> loop:</p>
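<pre><code>import requests
from lxml.html import fromstring

req = requests.get('http://www.openquestions.com')
# Parse the raw response body (bytes) into an lxml element tree.
resp = fromstring(req.content)
# The trailing @href step makes xpath() return the attribute strings directly.
hrefs = resp.xpath('//dt/a/@href')
print(hrefs)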
</code></pre>
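<p>Because the live page may change or go offline, here is a minimal offline sketch of the same two calls. The markup is hypothetical: a <code>dl</code> whose <code>dt</code> items each wrap a link, which is the structure the XPath assumes. It is built with <code>lxml.html.builder</code> and serialized with <code>tostring</code> only to stand in for <code>req.content</code>.</p>
<pre><code>from lxml.html import builder as E
from lxml.html import fromstring, tostring

# Hypothetical stand-in for the page: dt items that each wrap a link,
# plus one link outside any dt that the query should ignore.
page = tostring(
    E.HTML(
        E.BODY(
            E.DL(
                E.DT(E.A('First question', href='/questions/1')),
                E.DD('Some description'),
                E.DT(E.A('Second question', href='/questions/2')),
            ),
            E.P(E.A('About', href='/about')),
        )
    )
)

# Same two calls as the answer, applied to local bytes instead of req.content.
resp = fromstring(page)
hrefs = resp.xpath('//dt/a/@href')
print(hrefs)
</code></pre>
<p>Only the two links inside <code>dt</code> elements are returned; the <code>/about</code> link in the <code>p</code> is not matched.</p>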
<p><strong>Edit</strong></p>
<p>Why I went with this approach:</p>
<ul>
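<li>I prefer XPath to CSS selectors</li>
<li>It's fast: in the benchmark below, lxml needed roughly a ninth of the time BeautifulSoup (with <code>html.parser</code>) took</li>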
</ul>
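<p>One concrete difference that makes XPath convenient here (an illustration, not necessarily the author's full reasoning): an XPath expression can end in an attribute step such as <code>@href</code>, so the query itself returns the strings, whereas a CSS selector like <code>dt a</code> matches elements and the attribute has to be read in a separate step. That attribute step is what lets the first snippet do without an explicit loop. A small sketch on hypothetical markup:</p>
<pre><code>from lxml.html import builder as E

# Hypothetical markup built in code: two dt items, each wrapping a link.
doc = E.DL(
    E.DT(E.A('one', href='/q/1')),
    E.DT(E.A('two', href='/q/2')),
)

# XPath can end in an attribute step, so the query yields the href strings.
via_xpath = doc.xpath('//dt/a/@href')

# A CSS-style selection ("dt a") matches elements only, so the attribute
# has to be read afterwards, here with a list comprehension.
via_elements = [a.get('href') for a in doc.xpath('//dt/a')]

print(via_xpath == via_elements)
</code></pre>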
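<p>Benchmark (each page is downloaded once, outside the timed function, so only parsing and querying are timed):</p>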
<pre><code>import timeit

import bs4
import requests
from lxml.html import fromstring

# Download once; only parsing and the XPath query are timed.
req = requests.get('http://www.openquestions.com').content

def myfunc():
    resp = fromstring(req)
    hrefs = resp.xpath('//dl/dt/a/@href')

print("Time for lxml: ", timeit.timeit(myfunc, number=100))
##############################################################
# Same page, parsed with BeautifulSoup's built-in html.parser.
resp2 = requests.get('http://www.openquestions.com').content

def func2():
    soup = bs4.BeautifulSoup(resp2, 'html.parser')
    hrefs = [a['href'] for a in soup.select('dl dt a')]

print("Time for beautiful soup:", timeit.timeit(func2, number=100))
</code></pre>
<p>Output (as printed by Python 2, where <code>print("...", x)</code> prints a tuple):</p>
<pre><code>('Time for lxml: ', 0.09621267095780464)
('Time for beautiful soup:', 0.8594218329542824)
</code></pre>
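<p>A variant of the benchmark that avoids the network, sketched under assumptions: the page is a hypothetical, locally generated list of 200 <code>dt</code>/<code>a</code> entries, and BeautifulSoup is treated as optional so the script still runs with only lxml installed. It also checks that both approaches extract the same links before timing them. Timings will depend on the machine and library versions.</p>
<pre><code>import timeit

from lxml.html import builder as E
from lxml.html import fromstring, tostring

try:
    import bs4  # optional: the BeautifulSoup half is skipped if missing
except ImportError:
    bs4 = None

# Hypothetical page: 200 dt/a entries generated locally, so the timings
# cover only parsing plus querying, with no network or server variance.
entries = []
for i in range(200):
    entries.append(E.DT(E.A(f'Question {i}', href=f'/q/{i}')))
    entries.append(E.DD('text'))
page = tostring(E.HTML(E.BODY(E.DL(*entries))))


def with_lxml():
    return fromstring(page).xpath('//dl/dt/a/@href')


def with_bs4():
    soup = bs4.BeautifulSoup(page, 'html.parser')
    return [a['href'] for a in soup.select('dl dt a')]


candidates = {'lxml': with_lxml}
if bs4 is not None:
    # Make sure both approaches extract the same links before comparing speed.
    assert with_lxml() == with_bs4()
    candidates['beautiful soup'] = with_bs4

for name, func in candidates.items():
    print(f'Time for {name}:', timeit.timeit(func, number=100))
</code></pre>
<p>Passing <code>'lxml'</code> instead of <code>'html.parser'</code> to <code>BeautifulSoup</code> makes it build its tree with lxml's parser, which helps separate raw parser speed from the overhead of BeautifulSoup's own tree and selector layer.</p>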