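<p>The site has a few extra requirements for making a proper request. First, it uses cookies, so a <a href="https://docs.python.org/3/library/http.cookiejar.html?highlight=cookiejar#module-http.cookiejar" rel="nofollow noreferrer"><code>CookieJar</code></a> is needed. It can be populated by first requesting the home page without performing a search. That same request also gives you the <code>_csrf</code> value required when submitting the search form. Finally, the POST request can be generated from the entered search by using <a href="https://docs.python.org/3/library/urllib.parse.html?highlight=urlencode#urllib.parse.urlencode" rel="nofollow noreferrer"><code>urlencode()</code></a> to build <code>q</code> correctly:</p>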
<pre><code>from operator import itemgetter
from bs4 import BeautifulSoup
import http.cookiejar
import urllib.request
import urllib.parse

song = input('Enter song: ')

# An opener with a cookie jar, so cookies set by the home page are sent with the search
cookie_jar = http.cookiejar.CookieJar()
cookie_processor = urllib.request.HTTPCookieProcessor(cookie_jar)
opener = urllib.request.build_opener(cookie_processor)

# Request the home page first to collect the cookies and the _csrf token
with opener.open('https://songbpm.com') as response:
    html_1 = response.read().decode('utf-8')

soup_1 = BeautifulSoup(html_1, 'html.parser')
# The first input element on the page holds the _csrf value
data = urllib.parse.urlencode({'q' : song, '_csrf' : soup_1.input['value']}).encode('ascii')

# Passing data makes this a POST request
with opener.open('https://songbpm.com/searches', data) as response:
    html_2 = response.read().decode('utf-8')

soup_2 = BeautifulSoup(html_2, 'html.parser')

# Each result is an a element with class "media"; print paragraphs 0, 1 and 4 of each
for a in soup_2.find_all('a', {'class' : 'media'}):
    print(', '.join(itemgetter(0, 1, 4)([p.get_text(strip=True) for p in a.find_all('p')])))
</code></pre>
<p>This will give you the following results:</p>
<pre class="lang-none prettyprint-override"><code>Enter song: clean bandit - solo
Clean Bandit, Solo (feat. Demi Lovato), 105
Clean Bandit, Solo (feat. Demi Lovato) - Acoustic, 0
Clean Bandit, Solo (feat. Demi Lovato) - Ofenbach Remix, 121
Clean Bandit, Solo (feat. Demi Lovato) - Sofi Tukker Remix, 127
Clean Bandit, Solo (feat. Demi Lovato) - Wideboys Remix, 122
</code></pre>
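<p>To see what <code>urlencode()</code> does to the form fields before they are sent, here is a minimal standalone sketch. The <code>_csrf</code> value is a made-up placeholder (the real one comes from the home page), and the variable names <code>form</code> and <code>body</code> are only for illustration:</p>

```python
import urllib.parse

# Hypothetical form values; the real _csrf token is read from the home page.
form = {'q': 'clean bandit - solo', '_csrf': 'abc123=='}

# urlencode() escapes spaces and special characters; opener.open() needs bytes,
# so the resulting string is encoded before it is passed as the POST body.
body = urllib.parse.urlencode(form).encode('ascii')
print(body)

# Decoding the body again recovers the original field values.
decoded = urllib.parse.parse_qs(body.decode('ascii'))
print(decoded)
```

<p>Spaces become <code>+</code> and characters such as <code>=</code> are percent-escaped, which is why building the body by hand with string concatenation tends to go wrong.</p>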
<p>Using <code>beautifulsoup</code> makes it easy to extract all of the details. <code>itemgetter()</code> is just a quick way to pick certain items out of a given list.</p>
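<p>As a rough illustration of the <code>itemgetter()</code> step on its own, the sketch below uses a hand-written list in place of the paragraph texts scraped from one result. The two middle entries are placeholders, since their real content on the page is not shown here:</p>

```python
from operator import itemgetter

# Hypothetical stand-in for the texts of the p elements in one search result.
# Entries 2 and 3 are placeholders; only positions 0, 1 and 4 are used.
cells = ['Clean Bandit', 'Solo (feat. Demi Lovato)', 'placeholder', 'placeholder', '105']

# itemgetter(0, 1, 4) builds a callable that returns those positions as a tuple.
pick = itemgetter(0, 1, 4)
line = ', '.join(pick(cells))
print(line)
```

<p>This is equivalent to writing <code>(cells[0], cells[1], cells[4])</code> by hand, just shorter when several positions are needed.</p>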