Answers to: Code does not work with the URL, but a manual search works


Category: Python Q&A


1 answer

Anonymous, 1 day ago

Expertise: python, mysql, java

The site imposes a few extra requirements before it accepts a request. First, it uses cookies, so a <a href="https://docs.python.org/3/library/http.cookiejar.html?highlight=cookiejar#module-http.cookiejar" rel="nofollow noreferrer"><code>CookieJar</code></a> is needed. The jar can be populated by requesting the home page first, without performing a search. That same response also supplies the <code>_csrf</code> value that is required when the search form is submitted. Finally, the POST request is built from the search input by encoding <code>q</code> correctly with <a href="https://docs.python.org/3/library/urllib.parse.html?highlight=urlencode#urllib.parse.urlencode" rel="nofollow noreferrer"><code>urlencode()</code></a>:

<pre><code>from operator import itemgetter
from bs4 import BeautifulSoup
import http.cookiejar
import urllib.request
import urllib.parse

song = input('Enter song: ')

# An opener that stores cookies and sends them back on later requests.
cookie_jar = http.cookiejar.CookieJar()
cookie_processor = urllib.request.HTTPCookieProcessor(cookie_jar)
opener = urllib.request.build_opener(cookie_processor)

# Request the home page first to collect the cookies and the CSRF token.
with opener.open('https://songbpm.com') as response:
    html_1 = response.read().decode('utf-8')

soup_1 = BeautifulSoup(html_1, 'html.parser')

# The first input element on the page carries the _csrf value.
data = urllib.parse.urlencode({'q' : song, '_csrf' : soup_1.input['value']}).encode('ascii')

# Passing data makes this a POST request.
with opener.open('https://songbpm.com/searches', data) as response:
    html_2 = response.read().decode('utf-8')

soup_2 = BeautifulSoup(html_2, 'html.parser')

# Print the items at positions 0, 1 and 4 of each result's paragraph texts.
for a in soup_2.find_all('a', {'class' : 'media'}):
    print(', '.join(itemgetter(0, 1, 4)([p.get_text(strip=True) for p in a.find_all('p')])))
</code></pre>

This gives the following result:

<pre class="lang-none prettyprint-override"><code>Enter song: clean bandit - solo
Clean Bandit, Solo (feat. Demi Lovato), 105
Clean Bandit, Solo (feat. Demi Lovato) - Acoustic, 0
Clean Bandit, Solo (feat. Demi Lovato) - Ofenbach Remix, 121
Clean Bandit, Solo (feat. Demi Lovato) - Sofi Tukker Remix, 127
Clean Bandit, Solo (feat. Demi Lovato) - Wideboys Remix, 122
</code></pre>

With <code>beautifulsoup</code>, all the details can be extracted easily. <code>itemgetter()</code> is simply a quick way to pick certain items from a given list.
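The two standard-library helpers used in the answer above can be tried offline, without contacting the site. The following is only a minimal sketch: the function names `build_search_body` and `pick_fields`, the token `'hypothetical-token'`, and the two placeholder list items are assumptions made for illustration, not values taken from the site.

```python
from operator import itemgetter
from urllib.parse import urlencode


def build_search_body(song, csrf_token):
    """Encode the form fields as a URL-encoded POST body (bytes)."""
    # urlencode() percent-encodes the values and turns spaces into '+',
    # so the result is pure ASCII even for non-ASCII song titles.
    return urlencode({'q': song, '_csrf': csrf_token}).encode('ascii')


def pick_fields(paragraph_texts):
    """Join the items at positions 0, 1 and 4 with ', '."""
    return ', '.join(itemgetter(0, 1, 4)(paragraph_texts))


# 'hypothetical-token' stands in for the real _csrf value from the home page.
print(build_search_body('clean bandit - solo', 'hypothetical-token'))

# The items at positions 2 and 3 are made-up placeholders; only 0, 1 and 4 are used.
print(pick_fields(['Clean Bandit', 'Solo (feat. Demi Lovato)',
                   'placeholder A', 'placeholder B', '105']))
```

Keeping the encoding and the field selection in small functions like these makes it possible to check them separately from the network requests, which is where most failures with this kind of site tend to occur.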
