URL doesn't work from code, but a manual search works

import urllib.request
import urllib.parse

song = input("")
fin = ""
for i in song:
    if i == "(":
        tempone = song
        song = tempone.split("(")[0] + tempone.split(") ")[1]
previous = ""
for i in song:
    if i.isalpha():
        temp = fin
        fin = temp + i
    else:
        if previous.isalpha():
            temp = fin
            fin = temp + "-"
    previous = i
songencoded = urllib.parse.quote(song, safe='')
print('https://songbpm.com/'+ fin.lower() + '?q=' + songencoded)
response = urllib.request.urlopen('https://songbpm.com/'+ fin.lower() + '?q=' + songencoded)
text = str(response.read()).split('\\n')

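2 answers

User

#1 · edited 2024-09-28 03:19:10

OK, I don't know what kind of magic is driving this site, but you can use a headless browser: instead of putting the song into the URL, type the name of the song you are looking for into the search box, and it works! Sorry, I haven't really answered your question.

Here is code that worked for me :) Have fun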

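import time

import bs4
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys

options = webdriver.ChromeOptions()
options.add_argument('--headless')  # run Chrome without a visible window
driver = webdriver.Chrome(options=options)  # the older chrome_options= keyword is deprecated
url = 'https://songbpm.com/'

while True:
    driver.get(url)
    # Type the query into the site's search box instead of building a URL by hand.
    # find_element(By.ID, ...) replaces find_element_by_id, which Selenium 4 removed.
    inputElement = driver.find_element(By.ID, "search-field")
    inputElement.send_keys(input("Enter name of a song: \n>"))
    inputElement.send_keys(Keys.ENTER)
    time.sleep(2)  # assumed delay so the results page can load before it is read
    html = driver.page_source
    soup = bs4.BeautifulSoup(html, "html.parser")

    # Each result is an <a class="media"> element holding the artist and track names
    for node in soup.find_all("a", {"class": "media"}):
        print("ARTIST:", node.find("p", {"class": "artist-name"}).text.strip())
        print("SONG:", node.find("p", {"class": "track-name"}).text.strip())
        print("*" * 20)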

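User

#2 · edited 2024-09-28 03:19:10

The website has a few additional requirements for making a proper request. First, it uses cookies, so an HTTPCookieProcessor is needed. The cookies can be loaded by first requesting the home page without a search. That request also gives you the _csrf value that is needed when submitting the search form. Finally, a POST request can be generated from the input search by using urllib.parse.urlencode() to build q correctly: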

from operator import itemgetter
from bs4 import BeautifulSoup
import http.cookiejar
import urllib.request
import urllib.parse


song = input('Enter song: ')

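# Opener that stores cookies from the first response and sends them back later
cookie_jar = http.cookiejar.CookieJar()
cookie_processor = urllib.request.HTTPCookieProcessor(cookie_jar)
opener = urllib.request.build_opener(cookie_processor)

# Request the home page first to pick up the cookies and the CSRF token
with opener.open('https://songbpm.com') as response:
    html_1 = response.read().decode('utf-8')

soup_1 = BeautifulSoup(html_1, 'html.parser')
# soup_1.input is the first <input> on the page, which is taken to hold the _csrf value
data = urllib.parse.urlencode({'q' : song, '_csrf' : soup_1.input['value']}).encode('ascii')

# Passing data makes this a POST request
with opener.open('https://songbpm.com/searches', data) as response:
    html_2 = response.read().decode('utf-8')

soup_2 = BeautifulSoup(html_2, 'html.parser')

# Print the 1st, 2nd and 5th <p> of each result (artist, track, BPM in the output below)
for a in soup_2.find_all('a', {'class' : 'media'}):
    print(', '.join(itemgetter(0, 1, 4)([p.get_text(strip=True) for p in a.find_all('p')])))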

This gives you the following results:

Enter song: clean bandit - solo
Clean Bandit, Solo (feat. Demi Lovato), 105
Clean Bandit, Solo (feat. Demi Lovato) - Acoustic, 0
Clean Bandit, Solo (feat. Demi Lovato) - Ofenbach Remix, 121
Clean Bandit, Solo (feat. Demi Lovato) - Sofi Tukker Remix, 127
Clean Bandit, Solo (feat. Demi Lovato) - Wideboys Remix, 122

All of the details can be extracted easily using BeautifulSoup. itemgetter() is just a quick way to pick certain items out of a given list.
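To make the two standard-library helpers used in this answer more concrete, here is a small offline sketch. It makes no network request: the token value and the list of cell texts are made-up placeholders, not data taken from the site.

```python
from operator import itemgetter
import urllib.parse

# Form fields as in the POST request above; 'example-token' is a placeholder,
# not a real _csrf value from the site.
form = {'q': 'clean bandit - solo', '_csrf': 'example-token'}
body = urllib.parse.urlencode(form).encode('ascii')
print(body)  # b'q=clean+bandit+-+solo&_csrf=example-token'

# Hypothetical texts of the <p> tags inside one result link
cells = ['Clean Bandit', 'Solo (feat. Demi Lovato)', 'extra 1', 'extra 2', '105']
picked = itemgetter(0, 1, 4)(cells)  # tuple of the items at indexes 0, 1 and 4
print(', '.join(picked))  # Clean Bandit, Solo (feat. Demi Lovato), 105
```

urlencode() escapes each value and joins the pairs with &, so the query never has to be assembled by hand, and itemgetter(0, 1, 4) returns a tuple that can be passed straight to join().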
