Retrieving an mp3 download link from a website with Python

from BeautifulSoup import BeautifulSoup
from bs4 import BeautifulSoup
import urllib2

#http://www.yt-mp3.com/watch?v=cXAxpoC8o9w
url = "http://www.yt-mp3.com/watch?v="+"cXAxpoC8o9w"#YT video ID
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

req = urllib2.Request(url,headers=hdr)
website = urllib2.urlopen(req)
html = website.read()
soup = BeautifulSoup(html)
links = soup.find_all('a')
for tag in links:
    link = tag.get('href',None)
    if link is not None:
        print link

1 Answer

User

#1 · Posted on 2024-10-04 01:26:30

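This website is designed to make it hard to extract the link text you want, so using urllib2 or requests will not help.

To work around this, you need to use something like selenium to automate a web browser. In this case, you need to automate hovering the mouse over the download button. It is this action that makes the link visible.

This can be done as follows: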

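from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
import time

browser = webdriver.Firefox()
browser.get("http://www.yt-mp3.com/watch?v=cXAxpoC8o9w")
time.sleep(6)  # crude fixed wait for the page to finish loading
# find_element(By.CLASS_NAME, ...) is also available in Selenium releases
# that no longer provide find_element_by_class_name
download = browser.find_element(By.CLASS_NAME, 'download')
# Hovering over the download button is what makes the page expose the link
ActionChains(browser).move_to_element(download).perform()
print("MP3 link is %s" % download.get_attribute("href"))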

A further improvement would be to remove the sleep().

This prints "MP3 link is" followed by the URL of the download link.
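As a rough sketch of how the fixed sleep() mentioned above could be dropped: instead of always waiting six seconds, poll for a condition until it holds or a timeout expires. Selenium provides WebDriverWait (in selenium.webdriver.support.ui) for this purpose. The helper below, wait_until, is a hypothetical, dependency-free stand-in written only to illustrate the idea; it is not part of the answer or of Selenium, and it assumes Python 3.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5):
    """Poll condition() until it returns a truthy value, then return that value.

    Exceptions raised by condition() are treated as "not ready yet".
    Raises TimeoutError if the timeout (in seconds) elapses first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception:
            pass  # e.g. the element is not on the page yet; retry
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f seconds" % timeout)
        time.sleep(poll_interval)


if __name__ == "__main__":
    # Stand-in for a page element that only appears after a short delay.
    ready_at = time.monotonic() + 0.3
    value = wait_until(
        lambda: "element ready" if time.monotonic() >= ready_at else None,
        timeout=5.0,
        poll_interval=0.05,
    )
    print(value)
```

With the browser from the answer, the time.sleep(6) call could then be replaced by passing wait_until a lambda that looks up the 'download' element. The lookup raises an exception until the element exists, which the helper treats as "not yet", so the script continues as soon as the page is ready rather than after a fixed delay.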
