In Python 3, I want to get the links to the photos on certain pages, for example:
http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/AC/10000600209
http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809
This is what I tried:
from urllib.request import urlopen
from bs4 import BeautifulSoup
html = urlopen('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/AC/10000600209')
soup = BeautifulSoup(html, "html.parser")
link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})
print(link)
None
html = urlopen('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809')
soup = BeautifulSoup(html, "html.parser")
link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})
print(link)
None
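I plan to collect a set of items next to the photo, and then work out another strategy for extracting the exact src value, such as: http://divulgacandcontas.tse.jus.br/candidaturas/oficial/2018/BR/AC/2022802018/10000600209/foto_1532971768767.jpg
However, what Firefox's Inspect Element shows (img class='img-thumbnail img-responsive dvg-cand-foto') is not the same as the HTML that html.parser receives.
Does anyone know how I can collect the photo links from this site?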
-/-
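A likely reason both attempts print None: the candidate path sits after the # in the URL, which makes it a fragment. Fragments are handled by the browser and are not sent to the server, so urlopen only fetches /divulga/, and the page appears to build the candidate view (including the img tag) with JavaScript afterwards. The short sketch below uses only the standard library to show how the URL from the question splits; it does not contact the site.

```python
from urllib.parse import urlsplit

url = ('http://divulgacandcontas.tse.jus.br/divulga/'
       '#/candidato/2018/2022802018/AC/10000600209')

parts = urlsplit(url)

# Everything after '#' is the fragment; it stays on the client side.
print(parts.fragment)  # /candidato/2018/2022802018/AC/10000600209

# This is the only path an HTTP client such as urlopen actually requests.
print(parts.path)      # /divulga/
```

Since the server never sees the candidate identifiers, the HTML it returns cannot contain that candidate's photo, which is consistent with Inspect Element showing something different from what html.parser receives.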
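Using Selenium, which loads the page in a real browser so that the JavaScript-rendered HTML is available: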
from selenium import webdriver
from bs4 import BeautifulSoup

# Start Firefox with its default settings.
browser = webdriver.Firefox()
browser.implicitly_wait(10)
browser.get('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809')
# page_source is the DOM as the browser currently has it, not the raw server response.
html = browser.page_source
# quit() ends the whole browser session, not just the current window.
browser.quit()
soup = BeautifulSoup(html, "html.parser")
link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})['src']
print(link)
http://divulgacandcontas.tse.jus.br/candidaturas/oficial/2018/BR/SP/2022802018/250000627809/foto_1534447872273.jpg