In BeautifulSoup, how do I collect photo links that do not appear in the parsed HTML?

Posted 2024-06-25 22:35:42


In Python 3, I want to get the links to the photos shown on certain pages, for example:

http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/AC/10000600209

http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809

This is what I tried:

from urllib.request import urlopen
from bs4 import BeautifulSoup

# First candidate page (AC)
html = urlopen('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/AC/10000600209')
soup = BeautifulSoup(html, "html.parser")
link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})
print(link)
# Output: None

# Second candidate page (SP)
html = urlopen('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809')
soup = BeautifulSoup(html, "html.parser")
link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})
print(link)
# Output: None

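My plan was to collect the group of elements next to the photo and then work out a way to extract the exact src value, for example: http://divulgacandcontas.tse.jus.br/candidaturas/oficial/2018/BR/AC/2022802018/10000600209/foto_1532971768767.jpg

However, what Firefox's Inspect Element shows (img class='img-thumbnail img-responsive dvg-cand-foto') is not the same as what html.parser receives.

Does anyone know how I can collect the photo links from this site?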

1 answer
User
Answer #1 · Posted 2024-06-25 22:35:42
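A likely reason for the None results in the question: everything after the # in these URLs is a fragment, which stays in the browser and is not part of the HTTP request. Both page URLs therefore fetch the same base document, and the candidate markup is presumably added afterwards by JavaScript, which urlopen never runs. The sketch below only illustrates the fragment split with the standard library; the variable names are illustrative.

```python
from urllib.parse import urldefrag

page_urls = [
    "http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/AC/10000600209",
    "http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809",
]

for page_url in page_urls:
    # urldefrag separates the part that is requested from the server
    # from the fragment that only the browser sees
    requested, fragment = urldefrag(page_url)
    print("requested:", requested)
    print("fragment: ", fragment)
```

Both iterations print the same requested URL, which is why the two downloads in the question contain identical HTML without the photo element.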
Use Selenium, which loads the page in a real browser so that the page's scripts run before the HTML is read:

from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
# Element lookups retry for up to 10 seconds before failing
browser.implicitly_wait(10)

browser.get('http://divulgacandcontas.tse.jus.br/divulga/#/candidato/2018/2022802018/SP/250000627809')
# Look up the photo so the implicit wait holds until it has been rendered
browser.find_element(By.CSS_SELECTOR, "img.dvg-cand-foto")

html = browser.page_source
soup = BeautifulSoup(html, "html.parser")
# quit() ends the whole browser session, not just the current window
browser.quit()

link = soup.find("img", {"class": "img-thumbnail img-responsive dvg-cand-foto"})['src']

print(link)
# Output:
# http://divulgacandcontas.tse.jus.br/candidaturas/oficial/2018/BR/SP/2022802018/250000627809/foto_1534447872273.jpg
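Note that soup.find(...)['src'] raises a TypeError when no matching image is present, for example if the page source was read before the photo was rendered. As a dependency-free sketch of the same extraction step, the code below swaps BeautifulSoup for the standard library's html.parser.HTMLParser and returns an empty list instead of failing. PhotoSrcParser, photo_sources, and the sample HTML string are hypothetical names and data for illustration; the sample stands in for browser.page_source.

```python
from html.parser import HTMLParser


class PhotoSrcParser(HTMLParser):
    """Collect the src of every <img> whose class list contains a given token."""

    def __init__(self, class_token):
        super().__init__()
        self.class_token = class_token
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        # class is a space-separated list, so match one token
        # rather than the exact attribute string
        classes = (attributes.get("class") or "").split()
        if self.class_token in classes and attributes.get("src"):
            self.sources.append(attributes["src"])


def photo_sources(html, class_token="dvg-cand-foto"):
    """Return the src values of matching images, or an empty list."""
    parser = PhotoSrcParser(class_token)
    parser.feed(html)
    parser.close()
    return parser.sources


# Hypothetical fragment standing in for browser.page_source
rendered = (
    '<div><img class="img-thumbnail img-responsive dvg-cand-foto" '
    'src="http://divulgacandcontas.tse.jus.br/candidaturas/oficial/2018/BR/SP/'
    '2022802018/250000627809/foto_1534447872273.jpg"></div>'
)

print(photo_sources(rendered))
print(photo_sources("<div>no photo yet</div>"))
```

Matching on the single class token dvg-cand-foto keeps the lookup working even if the other classes on the image change or are reordered.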
