BeautifulSoup: select all href values inside elements with a specific class

User

Answer 1 · edited 2024-09-27 21:27:35

I'm not sure whether the answers above work. This is the one that did the job for me.

import requests
from bs4 import BeautifulSoup

url = "SOME-URL-YOU-WANT-TO-SCRAPE"
response = requests.get(url=url)
urls = BeautifulSoup(response.content, 'lxml').find_all('a', attrs={"class": ["YOUR-CLASS-NAME"]}, href=True)
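
Note that find_all returns Tag objects, so the href strings still have to be read from each tag. A minimal sketch of the same idea with a CSS selector, which may read more directly; the sample HTML and the class name "emblem" are stand-ins I made up for illustration, and "html.parser" is used only so the snippet runs without lxml installed:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for a downloaded page.
html = """
<div>
  <a class="emblem" href="/detail/1">One</a>
  <a class="emblem">No link</a>
  <a class="other" href="/detail/2">Two</a>
  <a class="emblem big" href="/detail/3">Three</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# "a.emblem[href]" matches <a> tags that carry the class "emblem"
# and also have an href attribute, so no KeyError handling is needed.
hrefs = [a["href"] for a in soup.select("a.emblem[href]")]
print(hrefs)
```

With a real page you would pass response.content instead of the sample string.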

User

Answer 2 · edited 2024-09-27 21:27:35

You can get the href by class name:

Question 1:

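# Assumes `soup` is an existing BeautifulSoup object
for link in soup.find_all('a', {'class': 'emblem'}):
    try:
        print(link['href'])
    except KeyError:
        # This <a> tag has no href attribute; skip it
        pass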
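
If you would rather avoid the try/except, Tag.get returns None for a missing attribute. A rough sketch under that assumption; the helper name hrefs_by_class and the sample HTML are hypothetical, not part of the original answer:

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one link with an href, one without.
html = '<a class="emblem" href="/a">A</a><a class="emblem">B</a>'
soup = BeautifulSoup(html, "html.parser")


def hrefs_by_class(soup, class_name):
    """Return href values of <a> tags with the given class, skipping tags without href."""
    links = []
    for link in soup.find_all("a", class_=class_name):
        href = link.get("href")  # None when the attribute is missing
        if href is not None:
            links.append(href)
    return links


print(hrefs_by_class(soup, "emblem"))
```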

User

Answer 3 · edited 2024-09-27 21:27:35

Try this. It gives you all the URLs by walking through every page of the site. I used an explicit wait to make it faster and more dynamic.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup

driver = webdriver.Chrome()
url = "http://emblematica.grainger.illinois.edu/"
wait = WebDriverWait(driver, 10)
driver.get("http://emblematica.grainger.illinois.edu/browse/emblems?Filter.Collection=Utrecht&Skip=0&Take=18")
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".emblem")))

while True:
    soup = BeautifulSoup(driver.page_source,"lxml")
    for item in soup.select('.emblem'):
        links = url + item['href']
        print(links)

    try:
        link = driver.find_element(By.ID, "next")
        link.click()
        wait.until(EC.staleness_of(link))
    except Exception:
        break
driver.quit()
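
The loop above builds each link with url + item['href']. If the href values on the page start with a slash (an assumption, I have not checked the site's markup), plain concatenation would produce a double slash. A small sketch using urljoin from the standard library instead; the helper name absolute and the example paths are made up for illustration:

```python
from urllib.parse import urljoin

base = "http://emblematica.grainger.illinois.edu/"


def absolute(base_url, href):
    """Resolve href against base_url; already-absolute URLs are returned unchanged."""
    return urljoin(base_url, href)


# Hypothetical paths, only to show how leading slashes are handled.
print(absolute(base, "/x/y"))
print(absolute(base, "x/y"))
```

Both calls resolve to the same address, so the result does not depend on whether the site writes its links with a leading slash.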

