Getting all "href" attributes with Python Selenium

Published 2024-10-02 12:26:57


How can I get all the "href" attributes of these "h2" headings on this page?

<h2 class="entry-title">
<a href="http://www.allitebooks.com/deep-learning-with-python-2/" rel="bookmark">Deep Learning with Python</a>
</h2>

The approach I tried did not return the href: it did not get the link out of the "a" tag. If I instead look up all elements by the "a" tag, it returns every href on the page, which is not what I want. I only want the headings shown above, but with their "href" URL attributes.


2 Answers

The following code collects all of the books from every results page:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
baseUrl = "http://www.allitebooks.com/page/1/?s=python"
driver.get(baseUrl)

# wait = WebDriverWait(driver, 5)
# wait.until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".search-result-list li")))

# Get last page number
lastPage = int(driver.find_element(By.CSS_SELECTOR, ".pagination a:last-child").text)

# Get all HREFs for the first page and save them in hrefs list
js = 'return [...document.querySelectorAll(".entry-title a")].map(e=>e.href)'
hrefs = driver.execute_script(js)

# Iterate through the remaining pages and collect the HREFs of the books
for i in range(2, lastPage + 1):
    driver.get("http://www.allitebooks.com/page/" + str(i) + "/?s=python")
    hrefs.extend(driver.execute_script(js))

for href in hrefs:
    print(href)
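
The HREF extraction itself does not have to go through execute_script. Below is a minimal sketch of the same scoped lookup done purely with Selenium's find_elements (Selenium 4 API, using the same search page as above; variable names are illustrative):

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.allitebooks.com/page/1/?s=python")

# Match only the anchors inside the h2.entry-title headings,
# so unrelated links elsewhere on the page are not returned.
links = driver.find_elements(By.CSS_SELECTOR, "h2.entry-title a")
hrefs = [link.get_attribute("href") for link in links]

for href in hrefs:
    print(href)

driver.quit()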

Selenium may be overkill for what you need here; good old BeautifulSoup can do this just as well.

import urllib.request, bs4

request = urllib.request.Request("http://www.allitebooks.com/page/1/?s=python",
                                 headers={"User-Agent": "Mozilla"})
body = urllib.request.urlopen(request).read().decode("utf-8")
soup = bs4.BeautifulSoup(body, "html.parser")

# Restrict the search to anchors inside the "entry-title" headings
for element in soup.find_all("h2", class_="entry-title"):
    for link in element.find_all("a"):
        print(link.get("href"))
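
For completeness, here is a rough sketch of how the BeautifulSoup version could cover every results page as well, assuming the same /page/N/?s=python URL pattern and .pagination markup that the Selenium answer relies on:

import urllib.request, bs4

def get_soup(url):
    # Same User-Agent header as in the snippet above.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla"})
    return bs4.BeautifulSoup(urllib.request.urlopen(request).read().decode("utf-8"), "html.parser")

base = "http://www.allitebooks.com/page/%d/?s=python"
first = get_soup(base % 1)

# Read the last page number from the pagination block, as the Selenium answer does.
last_page = int(first.select(".pagination a")[-1].text)

hrefs = [a.get("href") for a in first.select("h2.entry-title a")]
for page in range(2, last_page + 1):
    hrefs.extend(a.get("href") for a in get_soup(base % page).select("h2.entry-title a"))

for href in hrefs:
    print(href)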
