Scraping returns None

Posted 2024-09-23 08:29:08

My code currently produces the following output while scraping: https://pastebin.com/pUcCdbMn

I want to get the text inside listing-title, i.e. for

<h2 class="listing-title"><a class="listing-fpa-link" href="...">Vauxhall Astra 1.6i 16V Design 5dr Hatchback</a></h2>

it should return Vauxhall Astra 1.6i 16V Design 5dr Hatchback

and the items inside listing-key-specs, i.e. for

<ul class="listing-key-specs">
<li>2015
(65 reg)</li>
<li>Hatchback</li>
<li>14,304 miles</li>
<li>Manual</li>
<li>1.6L</li>
<li>Petrol</li>
</ul>

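it should return 2015 (65 reg), Hatchback, 14,304 miles, Manual, 1.6L and Petrol, each as a separate variable.

How can I achieve this? When I try to extract the listing title, my code currently returns None: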

for page in range(1, 3):
    page_count = str(page)
    if page is 1:
        url = "http://www.autotrader.co.uk/car-search?sort=sponsored&radius=1500&postcode=se218qe&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New"
    else:
        url = "http://www.autotrader.co.uk/car-search?sort=sponsored&radius=1500&postcode=se218qe&onesearchad=Used&onesearchad=Nearly%20New&onesearchad=New&page=" + page_count
    browser.get(url)
    soup = BeautifulSoup(browser.page_source, "html.parser")
    cars = soup.find_all('li', {'class': 'search-page__result'})
    cars_count = len(cars)
    print 'Processing ' + str(cars_count) + ' cars found on page ' + page_count

    # Loop through cars on page
    for car in cars:
        car_name = car.find('h2 ', {'class': 'listing-title'})
        print car_name

1 answer
User
#1 · Posted 2024-09-23 08:29:08

There is an extra space after the tag name:

car_name = car.find('h2 ', {'class': 'listing-title'})
                 # HERE^

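Remove it and the lookup will work: BeautifulSoup compares the tag name as an exact string, so 'h2 ' matches no element and find() returns None.

Note that to get the text of the title, you need the get_text() method: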
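To see the effect in isolation, here is a minimal sketch that runs both lookups against a trimmed-down copy of the markup from the question. The `href="#"` value is a placeholder (the original link is elided), and the variable names `with_space` and `without_space` are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed-down markup modelled on the question; href="#" is a placeholder.
html = """
<li class="search-page__result">
  <h2 class="listing-title"><a class="listing-fpa-link" href="#">Vauxhall Astra 1.6i 16V Design 5dr Hatchback</a></h2>
</li>
"""

soup = BeautifulSoup(html, "html.parser")
car = soup.find("li", {"class": "search-page__result"})

# The tag name is compared as an exact string, so "h2 " (trailing space) matches nothing.
with_space = car.find("h2 ", {"class": "listing-title"})
without_space = car.find("h2", {"class": "listing-title"})

print(with_space)                           # the failed lookup
print(without_space.get_text(strip=True))   # the title text
```

The first lookup comes back as None, which is exactly what the question's loop was printing; the second returns the h2 element.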

print(car_name.get_text(strip=True))

You can also replace .find() with .select_one(), which takes a CSS selector:

car_name = car.select_one('h2.listing-title')
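The question also asks for the listing-key-specs items as separate variables. One possible way to do that, sketched against the markup quoted in the question, is to select every li under the list and unpack the results. The variable names (`year`, `body_type`, `mileage`, `gearbox`, `engine_size`, `fuel`) and the `href="#"` placeholder are assumptions for illustration, and the unpacking assumes every listing has exactly these six items in this order, which may not hold on the live site.

```python
from bs4 import BeautifulSoup

# Markup copied from the question; href="#" is a placeholder for the elided link.
html = """
<li class="search-page__result">
  <h2 class="listing-title"><a class="listing-fpa-link" href="#">Vauxhall Astra 1.6i 16V Design 5dr Hatchback</a></h2>
  <ul class="listing-key-specs">
    <li>2015
    (65 reg)</li>
    <li>Hatchback</li>
    <li>14,304 miles</li>
    <li>Manual</li>
    <li>1.6L</li>
    <li>Petrol</li>
  </ul>
</li>
"""

soup = BeautifulSoup(html, "html.parser")
car = soup.select_one("li.search-page__result")

title = car.select_one("h2.listing-title").get_text(strip=True)

# split() + join() collapses the line break inside the first item,
# so "2015\n(65 reg)" becomes "2015 (65 reg)".
specs = [" ".join(li.get_text().split())
         for li in car.select("ul.listing-key-specs li")]

# Assumes exactly six specs in this order; guard with len(specs) on real data.
year, body_type, mileage, gearbox, engine_size, fuel = specs

print(title)
print(specs)
```

If some listings omit a spec, the tuple unpacking will raise ValueError, so keeping the values in the `specs` list and checking its length first is the safer choice on real pages.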

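I would also make the script more reliable by explicitly waiting for the search results to be present before reading the page source and passing it on for further parsing: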

from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait

# ...
browser.get(url)

wait = WebDriverWait(browser, 10)
wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, ".search-page__result .listing-title")))

soup = BeautifulSoup(browser.page_source, "html.parser")