How do I use Selenium WebDriver to get information from multiple pages?

Posted 2024-06-16 03:53:29


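I am currently trying to get the titles of all lots (pages 1 to 33) in the "Hong Kong Watches 2.0" auction on the Bonhams website (https://www.bonhams.com/auctions/25281/?category=results#/!). I am quite new to Python and Selenium, but I tried the code below. It gives me the result I want, but only for page 1; after that it keeps repeating the page 1 results. The loop that clicks through to the next page does not seem to work. Can anyone help me fix this loop?

Here is the code I used: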

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions

driver=webdriver.Chrome()
driver.get('https://www.bonhams.com/auctions/25281/?category=results#/!')

while True:
    next_page_btn =driver.find_elements_by_xpath("//*[@id='lots']/div[2]/div[5]/div/a[10]/div")
    if len(next_page_btn) <1:
        print("no more pages left")
        break
    else:
        titles = driver.find_elements_by_xpath("//*[@class='firstLine']")
        titles = [title.text for title in titles]
        print(titles)

    element = WebDriverWait(driver,5).until(expected_conditions.element_to_be_clickable((By.ID,'lots')))
    driver.execute_script("return arguments[0].scrollIntoView();", element)
    element.click()

Below is the output I get. Python repeats/loads this output over and over (I think it does so 33 times?).

['Hong Kong Watches 2.0', '', 'OMEGA. A Very Fine And Rare Limited Edition 
Yellow Gold Chronograph Bracelet Watch, Commemorating the Apollo 11 Space 
Mission And The Successful Moon Landing in 1969', '', '', '', 'ROLEX. TWO 
SETS OF SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1970s', '', 'ROLEX. 
TWO SETS OF RARE SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1980s', 
'', 'PATEK PHILIPPE. A SET OF THREE RARE LIMOGES PORCELAIN AND ENAMEL 
DISHES', '', 'Bvlgari/MAUBOUSSIN. TWO SETS OF CUFFLINKS', '', 
'BOUCHERON/MONTBLANC. TWO SETS OF CUFFLINKS', '', 'PATEK PHILIPPE. TWO 
SETS OF CUFFLINKS', '', 'Jaeger-LeCoultre. A Gilt Brass Table Clock With 
8-Days Power Reserve and Alarm', '', 'Cartier & LeCoultre. A group of 
three gilt brass table clocks (Alarm/Alarm Worldtime/Engraved dial)', '', 
'Jaeger-LeCoultre. A Gilt Brass Table Clock With 8-Days Power Reserve', 
'', 'Reuge. A Gold Plated Musical Automaton Open Face Pocket Watch with 
Alarm', '', 'Imhof. An Attractive Gilt Brass Table Clock With Polychrome 
Enamel Dial', '', 'Vacheron Constantin. A Large Polished Metal Perpetual 
Calendar Wall Clock']
['Hong Kong Watches 2.0', '', 'OMEGA. A Very Fine And Rare Limited Edition 
Yellow Gold Chronograph Bracelet Watch, Commemorating the Apollo 11 Space 
Mission And The Successful Moon Landing in 1969', '', '', '', 'ROLEX. TWO 
SETS OF SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1970s', '', 'ROLEX. 
TWO SETS OF RARE SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1980s', 
'', 'PATEK PHILIPPE. A SET OF THREE RARE LIMOGES PORCELAIN AND ENAMEL 
DISHES', '', 'Bvlgari/MAUBOUSSIN. TWO SETS OF CUFFLINKS', '', 
'BOUCHERON/MONTBLANC. TWO SETS OF CUFFLINKS', '', 'PATEK PHILIPPE. TWO 
SETS OF CUFFLINKS', '', 'Jaeger-LeCoultre. A Gilt Brass Table Clock With 
8-Days Power Reserve and Alarm', '', 'Cartier & LeCoultre. A group of 
three gilt brass table clocks (Alarm/Alarm Worldtime/Engraved dial)', '', 
'Jaeger-LeCoultre. A Gilt Brass Table Clock With 8-Days Power Reserve', 
'', 'Reuge. A Gold Plated Musical Automaton Open Face Pocket Watch with 
Alarm', '', 'Imhof. An Attractive Gilt Brass Table Clock With Polychrome 
Enamel Dial', '', 'Vacheron Constantin. A Large Polished Metal Perpetual 
Calendar Wall Clock']

1 answer
User
Answer 1 · Posted 2024-06-16 03:53:29

You do not need the selenium library to scrape this data. You can also fetch the data for every page with the requests and BeautifulSoup libraries.

import requests
from bs4 import BeautifulSoup

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0",
    "Accept": "application/json",
}

page_num = 1
title_list = []  # one list of titles per page

while True:
    # JSON endpoint that the results page requests when the next button is clicked
    url = 'https://www.bonhams.com/api/v1/lots/25281/?category=results&length=12&minimal=false&page={}'.format(page_num)
    print("===url===", url)
    response = requests.get(url, headers=headers).json()
    if not response['lots']:
        # Guard against looping forever if a page comes back empty
        break
    max_lot = response['max_lot']
    last_iSaleLotNo = 0
    titles = []
    for lot in response['lots']:
        last_iSaleLotNo = lot['lot_id_combined']
        # styled_title is an HTML fragment; the title sits in <div class="firstLine">
        title = BeautifulSoup(lot['styled_title'], 'lxml').find("div", {'class': 'firstLine'}).text.strip()
        titles.append(title)

    title_list.append(titles)
    print("===titles===", titles)
    # Stop once the last lot on this page is the final lot of the auction
    if int(max_lot) == int(last_iSaleLotNo):
        break

    page_num += 1

print(title_list)

Output for the first page:

['ROLEX. TWO SETS OF SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1970s', 'ROLEX. TWO SETS OF RARE SHOWCASE DISPLAYS, MADE FOR ROLEX RETAILERS IN 1980s', 'PATEK PHILIPPE. A SET OF THREE RARE LIMOGES PORCELAIN AND ENAMEL DISHES', 'Bvlgari/MAUBOUSSIN. TWO SETS OF CUFFLINKS', 'BOUCHERON/MONTBLANC. TWO SETS OF CUFFLINKS', 'PATEK PHILIPPE. TWO SETS OF CUFFLINKS', 'Jaeger-LeCoultre. A Gilt Brass Table Clock With 8-Days Power Reserve and Alarm', 'Cartier & LeCoultre. A group of three gilt brass table clocks (Alarm/Alarm Worldtime/Engraved dial)', 'Jaeger-LeCoultre. A Gilt Brass Table Clock With 8-Days Power Reserve', 'Reuge. A Gold Plated Musical Automaton Open Face Pocket Watch with Alarm', 'Imhof. An Attractive Gilt Brass Table Clock With Polychrome Enamel Dial', 'Vacheron Constantin. A Large Polished Metal Perpetual Calendar Wall Clock']
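To try the two pieces of logic the loop depends on without touching the network, here is a small offline sketch. The field names (`max_lot`, `lots`, `lot_id_combined`, `styled_title`) come from the code above; the sample payload, its values, and the helper names `first_line` and `is_last_page` are made up for illustration. It uses the standard library's `html.parser` instead of BeautifulSoup only so that it runs without extra packages.

```python
from html.parser import HTMLParser


class FirstLineParser(HTMLParser):
    """Collect the text inside the first <div class="firstLine">."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside the target div
        self.done = False   # True once the target div has been closed
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "div":
                self.depth += 1  # nested div inside the target
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if "firstLine" in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.parts.append(data)


def first_line(styled_title):
    """Return the stripped text of the firstLine div in an HTML fragment."""
    parser = FirstLineParser()
    parser.feed(styled_title)
    parser.close()
    return "".join(parser.parts).strip()


def is_last_page(payload):
    """True when the page is empty or ends on the auction's final lot."""
    lots = payload["lots"]
    if not lots:
        return True
    return int(lots[-1]["lot_id_combined"]) == int(payload["max_lot"])


# Hypothetical payload: same keys as the real response, invented values.
sample = {
    "max_lot": "3",
    "lots": [
        {"lot_id_combined": "2",
         "styled_title": '<div class="firstLine">ROLEX. Example A</div>'
                         '<div class="secondLine">x</div>'},
        {"lot_id_combined": "3",
         "styled_title": '<div class="firstLine"> PATEK PHILIPPE. Example B </div>'},
    ],
}

titles = [first_line(lot["styled_title"]) for lot in sample["lots"]]
print(titles)
print(is_last_page(sample))
```

The real response may contain more fields or differently typed values, so treat this as a way to reason about the stop condition rather than a description of the API.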

Open the browser's Network tab and click the next button; you will see the JSON response data there.
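If you would rather stay with Selenium, note that the code in the question clicks the element with ID `lots` (the whole container) on every iteration and never clicks the next-page button it looked up, which would explain why page 1 keeps repeating. The following is only a sketch of one way to restructure the loop. Both XPaths are copied from the question and may not match the live page (the position of the next button in the pager can change from page to page), and the helper names `collect_titles` and `run_with_chrome` are invented here. It uses the Selenium 4 `find_elements(By.XPATH, ...)` form.

```python
def collect_titles(read_titles, go_to_next_page, max_pages=100):
    """Collect titles page by page until there is no next page.

    read_titles() returns the titles on the current page.
    go_to_next_page() moves to the next page and returns False
    when that is not possible.
    """
    pages = []
    for _ in range(max_pages):  # hard cap so a broken pager cannot loop forever
        pages.append(read_titles())
        if not go_to_next_page():
            break
    return pages


def run_with_chrome():
    # Imported here so collect_titles() can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import TimeoutException
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: these XPaths from the question still match the page.
    next_xpath = "//*[@id='lots']/div[2]/div[5]/div/a[10]/div"
    title_xpath = "//*[@class='firstLine']"

    driver = webdriver.Chrome()
    driver.get('https://www.bonhams.com/auctions/25281/?category=results#/!')

    def read_titles():
        elements = driver.find_elements(By.XPATH, title_xpath)
        return [e.text for e in elements if e.text]  # drop empty strings

    def go_to_next_page():
        buttons = driver.find_elements(By.XPATH, next_xpath)
        if not buttons:
            return False
        old_titles = driver.find_elements(By.XPATH, title_xpath)
        driver.execute_script("arguments[0].scrollIntoView();", buttons[0])
        buttons[0].click()  # click the pager button, not the #lots container
        if old_titles:
            try:
                # Assumption: changing page re-renders the list, which
                # detaches the old title elements from the DOM.
                WebDriverWait(driver, 10).until(EC.staleness_of(old_titles[-1]))
            except TimeoutException:
                return False  # nothing changed, treat it as the last page
        return True

    try:
        return collect_titles(read_titles, go_to_next_page)
    finally:
        driver.quit()
```

Calling `run_with_chrome()` starts the browser and returns one list of titles per page. Waiting for the old title element to go stale is what keeps the loop from reading the same page twice; if the site updates the list in place instead of re-rendering it, a different wait condition would be needed.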
