How to loop through multiple pages and open links at the same time

Published 2024-10-01 04:47:33


I am currently trying to figure out how to loop through a set of studios on a fitness class website.

On this site's search results page, each page lists 50 studios, and there are roughly 26 pages in total: https://classpass.com/search, if you want to take a look.

My code parses the search results page, and Selenium gets the link to each studio on the page (in my full code, Selenium opens each link and scrapes data from the studio page).

After looping through all the results on page 1, I want to click the next-page button and repeat on page 2 of the results. I get the error `Message: no such element: Unable to locate element:`, but I know the element is definitely on the results page and is clickable; I confirmed this with a simplified test script.

What might I be doing wrong? I have tried many suggestions, but so far none have worked.

from selenium import webdriver
from bs4 import BeautifulSoup as soup
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as browser_wait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import time
import re
import csv

# initialize the chrome browser
browser = webdriver.Chrome(executable_path=r'./chromedriver')

# URL
class_pass_url = 'https://www.classpass.com'

# Create the file and write the header row; encoding specified because write() was raising errors
#f = open('ClassPass.csv', 'w', encoding='utf-8')
#headers = 'URL, Studio, Class Name, Description, Image, Address, Phone, Website, instagram, facebook, twitter\n'
#f.write(headers)

# classpass results page
page = "https://classpass.com/search"

browser.get(page)

# Browser waits

browser_wait(browser, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, "line")))

# Scrolls to bottom of page to reveal all classes
# browser.execute_script("window.scrollTo(0, document.body.scrollHeight);")

# Extract page source and parse
search_source = browser.page_source
search_soup = soup(search_source, "html.parser")

pageCounter = 0
maxpagecount = 27

# Looks through results and gets link to class page
studios = search_soup.findAll('li', {'class': '_3vk1F9nlSJQIGcIG420bsK'})

while (pageCounter < maxpagecount):

    search_source = browser.page_source
    search_soup = soup(search_source, "html.parser")
    studios = search_soup.findAll('li', {'class': '_3vk1F9nlSJQIGcIG420bsK'})

    for studio in studios:

        studio_link = class_pass_url + studio.a['href']
        browser.get(studio_link)

        browser_wait(browser, 10).until(EC.visibility_of_element_located((By.CLASS_NAME, "line")))

        

    element = browser.find_element_by_xpath('//*[@id="Search_Results"]/div[1]/div/div/nav/button[2]')
    browser.execute_script("arguments[0].click();", element)

    # advance the counter so the while loop terminates after maxpagecount pages
    pageCounter += 1
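As an aside, the `class_pass_url + studio.a['href']` concatenation in the script above can produce broken URLs if an href is ever absolute; the standard library's `urllib.parse.urljoin` handles both relative and absolute hrefs. A small sketch, separate from the original script (the example paths are made up):

```python
from urllib.parse import urljoin

base = 'https://www.classpass.com'

# Relative href: joined onto the base
print(urljoin(base, '/studios/example-studio'))
# -> https://www.classpass.com/studios/example-studio

# Already-absolute href: returned unchanged instead of being doubled up
print(urljoin(base, 'https://www.classpass.com/studios/example-studio'))
# -> https://www.classpass.com/studios/example-studio
```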

Tags: to, from, import, browser, source, search, selenium, page
1 Answer

#1 · Posted 2024-10-01 04:47:33

You have to be back on the main results page before the next-page button can be found. You can fix this by replacing the code below. The new version first collects all of the studio URLs from every results page:

studios = search_soup.findAll('li', {'class': '_3vk1F9nlSJQIGcIG420bsK'})

studios = []
for page in range(num_pages):
    # re-parse the current page source on each iteration so the new results are seen
    search_soup = soup(browser.page_source, "html.parser")
    studios.extend(search_soup.findAll('li', {'class': '_3vk1F9nlSJQIGcIG420bsK'}))

    element = browser.find_element_by_xpath('//*[@id="Search_Results"]/div[1]/div/div/nav/button[2]')
    browser.execute_script("arguments[0].click();", element)

Then remove the code that clicks the next-page button from inside your studio loop, and iterate over the collected studio links afterwards.
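Separating the link-gathering step from the browser also makes it work on any saved page source. A sketch using the CSS class from the original script (the helper name and sample HTML are made up for illustration):

```python
from bs4 import BeautifulSoup

STUDIO_CLASS = '_3vk1F9nlSJQIGcIG420bsK'  # the <li> class used in the original script

def collect_studio_links(page_sources, base='https://www.classpass.com'):
    """Return the studio URLs found in a list of result-page HTML strings."""
    links = []
    for source in page_sources:
        page_soup = BeautifulSoup(source, 'html.parser')
        for li in page_soup.find_all('li', {'class': STUDIO_CLASS}):
            # skip list items that have no anchor or no href
            if li.a and li.a.get('href'):
                links.append(base + li.a['href'])
    return links

sample = '<ul><li class="_3vk1F9nlSJQIGcIG420bsK"><a href="/studios/a">A</a></li></ul>'
print(collect_studio_links([sample]))
# -> ['https://www.classpass.com/studios/a']
```

With the links collected first, the main loop only has to `browser.get` each link in turn; no next-page click remains inside that loop.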
