I'm trying to scrape the product colour and model from this link

Posted 2024-09-28 19:30:46


I'm getting an error here and I can't scrape the data.

Base URL = https://www.mobilephonesdirect.co.uk/brands/apple?monthly_cost=40

Product URL = https://www.mobilephonesdirect.co.uk/handset/apple/iphone-12

I want to get the memory details from every product link.


from selenium import webdriver
from selenium.webdriver.common.by import By
from bs4 import BeautifulSoup
import xlwt
import time

driver = webdriver.Chrome()
driver.get('https://www.mobilephonesdirect.co.uk/brands')
time.sleep(5)
# accept the cookie banner (find_element_by_* was removed in Selenium 4)
cookies = driver.find_element(By.XPATH, "//button[contains(text(),'Accept')]")
time.sleep(5)
cookies.click()
time.sleep(5)
print("cookies accepted")
time.sleep(5)
driver.maximize_window()
print("window maximized")
# open the Apple brand page
driver.find_element(By.CSS_SELECTOR, '.u-grid--3--bp-medium:nth-child(1) .u-ai--center').click()
time.sleep(5)
print("clicked apple phones")
time.sleep(5)
#creating soup obj for the products
content = driver.page_source
soup = BeautifulSoup(content,'html.parser')
#print(soup.prettify())
#creating obj for apple product link
print(driver.current_url)
links = soup.find_all('div',{'class':'o-flex-container u-px--xsmall u-pt--xsmall'})
list_links = []
for link in links:
    anchor = link.find('a')
    url = 'https://www.mobilephonesdirect.co.uk' + anchor["href"]
    list_links.append(url)
for urls in list_links:
    driver.get(urls)
    #print(soup1.prettify())
    print(driver.current_url)
    source = driver.page_source
    soup1 = BeautifulSoup(source,'html.parser')

    product_memory = soup1.find('div',{'class':'u-fz--title-small u-fw--400'})
    print(product_memory.text)
    

1 answer
User
#1 · Posted 2024-09-28 19:30:46

The problem is that the script runs a little too fast in one place.

After this line:

driver.get(urls)

add this:

time.sleep(5)

and then it will work.

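I'm not very familiar with these libraries, but I think what happens is this: driver.get(urls) tells the webdriver to load the page, but the next line, source = driver.page_source, runs right away, before the page content has finished loading. The source is incomplete at that point, so the lookup for the memory element probably returns None and product_memory.text fails. The pause gives the page enough time to load.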
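A fixed time.sleep(5) either waits longer than needed or, on a slow connection, not long enough. One possible alternative is to poll the page source until the expected element shows up. The sketch below is only an illustration under assumptions: wait_for_source is a hypothetical helper (not part of Selenium or BeautifulSoup), and it assumes that the class name u-fz--title-small from the question appears in the source once the memory details have rendered. Selenium also ships its own explicit-wait mechanism (WebDriverWait), which may be a better fit in practice.

```python
import time


def wait_for_source(get_source, marker, timeout=10.0, poll=0.5):
    """Hypothetical helper: poll get_source() until marker appears in it.

    get_source is any zero-argument callable returning the current page
    source as a string. Returns the first source that contains marker.
    Raises TimeoutError if marker has not appeared within timeout seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        source = get_source()
        if marker in source:
            return source
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found after {timeout} s")
        # Page not ready yet: pause briefly, then check again.
        time.sleep(poll)
```

Inside the loop from the question, this could replace the fixed pause: after driver.get(urls), call source = wait_for_source(lambda: driver.page_source, 'u-fz--title-small') and pass source to BeautifulSoup as before. If the element never appears, the TimeoutError makes the failure explicit instead of surfacing later as an error on product_memory.text.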
