I'm getting an error here and can't scrape the data.
Base URL = https://www.mobilephonesdirect.co.uk/brands/apple?monthly_cost=40
Product URL = https://www.mobilephonesdirect.co.uk/handset/apple/iphone-12
I want to get the memory details from every product link.
from selenium import webdriver
from bs4 import BeautifulSoup
import xlwt
import time
driver = webdriver.Chrome()
driver.get('https://www.mobilephonesdirect.co.uk/brands')
time.sleep(5)
cookies = driver.find_element_by_xpath("//button[contains(text(),'Accept')]")
time.sleep(5)
cookies.click()
time.sleep(5)
print("cookies accepted")
time.sleep(5)
driver.maximize_window()
print("window maximized")
click = driver.find_element_by_css_selector('.u-grid--3--bp-medium:nth-child(1) .u-ai--center').click()
time.sleep(5)
print("clicked apple phones")
time.sleep(5)
#creating soup obj for the products
content = driver.page_source
soup = BeautifulSoup(content,'html.parser')
#print(soup.prettify())
#creating obj for apple product link
print(driver.current_url)
links = soup.find_all('div',{'class':'o-flex-container u-px--xsmall u-pt--xsmall'})
list_links = []
for link in links:
    anchor = link.find('a')
    url = 'https://www.mobilephonesdirect.co.uk' + anchor["href"]
    list_links.append(url)
for urls in list_links:
    driver.get(urls)
    #print(soup1.prettify())
    print(driver.current_url)
    source = driver.page_source
    soup1 = BeautifulSoup(source,'html.parser')
    product_memory = soup1.find('div',{'class':'u-fz--title-small u-fw--400'})
    print(product_memory.text)
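The problem is that the script runs a little too fast in one place.
After this line:
driver.get(urls)
add a pause (for example a time.sleep call like the ones earlier in the script),
and then it should work.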
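I'm not very familiar with these libraries, but I think what happens is this: the line
driver.get(urls)
tells the webdriver to load the page, but the next line, source = driver.page_source,
runs immediately, before the page has finished loading. So there is no complete source yet to parse. The pause gives the page enough time to load.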
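Here is a minimal sketch of that fix, assuming a fixed pause is acceptable. The helper name load_page_source and its pause_seconds parameter are made up for illustration and are not part of Selenium; driver is expected to be the webdriver.Chrome() instance from the question.

```python
import time


def load_page_source(driver, url, pause_seconds=5):
    """Open url, wait a fixed time, then return the page HTML.

    driver is any object with a get(url) method and a page_source
    attribute, such as the webdriver.Chrome() instance in the question.
    load_page_source and pause_seconds are hypothetical names.
    """
    driver.get(url)              # ask the browser to load the page
    time.sleep(pause_seconds)    # pause so the page has time to finish loading
    return driver.page_source    # read the HTML only after the pause
```

In the question's loop, source = load_page_source(driver, urls) would stand in for the driver.get(urls) and source = driver.page_source lines. Note that soup1.find returns None when nothing matches, so if product_memory.text still fails, the pause may need to be longer.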