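I am trying to scrape an e-commerce page. When I use Selenium to scrape the product titles, I only get one result. (You can also suggest an alternative way to scrape the titles with BS4.)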
My code:
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd
from bs4 import BeautifulSoup
import requests

# Raw string, so the backslashes in the Windows path are not read as escape sequences
PATH = r"C:\Program Files (x86)\chromedriver.exe"
SRC = requests.get("https://egypt.souq.com").text
soup = BeautifulSoup(SRC, 'lxml')

driver = webdriver.Chrome(PATH)
driver.get("https://egypt.souq.com")

# Click the image link at this XPath to open the deals page
dotd = "/html/body/div[2]/div/main/div[1]/div[1]/div/div[1]/a/img"
driver.find_element_by_xpath(dotd).click()

def get_deals():
    # Absolute XPath to a deal's title link
    title_xpath = "/html/body/div[1]/div/main/div/div[4]/div[3]/div[2]/div[1]/div[1]/div/div[2]/ul/li[1]/h6/span/a"
    titles = driver.find_elements_by_xpath(title_xpath)
    for title in titles:
        print(title.text)

get_deals()
print("successful")
The part I want to scrape:
<div class="columns small-8 medium-12">
<ul class="body no-bullet">
<li class="title-row">
<h6 class="title">
<span class="itemTitle">
<a href="https://egypt.souq.com/eg-en/samsung-galaxy-m11-dual-sim-32gb-3gb-ram-4g-lte-metallic-blue-85271900033/u/" title="Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue">
Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue
</a>
</span>
</h6>
</li>
<li class="coupon-flag-row">
</li>
<li>
My output:
Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue
successful
The page I am scraping
Please help.
You can do it like this:
To run this code you must install lxml, which you can do by typing pip install lxml in cmd.
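Once lxml is installed, a quick way to check it works, and to see how a relative XPath picks up every title rather than one fixed position (the XPath in the question pins steps such as div[1] and li[1]), is the minimal sketch below. It parses a trimmed copy of the HTML snippet from the question; the function name extract_titles is hypothetical, and the sketch assumes every title sits inside <span class="itemTitle"><a> as in that snippet.

```python
from typing import List

from lxml import html

# Trimmed copy of the snippet from the question, closed so it parses cleanly
SNIPPET = """
<div class="columns small-8 medium-12">
  <ul class="body no-bullet">
    <li class="title-row">
      <h6 class="title">
        <span class="itemTitle">
          <a href="https://egypt.souq.com/eg-en/samsung-galaxy-m11-dual-sim-32gb-3gb-ram-4g-lte-metallic-blue-85271900033/u/"
             title="Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue">
            Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue
          </a>
        </span>
      </h6>
    </li>
  </ul>
</div>
"""


def extract_titles(page_source: str) -> List[str]:
    """Return the text of every <span class="itemTitle"><a> link (hypothetical helper)."""
    tree = html.fromstring(page_source)
    # Relative XPath: matches every title link, wherever it sits in the page
    links = tree.xpath("//span[@class='itemTitle']/a")
    return [link.text_content().strip() for link in links]


for title in extract_titles(SNIPPET):
    print(title)
```

On a full page with many product cards, the same relative XPath returns one entry per card instead of only the first.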
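To get all the titles from the web page, you need to induce WebDriverWait() for visibility_of_all_elements_located() with the following CSS selector. You need to import the following libraries: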
Console output:
If you want to use the requests module instead, try this code; you will get the same output:
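A sketch of the requests + BeautifulSoup variant, under the same assumptions: the span.itemTitle > a selector comes from the question's HTML snippet, the helper name extract_titles is hypothetical, and DEALS_URL is a placeholder because the exact deals-page URL is not shown. This only works if the titles are present in the HTML the server returns; if the page builds them with JavaScript, requests will not see them and Selenium is needed.

```python
import requests
from bs4 import BeautifulSoup


def extract_titles(page_source):
    """Return the text of every title link (<span class="itemTitle"><a>)."""
    soup = BeautifulSoup(page_source, "lxml")
    # select() takes a CSS selector and returns every match, not just the first
    return [a.get_text(strip=True) for a in soup.select("span.itemTitle > a")]


if __name__ == "__main__":
    DEALS_URL = "https://egypt.souq.com"  # placeholder: replace with the deals page URL
    response = requests.get(DEALS_URL, timeout=10)
    response.raise_for_status()
    for title in extract_titles(response.text):
        print(title)
```

Passing "lxml" to BeautifulSoup requires the lxml package mentioned above; the built-in "html.parser" is a drop-in alternative if you prefer not to install it.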