Selenium prints only one output

Posted 2024-06-25 23:17:12

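I'm trying to scrape an e-commerce page. When I use Selenium to scrape the product titles, I get only one result. (An alternative that scrapes the titles with BS4 would also be welcome.)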

My code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd  
from bs4 import BeautifulSoup
import requests

PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string so the backslashes are not treated as escapes
SRC = requests.get("https://egypt.souq.com").text
soup = BeautifulSoup(SRC, 'lxml')
driver = webdriver.Chrome(PATH)
driver.get("https://egypt.souq.com")

dotd = "/html/body/div[2]/div/main/div[1]/div[1]/div/div[1]/a/img"

driver.find_element_by_xpath(dotd).click()

def get_deals():
    title_xpath = "/html/body/div[1]/div/main/div/div[4]/div[3]/div[2]/div[1]/div[1]/div/div[2]/ul/li[1]/h6/span/a"
    titles = driver.find_elements_by_xpath(title_xpath)
    for title in titles:
        print(title.text)

get_deals()
print("successful")

The part I want to scrape:

<div class="columns small-8 medium-12">
    <ul class="body no-bullet">
        <li class="title-row">
            <h6 class="title">
                <span  class="itemTitle">
                    <a href="https://egypt.souq.com/eg-en/samsung-galaxy-m11-dual-sim-32gb-3gb-ram-4g-lte-metallic-blue-85271900033/u/" title="Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue">
                        Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue
                    </a>
                </span>
            </h6>
        </li>
        <li class="coupon-flag-row">
        </li>

        <li>

My output:

Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue

successful

The page I'm scraping:

https://deals.souq.com/eg-en/?utm_source=souq

Please help.


2 answers

You can do it like this:

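from bs4 import BeautifulSoup
import requests

# The deals page linked in the question
URL = "https://deals.souq.com/eg-en/?utm_source=souq"

response = requests.get(URL)
soup = BeautifulSoup(response.text, "lxml")

# Each product title is a link inside <span class="itemTitle">
all_titles = soup.find_all("span", class_="itemTitle")
for span in all_titles:
    link = span.find("a")
    if link is not None:
        # The title attribute holds the full product name
        print(link.get("title"))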

To run this code you need lxml installed; you can install it by running pip install lxml from the command prompt.

To get all the titles from the page, use WebDriverWait() to wait for visibility_of_all_elements_located() with the following CSS selector:

titles = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h6.title>span.itemTitle>a")))
for title in titles:
    print(title.text)

You need the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

Console output:

Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue
Electrostar HW50101 Electric Water Heater -50 Liter, White
PANTENE Anti Hair Fall Shampoo, 400 ml with Anti Hair Fall Oil Replacement, 180 ml and 3 Minute Miracle Daily Care Conditioner and Mask, 200 ml
SHARP SJ-GV63G-RD Inverter Refrigerator with Hoover DXOA38AC3R-ELA Washing Machine, La Germania 9M10Gub1X4Aww Cooker, Toshiba 4K Smart 55 Inch TV - 55U5965EA, TOSHIBA VC-EA1800SE Vacuum Cleaner, Tornado FP-1000SG Food Processor, Tornado TCM-11415-B Espresso Machine and Tornado EFS-360/903G Stand Fan - 16 Inch
Panasonic ER217 Hair and Beard Trimmer Wet & Dry
PANTENE Smooth and Silky Shampoo, 400 ml with Smooth and Silky Oil Replacement, 180 ml and 3 Minute Miracle Smooth and Silky Conditioner and Mask, 200 ml
Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Black
Apple iPhone 11 Pro Max with FaceTime - 256GB, 4GB RAM, 4G LTE, Midnight Green, Dual SIM
Sharp SJ-BG615-SS Advanced No Frost Digital Refrigerator with Bottom Freezer and Two Doors, 468 Liters - Silver with SHARP R-20CR(S) Microwave, 20 Liters, 800 Watt - Silver
Apple iPad 2019 7th Gen - 10.2 inch Retina Display, Wi-Fi, 32GB, Gold
Pampers Sensitive Protect, 56 Wipes
Hoover DXOA38AC3R-ELA Front Loading Full Automatic Washing Machine, 8 Kg with Tornado TST-2200 Steam Iron, 2200 Watt
Gillette Fusion ProGlide Power Styler Razor
ATA 32 Inch HD LED Standard TV Black - 32DN4 LE
Apple iPhone SE - 128GB , 3GB RAM, 4G LTE, White - Single SIM and E-SIM
Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Violet
Pampers Fresh Clean, 64 Wipes
Mintra Plastic Round Pot, 11cm- Black
LG F4R5VYG2E Vivace LED Display Steel Washing Machine, 9 kg - Black
Casio MTP-V001L-7BUDF Analog Leather Dress Watch for Men - Black, Quartz
Oral-B Gum and Enamel Care Ultrathin Extra Soft Toothbrush, 2 Pieces -Multi Color
Apple Iphone XS Max With Facetime - 64 GB, 4G LTE, Gold, 4 GB Ram, Single Sim & E-Sim
LG F4R5VGG2E Steam Washing Machine with Dryer, 9 Kilograms - Black Steel
Pampers Pants Diapers, Size 5, Junior, 12-18 kg, 52 Count
Toshiba GR-EF51GZ-XK Refrigerator with HOOVER DXOA38AC3R-ELA Full Automatic Washing Machine with La Germania 9M10G4A1X4AWW Cooker with Tornado 43EL8250E-B Shield 43 Inch TV with TOSHIBA VC-EA1600SE Vacuum Cleaner with Tornado MOM-C25BBE-S Microwave with Grill and Tornado EFS-360/90R Stand Fan
Braun Face Extra Sensitive Replacement Brush Refill , Duo Pack , 80-s Face
Apple iPhone SE - 64GB, 3GB RAM, 4G LTE, Red - Single SIM and E-SIM
Off Cliff Raglan Sleeves Top with Elastic-Waist Shorts Cotton Pajama Set for Men - Heather Grey & Heather White
Sharp SJ-58C(CH) Refrigerator with HOOVER DXOA38AC3R-ELA Full Automatic Washing Machine with La Germania 9M10Gub1X4Aww Cooker with Tornado 43EL8250E-B Shield TV with TOSHIBA VC-EA1600SE Vacuum Cleaner and Tornado EFS-360/90R Stand Fan
Nilco Tottery Tower Wooden Blocks
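
A likely reason the code in the question prints a single title is that its absolute XPath pins each step to one position (`div[1]`, `li[1]`), so it can only ever reach one product card, whereas the CSS selector above matches on class names. The sketch below illustrates the difference with the standard-library `xml.etree.ElementTree` instead of Selenium, so it runs without a browser. The `SAMPLE` markup and its product names are made up for illustration and are far simpler than the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: three product cards shaped like the
# snippet in the question (a real page is not guaranteed to be valid XML).
SAMPLE = """
<main>
  <div>
    <ul><li><h6 class="title"><span class="itemTitle"><a>Phone A</a></span></h6></li></ul>
  </div>
  <div>
    <ul><li><h6 class="title"><span class="itemTitle"><a>Heater B</a></span></h6></li></ul>
  </div>
  <div>
    <ul><li><h6 class="title"><span class="itemTitle"><a>Shampoo C</a></span></h6></li></ul>
  </div>
</main>
"""

root = ET.fromstring(SAMPLE)

# Every step pinned to an index: only the first card can match.
indexed = root.findall("./div[1]/ul/li[1]/h6/span/a")

# Matching on class attributes instead reaches every card.
by_class = root.findall(".//h6[@class='title']/span[@class='itemTitle']/a")

indexed_titles = [a.text for a in indexed]
class_titles = [a.text for a in by_class]
print(indexed_titles)
print(class_titles)
```

In Selenium the same idea applies: a locator built from class names, such as the CSS selector in this answer, is not tied to the position of one card.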

If you would rather use the requests module, try this code; it gives the same output:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://deals.souq.com/eg-en/?utm_source=souq")
soup = BeautifulSoup(res.text, "html.parser")
for item in soup.select('.title>.itemTitle>a'):
    print(item.text.strip())
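
The `.strip()` call matters because, in the markup shown in the question, the link text is surrounded by newlines and indentation, while the `title` attribute is already clean. The following is a minimal sketch of that difference using only the standard-library `html.parser`, so it needs neither network access nor BS4. `LinkCollector` is a hypothetical helper written for this illustration, and the `href` is shortened to `#`.

```python
from html.parser import HTMLParser

# Markup copied from the question (href shortened to "#" for brevity).
SNIPPET = """
<span class="itemTitle">
    <a href="#" title="Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue">
        Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue
    </a>
</span>
"""


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (title attribute, raw text) per <a>."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (title_attr, raw_text) pairs
        self._title = None   # title attribute of the <a> being read
        self._chunks = None  # text pieces seen inside the current <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._title = dict(attrs).get("title")
            self._chunks = []

    def handle_data(self, data):
        if self._chunks is not None:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._chunks is not None:
            self.links.append((self._title, "".join(self._chunks)))
            self._chunks = None


parser = LinkCollector()
parser.feed(SNIPPET)
parser.close()

title_attr, raw_text = parser.links[0]
print(repr(raw_text))                  # text padded with whitespace
print(raw_text.strip() == title_attr)  # stripped text equals the attribute
```

Reading the `title` attribute, as the first answer does, or stripping the text, as this one does, should give the same product name for markup shaped like this snippet.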
