How to get a hidden href tag in Selenium with "::before"

Published 2024-10-04 11:24:48


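I am trying to collect the URLs from the PLP (product listing page) and visit each one to pull specific keywords from the PDP (product detail page), then dump them into a JSON file. However, the list only returns 1 item. I suspect the site is trying to block this. I run this program once a month to see which new features have been added to new products.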

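The code between "***" is the part I am having trouble with. It returns the correct value, but only 1 item. How can I get more data? In the example below I only fetch the product name to keep things simple.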

Example URL: "https://store.nike.com/us/en_us/pw/mens-running-shoes/7puZ8yzZoi3"


Actual element:

<div class="exp-product-wall clearfix">
    ::before
    <div class="grid-item fullSize" data-pdpurl="https://www.nike.com/t/epic-react-flyknit-2-mens-running-shoe-459stf" data-column-index="0" data-item-index="1">
      <div class="grid-item-box">
        <div class="grid-item-content">
          <div class="grid-item-image">
            <div class="grid-item-image-wrapper sprite-sheet sprite-index-1">
              <a href="https://www.nike.com/t/epic-react-flyknit-2-mens-running-shoe-459stf">
                <img src="https://images.nike.com/is/image/DotCom/pwp_sheet2?$NIKE_PWPx3$&amp;$img0=BQ8928_001&amp;$img1=BQ8928_003&amp;$img2=BQ8928_005">

Working code below:

^{pr2}$

Expected output in JSON format:

^{3}$

1 answer

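You can get the URLs easily with requests. I target the data-pdpurl attribute. In the Selenium loop you may need to add some handling for location requests. A short wait is needed during the loop to prevent false claims that a product is unavailable.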
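To illustrate why the data-pdpurl attribute is a reasonable target, here is a minimal sketch that reads it from a trimmed copy of the markup in the question. It uses only the standard library's html.parser instead of requests or Selenium, so it runs without a network or a browser. The names SAMPLE_HTML, PdpUrlCollector and extract_pdp_urls are made up for this illustration. Note that "::before" is a CSS pseudo-element shown by the browser inspector; it is not part of the HTML source, so it does not hide the attribute.

```python
from html.parser import HTMLParser

# Trimmed copy of the markup from the question (hypothetical test fixture).
# The "::before" line is omitted because it only exists in the inspector view.
SAMPLE_HTML = """
<div class="exp-product-wall clearfix">
  <div class="grid-item fullSize" data-pdpurl="https://www.nike.com/t/epic-react-flyknit-2-mens-running-shoe-459stf" data-column-index="0" data-item-index="1">
    <div class="grid-item-box">
      <a href="https://www.nike.com/t/epic-react-flyknit-2-mens-running-shoe-459stf"></a>
    </div>
  </div>
</div>
"""


class PdpUrlCollector(HTMLParser):
    """Collects data-pdpurl values from elements whose class list contains grid-item."""

    def __init__(self):
        super().__init__()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        # Match the exact class token, so grid-item-box is not picked up.
        if "grid-item" in classes and attributes.get("data-pdpurl"):
            self.urls.append(attributes["data-pdpurl"])


def extract_pdp_urls(html):
    """Return every data-pdpurl found on a grid-item element, in document order."""
    collector = PdpUrlCollector()
    collector.feed(html)
    return collector.urls


print(extract_pdp_urls(SAMPLE_HTML))
```

In Selenium, the same value should be readable by calling get_attribute("data-pdpurl") on each element matched by the .grid-item selector. The script for the live page follows.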

import requests
from bs4 import BeautifulSoup as bs
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time

d = webdriver.Chrome()
results = []

r = requests.get('https://store.nike.com/us/en_us/pw/mens-running-shoes/7puZ8yzZoi3')
soup = bs(r.content, 'lxml')
products = []
listings = soup.select('.grid-item')

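# Build one row per product tile from its data-pdpurl attribute and display name.
for listing in listings:
    url = listing.get('data-pdpurl')
    name_tag = listing.select_one('.product-display-name')
    if not url or name_tag is None:
        continue  # skip tiles that lack a product URL or a display name
    row = {'title': name_tag.text.strip(),
           'url': url}
    products.append(row)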

# Visit each product page once and wait up to 10 seconds for the description.
for product in products:
    url = product['url']
    try:
        d.get(url)
        desc = WebDriverWait(d, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, ".description-preview")))
        results.append({'product_name': product['title'],
                        'descr': desc.text})
    except Exception as e:
        print(e, url)
    finally:
        time.sleep(1)  # short pause between page loads

d.quit()
print(results)
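
The question asks for the data to end up in a JSON file, which the script above does not do. A minimal sketch of that last step could look like the following. The helper names save_results and load_results and the file name results.json are assumptions for illustration, and the row shape simply mirrors the dictionaries appended to results above, since the exact expected output is not shown in the question.

```python
import json
import os
import tempfile


def save_results(results, path):
    """Write a list of result dictionaries to `path` as indented UTF-8 JSON."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(results, handle, ensure_ascii=False, indent=2)


def load_results(path):
    """Read the JSON file back, for example to compare with last month's run."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


# Placeholder row in the same shape as the entries in `results` (not real data).
example = [{"product_name": "example product", "descr": "example description"}]

# A temporary directory keeps the demonstration from leaving files behind.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "results.json")  # hypothetical file name
    save_results(example, target)
    print(load_results(target))
```

In the real script, save_results(results, "results.json") would replace print(results) once the browser has been closed.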
