Extracting from multiple href links with Python Selenium

Posted 2024-09-29 22:34:16


Here is the URL I am testing against: https://stockx.com/puma?prices=300-400,200-300&size_types=men&years=2017

I am able to extract the href links of all the product detail pages, but I end up with only one result. The script should visit every link and extract the product name and image URL from each. What am I missing?

The current output (as JSON) is:

[
    {
        "product_name": "Puma Clyde WWE Undertaker Black",
        "imgurl": "https://stockx.imgix.net/Puma-Clyde-WWE-Undertaker-Black.png?fit=fill&bg=FFFFFF&w=700&h=500&auto=format,compress&q=90&dpr=2&trim=color&updated_at=1538080256"
    }
]

Here is the code I am working with:

^{pr2}$
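Since the original snippet appears above only as a placeholder, here is a minimal sketch (not the asker's actual code) of the accumulation pattern the question describes: append one dict per visited link inside the loop, then serialize the whole list once afterwards. The fetch_details helper is hypothetical and only stands in for whatever per-page scraping the real code does.

import json

products = []                            # accumulate one entry per detail page
for link in links:                       # 'links' holds every product href collected earlier
    name, imgurl = fetch_details(link)   # hypothetical helper that scrapes one detail page
    products.append({'product_name': name, 'imgurl': imgurl})

# serialize the whole list, not just the last item
print(json.dumps(products, indent=4))

Ending up with a single element usually means the dict is built (or the list re-created) outside the loop, so only the last page survives.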

Tags: https, com, url, size, links, types, prices, black
1 Answer

User
#1 · Posted 2024-09-29 22:34:16

I think you can do what you're asking. I pull a few items from each visited page simply to show that the pages were actually visited.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

baseURL = 'https://stockx.com'
final = []

with requests.Session() as s:
    # Fetch and parse the listing page
    res = s.get('https://stockx.com/puma?prices=300-400,200-300&size_types=men&years=2017')
    soup = bs(res.content, 'lxml')

    # Every element carrying an href inside the products container is a product tile
    items = soup.select('#products-container [href]')
    titles = [item['id'] for item in items]
    links = [baseURL + item['href'] for item in items]
    results = list(zip(titles, links))
    df = pd.DataFrame(results, columns=['title', 'link'])  # listing-page data only

    # Visit every detail page and collect the text of its ".detail" elements
    for title, link in results:
        res = s.get(link)
        soup = bs(res.content, 'lxml')
        details = [item.text for item in soup.select('.detail')]
        final.append([title, link, details])

df2 = pd.DataFrame(final, columns=['title', 'link', 'details'])
df2.to_csv(r'C:\Users\User\Desktop\data.csv', sep=',', encoding='utf-8', index=False)
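If the goal is the JSON shape shown in the question (product_name plus imgurl) rather than a CSV of .detail texts, the same session and loop structure can collect those two fields instead. The sketch below assumes each detail page carries the product name in its first h1 and the image URL in an og:image meta tag; both selectors are assumptions, not confirmed details of the StockX pages.

import json
import requests
from bs4 import BeautifulSoup as bs

baseURL = 'https://stockx.com'
final = []

with requests.Session() as s:
    res = s.get('https://stockx.com/puma?prices=300-400,200-300&size_types=men&years=2017')
    soup = bs(res.content, 'lxml')
    links = [baseURL + a['href'] for a in soup.select('#products-container [href]')]

    for link in links:
        detail = bs(s.get(link).content, 'lxml')
        name_tag = detail.select_one('h1')                        # assumed location of the product name
        img_tag = detail.select_one('meta[property="og:image"]')  # assumed location of the image URL
        final.append({
            'product_name': name_tag.get_text(strip=True) if name_tag else None,
            'imgurl': img_tag['content'] if img_tag else None,
        })

with open('data.json', 'w', encoding='utf-8') as f:
    json.dump(final, f, indent=4)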
