How do I request a new URL?

Published 2024-09-26 18:07:30


I already have this code; a friend helped me with it earlier. It collects all of the product links on the site. Now I also want each product's name, brand (merk), price, picture, description, and link. The description only appears after clicking through to the product's own page.

I'm a beginner in Python.

from bs4 import BeautifulSoup
import urllib.request


count = 1
url = "https://www.sociolla.com/155-foundation?p=%d"

def get_url(url):
    req = urllib.request.Request(url)
    return urllib.request.urlopen(req)

expected_url = url % count
response = get_url(expected_url)

link = []
name = []
merk = []
price = []
pic = []
description = []


while response.url == expected_url:
    soup = BeautifulSoup(response.read(), "html.parser")
    # grab every product tile on this page and keep its detail-page link
    products = soup.find("div", {"id": "product-list-grid"})
    for item in products.findAll("div", {"class": "product-item"}):
        link.append(item["data-eec-href"])

    # move on to the next page; the loop stops once the site
    # redirects a past-the-end page number to a different URL
    count += 1
    expected_url = url % count
    response = get_url(expected_url)


print(len(link))

"""
import csv
dataset=zip(link, merk, name, pic, price, description)    
with open("foundation_sociolla.csv","w", newline='') as csvfile:
    writer=csv.writer(csvfile)
    header=['link', 'merk', 'name', 'pic', 'price', 'description']
    writer.writerow(header)
    writer.writerows(dataset)
"""

Tags: name, import, url, data, get, product, response, request
1 Answer

Answer #1 · posted 2024-09-26 18:07:30

You need to make a request to each product URL, parse the content of the response, and extract the data you want.
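In minimal form, the pattern looks like this; the HTML snippet is a made-up stand-in for what `urllib.request.urlopen(url).read()` would return, since the real markup comes from the live site:

```python
from bs4 import BeautifulSoup

# made-up stand-in for a downloaded page body; in the real script this
# comes from urllib.request.urlopen(url).read()
html = """
<div id="product-list-grid">
  <div class="product-item" data-eec-href="https://example.com/product-1"></div>
  <div class="product-item" data-eec-href="https://example.com/product-2"></div>
</div>
"""

# parse once, then pull out whatever attributes or text you need
soup = BeautifulSoup(html, "html.parser")
links = [tag["data-eec-href"] for tag in soup.select("div.product-item")]
print(links)
```

The full script below is the same three steps (request, parse, extract) applied first to each listing page and then to each product page.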

from bs4 import BeautifulSoup
import urllib.request

count = 1
url = "https://www.sociolla.com/155-foundation?p=%d"


def get_url(url):
    req = urllib.request.Request(url)
    return urllib.request.urlopen(req)

expected_url = url % count
response = get_url(expected_url)

link = []
name = []
make = []
price = []
pic = []
description = []

while response.url == expected_url:
    soup = BeautifulSoup(response.read(), "html.parser")
    # each product tile carries its detail-page URL in a data attribute
    for product in soup.select("div.product-item"):
        product_url = product["data-eec-href"]
        link.append(product_url)

        # fetch and parse the product's own page for the remaining fields
        product_response = get_url(product_url)
        product_soup = BeautifulSoup(product_response.read(), "html.parser")

        product_pic = product_soup.select("img#bigpic")[0]["src"]
        pic.append(product_pic)

        product_price = product_soup.select("span#our_price_display")[0].text.strip()
        price.append(product_price)

        product_name = product_soup.select("div.detail-product-logo p")[0].text.strip()
        name.append(product_name)

        product_make = product_soup.select("div.detail-product-logo h3")[0].text.strip()
        make.append(product_make)

        product_description = product_soup.select("div#Details article")[0].text.strip()
        description.append(product_description)

        print(product_url, product_pic, product_price, product_name, product_make, product_description)

    count += 1
    expected_url = url % count
    response = get_url(expected_url)
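One detail worth noting: the `while response.url == expected_url` condition stops the loop because, once the page number runs past the last page, the site apparently redirects and the final URL no longer matches the one that was requested. The check itself is just a string comparison (the URLs below are illustrative):

```python
def is_same_page(requested_url: str, final_url: str) -> bool:
    """True while the server served exactly the page that was asked for;
    False once a redirect (e.g. past the last page) changed the URL."""
    return requested_url == final_url

# illustrative: page 3 is served as-is, a past-the-end page 4 redirects
print(is_same_page("https://www.sociolla.com/155-foundation?p=3",
                   "https://www.sociolla.com/155-foundation?p=3"))
print(is_same_page("https://www.sociolla.com/155-foundation?p=4",
                   "https://www.sociolla.com/155-foundation?p=1"))
```

`urllib.request.urlopen` follows redirects automatically, and the response object's `url` attribute reports the final URL after any redirects, which is what makes this comparison work.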

However, if you are going to scrape many pages, you are better off using a dedicated tool like Scrapy (https://scrapy.org/).
