In Python, the result file does not contain all of these values

Posted 2024-10-02 20:37:23


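I'm using this code to extract data from a website, but it doesn't get all the data I need.

Also, I can't capture the sku value, which is buried at the end of the page together with other valuable data (I don't know how to get at it):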

{"offer_code":"dd3125025109fb4d","sku":"N15614801A","sku_config":"N15614801A","brand":null,"name":"AirPods Strap White","plp_specifications":{},"price":4.4,"sale_price":1.3,"url":"airpods-strap-white","image_key":"v1532025662/N15614801A_1","is_buyable":true,"flags":["fbn","prepaid"]},

It would be very helpful to have the result file contain all of these values (these are the ones I'm looking for):

price,title,sku,offer_code,brand,sale_price

This is the Python code I'm using:

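from concurrent.futures import ThreadPoolExecutor

from bs4 import BeautifulSoup as soup
import requests

number_of_threads = 6
out_filename = "output.csv"  # assumed name; the post never defines it
headers = "price,title,sku\n"

def extract_data_from_url_func(url):
    print(url)
    response = requests.get(url)
    page_soup = soup(response.text, "html.parser")
    output = ""
    # The post omits the loop that finds each product and sets price, title
    # and sku. Inside that loop it ran:
    #     output_list = [price, title, sku]
    #     output = output + ",".join(output_list) + "\n"
    #     print(output)
    return output

with open("speednoon.txt", "r") as fr:
    URLS = [line.strip() for line in fr]

# The post also omits how `responses` is built; a thread pool with
# number_of_threads workers is an assumption based on that variable's name.
with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
    responses = list(pool.map(extract_data_from_url_func, URLS))

with open(out_filename, "w", encoding='utf-8-sig') as fw:
    fw.write(headers)
    for response in responses:
        fw.write(response + "\n")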

1 answer
User
#1 · Posted 2024-10-02 20:37:23

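Class names that contain numbers have a drawback: they may differ from product to product. So here is a simple answer that avoids them.

Note: I'm not writing the whole program, only the filtering part.
Note: use lxml instead of html.parser. To use it, install the lxml package and pass "lxml" as the parser name when you create the BeautifulSoup object.

For the title: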

# page_soup is the parsed page from the question's code
titles = page_soup.find_all('div', {'class': 'name'})
for tag in titles:
    print("Title: " + tag.text)

For the price:

prices = page_soup.find_all('span', {'class': 'sellingPrice'})
for tag in prices:
    print("Price: " + tag.text)

For the SKU:

products = page_soup.find_all('div', {'class': 'gridView'})
for product in products:
    # the SKU is the fourth '/'-separated piece of the product link
    sku_num = product.find('a')['href'].split('/')
    print(sku_num[3])

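For the offer code:

products = page_soup.find_all('div', {'class': 'gridView'})
for product in products:
    # the offer code follows the "=" in the fifth piece of the link
    code_num = product.find('a')['href'].split('/')
    print(code_num[4].split("=")[1])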
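
Splitting the href by position breaks if the link is absolute or gains an extra query parameter. Here is a minimal sketch of the same idea with urllib.parse. It assumes the link has the shape /<locale>/<slug>/<sku>/p?<key>=<offer_code>, which is what the indexes above imply. The example href is made up from the values in the question, and parse_product_href is a hypothetical helper name.

```python
from urllib.parse import urlsplit, parse_qs


def parse_product_href(href):
    """Return (sku, offer_code) from a product link, or None for a missing part.

    Assumed link shape (not confirmed by the site):
        /<locale>/<slug>/<sku>/p?<key>=<offer_code>
    """
    parts = urlsplit(href)
    # Drop empty pieces so relative and absolute links behave the same.
    segments = [segment for segment in parts.path.split("/") if segment]
    sku = segments[2] if len(segments) > 2 else None
    # Take the first query value, mirroring split("=")[1] in the answer.
    values = [v for group in parse_qs(parts.query).values() for v in group]
    offer_code = values[0] if values else None
    return sku, offer_code


# Hypothetical href assembled from the values shown in the question.
example_href = "/uae-en/airpods-strap-white/N15614801A/p?o=dd3125025109fb4d"
print(parse_product_href(example_href))
```

Returning None instead of raising IndexError makes it easier to skip products whose link does not match the assumed shape.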

For the brand: I couldn't find it. Please point out where the brand name appears.
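
That said, the record quoted in the question does carry a brand key (null for this product), next to sku, offer_code, price and sale_price. Below is a minimal sketch for pulling such records out of the raw page text. It assumes they appear there as plain, unescaped JSON, which I have not verified on the live site. find_product_records is a hypothetical helper, and the sample_page wrapper (window.__DATA__, hits) is invented for illustration; only the record itself comes from the question.

```python
import json


def find_product_records(page_text, marker='{"offer_code"'):
    """Decode every JSON object in page_text that starts with marker."""
    decoder = json.JSONDecoder()
    records = []
    pos = page_text.find(marker)
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # returns it together with the index where it ended.
            record, end = decoder.raw_decode(page_text, pos)
        except json.JSONDecodeError:
            end = pos + len(marker)  # not valid JSON here; keep scanning
        else:
            records.append(record)
        pos = page_text.find(marker, end)
    return records


# Invented page wrapper around the record quoted in the question.
sample_page = (
    '<html><body><script>window.__DATA__ = {"hits":['
    '{"offer_code":"dd3125025109fb4d","sku":"N15614801A",'
    '"sku_config":"N15614801A","brand":null,"name":"AirPods Strap White",'
    '"plp_specifications":{},"price":4.4,"sale_price":1.3,'
    '"url":"airpods-strap-white","image_key":"v1532025662/N15614801A_1",'
    '"is_buyable":true,"flags":["fbn","prepaid"]}'
    ']};</script></body></html>'
)

for record in find_product_records(sample_page):
    print(record["sku"], record["offer_code"], record["brand"])
```

With the question's code, response.text would be passed in place of sample_page. If the site escapes the quotes inside a JavaScript string, the marker will not match and the text has to be unescaped first.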

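For the sale price: the sale price is the price I already covered above. If you mean the original price, it is in the "preReductionPrice" class. Grab it with .text.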
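
Once the values are collected per product, the header the question asks for (price,title,sku,offer_code,brand,sale_price) can be written with the csv module instead of joining strings by hand. This is a rough sketch, assuming each product is a dict shaped like the record in the question, with the title stored under "name". records_to_csv and sample_records are hypothetical names.

```python
import csv
import io

# Column order requested in the question.
COLUMNS = ["price", "title", "sku", "offer_code", "brand", "sale_price"]


def records_to_csv(records):
    """Render product dicts as CSV text with the requested columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {column: record.get(column) for column in COLUMNS}
        # The record stores the title under "name" (assumed mapping).
        row["title"] = record.get("name")
        writer.writerow(row)  # None values are written as empty cells
    return buffer.getvalue()


# Values copied from the record quoted in the question.
sample_records = [
    {
        "offer_code": "dd3125025109fb4d",
        "sku": "N15614801A",
        "brand": None,
        "name": "AirPods Strap White",
        "price": 4.4,
        "sale_price": 1.3,
    }
]
print(records_to_csv(sample_records))
```

The csv module also quotes titles that contain commas, which a plain ",".join would split across columns. To write a file rather than a string, the same DictWriter can wrap a handle opened with newline="" and encoding="utf-8-sig", as in the question.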
