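I'm using this code to extract data from a website, but it doesn't get all the data I need.
Also, I can't capture the sku value, which is buried at the end of the page together with other valuable data (I don't know how to get at it):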
{"offer_code":"dd3125025109fb4d","sku":"N15614801A","sku_config":"N15614801A","brand":null,"name":"AirPods Strap White","plp_specifications":{},"price":4.4,"sale_price":1.3,"url":"airpods-strap-white","image_key":"v1532025662/N15614801A_1","is_buyable":true,"flags":["fbn","prepaid"]},
It would be very helpful if the result file contained all of these values (these are the ones I'm looking for):
price,title,sku,offer_code,brand,sale_price
This is the Python code I'm using:
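from multiprocessing.pool import ThreadPool

from bs4 import BeautifulSoup as soup
import requests

number_of_threads = 6
# Assumed output file name; the original snippet never defines out_filename.
out_filename = "output.csv"
headers = "price,title,sku\n"

def extract_data_from_url_func(url):
    print(url)
    response = requests.get(url)
    page_soup = soup(response.text, "html.parser")
    # The lines that pulled price, title and sku out of page_soup were not
    # included in the post; empty placeholders keep the function runnable.
    price = title = sku = ""
    output_list = [price, title, sku]
    output = ",".join(output_list)
    print(output)
    return output

with open("speednoon.txt", "r") as fr:
    URLS = [line.strip() for line in fr]

# Assumed: the URLs were meant to be fetched by number_of_threads workers.
with ThreadPool(number_of_threads) as pool:
    responses = pool.map(extract_data_from_url_func, URLS)

with open(out_filename, "w", encoding="utf-8-sig") as fw:
    fw.write(headers)
    for response in responses:
        fw.write(response + "\n")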
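Joining fields with commas by hand breaks as soon as a value contains a comma itself, which product names often do. Here is a minimal sketch of writing the six requested columns with the standard csv module instead. It uses the product object quoted above as input; the helper names product_to_row and write_rows are made up for illustration, and mapping the "title" column to the JSON "name" key is an assumption.

```python
import csv
import io
import json

# The product object quoted in the question (trailing comma removed).
raw = (
    '{"offer_code":"dd3125025109fb4d","sku":"N15614801A",'
    '"sku_config":"N15614801A","brand":null,"name":"AirPods Strap White",'
    '"plp_specifications":{},"price":4.4,"sale_price":1.3,'
    '"url":"airpods-strap-white","image_key":"v1532025662/N15614801A_1",'
    '"is_buyable":true,"flags":["fbn","prepaid"]}'
)
product = json.loads(raw)

FIELDS = ["price", "title", "sku", "offer_code", "brand", "sale_price"]

def product_to_row(product):
    """Map one product dict onto the requested columns."""
    row = {field: product.get(field) for field in FIELDS}
    # Assumption: the "title" column corresponds to the JSON "name" key.
    row["title"] = product.get("name")
    return row

def write_rows(rows, fileobj):
    """Write a header line followed by one CSV line per row."""
    writer = csv.DictWriter(fileobj, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)

# An in-memory buffer stands in for the output file here.
buffer = io.StringIO()
write_rows([product_to_row(product)], buffer)
print(buffer.getvalue())
```

To write to a real file, pass a file opened with newline="" (and encoding="utf-8-sig" if the file is meant for Excel) in place of the buffer. A null brand is written as an empty field.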
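Relying on class names that contain numbers has a drawback: they can differ from product to product. So here is a simple answer to the question.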
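Note: I'm not writing the full code, only the filtering part.
Note: use lxml instead of html.parser. To use it, install it with
pip install lxml
and pass "lxml" as the parser name to BeautifulSoup. No explicit import of lxml is needed.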
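A minimal sketch of switching parsers, assuming bs4 is installed. BeautifulSoup picks the parser from the string passed as its second argument and raises FeatureNotFound when the requested parser is missing, so the sketch falls back to html.parser in that case. The sample markup is hypothetical; the class name preReductionPrice is the one this answer gives for the original price, and make_soup is a made-up helper name.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical markup standing in for a real product page.
SAMPLE_HTML = '<div><span class="preReductionPrice">4.40</span></div>'

def make_soup(html):
    """Parse with lxml when it is installed, else fall back to html.parser."""
    try:
        return BeautifulSoup(html, "lxml")
    except FeatureNotFound:
        return BeautifulSoup(html, "html.parser")

page_soup = make_soup(SAMPLE_HTML)

# Look the element up by class and guard against it being absent.
tag = page_soup.find(class_="preReductionPrice")
original_price = tag.text.strip() if tag is not None else ""
print(original_price)
```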
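Price:
For the SKU:
For the offer code:
For the brand: I couldn't find it. Please specify where the brand name appears.
Sale price: the sale price is the price I already mentioned. If you mean the original price, it is in the "preReductionPrice" class. Grab it with .text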
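Since the SKU, offer code, brand and both prices all appear in the JSON object embedded near the end of the page, another option is to decode that object directly rather than rely on class names. The following is only a sketch. It assumes the raw HTML returned by requests contains the object literally, starting with {"offer_code" as in the question. The surrounding script wrapper in SAMPLE_PAGE (window.__DATA__ and "hits") and the helper name find_products are hypothetical.

```python
import json

# Hypothetical page; only the embedded product object comes from the question.
SAMPLE_PAGE = (
    '<html><body><script>window.__DATA__ = {"hits":['
    '{"offer_code":"dd3125025109fb4d","sku":"N15614801A",'
    '"sku_config":"N15614801A","brand":null,"name":"AirPods Strap White",'
    '"plp_specifications":{},"price":4.4,"sale_price":1.3,'
    '"url":"airpods-strap-white","image_key":"v1532025662/N15614801A_1",'
    '"is_buyable":true,"flags":["fbn","prepaid"]}'
    ']};</script></body></html>'
)

# Assumption: every product object begins with this exact text.
MARKER = '{"offer_code"'

def find_products(page_text, marker=MARKER):
    """Return every JSON object in page_text that starts with marker."""
    decoder = json.JSONDecoder()
    products = []
    pos = page_text.find(marker)
    while pos != -1:
        try:
            # raw_decode parses one JSON value starting at pos and
            # reports the index where that value ends.
            obj, end = decoder.raw_decode(page_text, pos)
        except json.JSONDecodeError:
            # Not valid JSON at this position; skip past the marker.
            end = pos + len(marker)
        else:
            products.append(obj)
        pos = page_text.find(marker, end)
    return products

products = find_products(SAMPLE_PAGE)
for p in products:
    print(p["sku"], p["offer_code"], p["brand"], p["price"], p["sale_price"])
```

In the scraper, response.text would take the place of SAMPLE_PAGE. If the site delivers the data in a separate request or escapes the JSON inside the HTML, this search returns an empty list and a different approach is needed.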