Python & BeautifulSoup4: optimizing scraping code

Posted 2024-09-30 01:25:12


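I'm trying to scrape the prices of specific products from several websites, and I believe there is a way to optimize my code. So far the code does its job, but it isn't the Pythonic way to do it (I'm new to Python, so please forgive my lack of knowledge).

The goal of this program is to get a product's price from each of the given URLs and write the prices to a .csv file. Each website has a different structure, but I always use the same 3 websites. Here is a sample of my current code: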

import requests
import csv
import io
import os
from datetime import datetime
from bs4 import BeautifulSoup

timeanddate=datetime.now().strftime("%Y%m%d-%H%M%S")

folder_path = 'my_folder_path'
file_name = 'product_prices_'+timeanddate+'.csv'
full_name = os.path.join(folder_path, file_name)

with io.open(full_name, 'w', newline='', encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["ProductTitle", "Website1", "Website2", "Website3"])

    #---Product 1---
    #Website1 price
    website1product1 = requests.get('website1product1URL')
    website1product1Data = BeautifulSoup(website1product1.text, 'html.parser')
    website1product1Price = website1product1Data.find('div', attrs={'class': 'price-final'}).text.strip()
    print(website1product1Price)

    #Website2 price
    website2product1 = requests.get('website2product1URL')
    website2product1Data = BeautifulSoup(website2product1.text, 'html.parser')
    website2product1Price = website2product1Data.find('div', attrs={'class': 'price_card'}).text.strip()
    print(website2product1Price)

    #Website3 price
    website3product1 = requests.get('website3product1URL')
    website3product1Data = BeautifulSoup(website3product1.text, 'html.parser')
    website3product1Price = website3product1Data.find('strong', attrs={'itemprop': 'price'}).text.strip()
    print(website3product1Price)

    writer.writerow(["ProductTitle", website1product1Price, website2product1Price, website3product1Price])

file.close()

It saves the product title and prices to the .csv in this format, which I'd like to keep:

#Header
ProductTitle Website1 Website2 Website3
#Scraped data
Product1     $23      $24      $52
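Note that csv.writer separates fields with commas, so the aligned columns above are only how a spreadsheet displays the file. A minimal sketch that writes the header and the sample row from the question to an in-memory buffer (standing in for the real .csv file) to show the exact text produced:

```python
import csv
import io

# An in-memory text buffer stands in for the real .csv file here.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)

# Same header and sample prices as in the question.
writer.writerow(["ProductTitle", "Website1", "Website2", "Website3"])
writer.writerow(["Product1", "$23", "$24", "$52"])

# csv.writer ends each row with "\r\n" by default.
print(buffer.getvalue())
```

The file therefore contains the line `ProductTitle,Website1,Website2,Website3` followed by `Product1,$23,$24,$52`, which any spreadsheet program opens as the table shown.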

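That is manageable for a few products, but I want several hundred, and copying the same lines of code and changing the variable names is confusing, tedious, and bound to be full of human error.

Could I create a function that takes the 3 URLs as arguments, returns website1product1Price, website2product1Price and website3product1Price, and call that function once per product? Could it then be wrapped in a loop that iterates over a list of URLs and still keeps the original format?

Thanks for your help.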


2 Answers

Would this work for you? Assuming your products are stored as a list of dicts:

products = [
    {
      'name': 'product1',
      'url1': 'https://url1',
      'url2': 'https://url2',
      'url3': 'https://url3'
    }
]
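With several hundred products, typing these dicts by hand is tedious too. One option, assuming you keep the URLs in a separate CSV file (the file name `products.csv` and its `name,url1,url2,url3` header are hypothetical), is to build the same list with csv.DictReader, whose rows are already dicts keyed by the header row:

```python
import csv
import io

def load_products(lines):
    """Build the list-of-dicts structure from CSV text whose header is
    name,url1,url2,url3 (a hypothetical layout)."""
    # dict(row) gives plain dicts on every Python 3 version.
    return [dict(row) for row in csv.DictReader(lines)]

# In practice:
#     with open("products.csv", newline="", encoding="utf-8") as f:
#         products = load_products(f)
# Here an in-memory sample stands in for that file.
sample = io.StringIO(
    "name,url1,url2,url3\n"
    "product1,https://url1,https://url2,https://url3\n"
)
products = load_products(sample)
print(products[0]["url2"])
```

Adding a product then means adding one line to the CSV file rather than editing the script.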

Your code could then look like this:

import requests
import csv
import io
import os
from datetime import datetime
from bs4 import BeautifulSoup

def get_product_prices(product):

    # Fetch each site's page for this product and extract its price
    #Website1 price
    website1product1 = requests.get(product['url1'])
    website1product1Data = BeautifulSoup(website1product1.text, 'html.parser')
    website1product1Price = website1product1Data.find('div', attrs={'class': 'price-final'}).text.strip()

    #Website2 price
    website2product1 = requests.get(product['url2'])
    website2product1Data = BeautifulSoup(website2product1.text, 'html.parser')
    website2product1Price = website2product1Data.find('div', attrs={'class': 'price_card'}).text.strip()

    #Website3 price
    website3product1 = requests.get(product['url3'])
    website3product1Data = BeautifulSoup(website3product1.text, 'html.parser')
    website3product1Price = website3product1Data.find('strong', attrs={'itemprop': 'price'}).text.strip()

    return website1product1Price, website2product1Price, website3product1Price

timeanddate=datetime.now().strftime("%Y%m%d-%H%M%S")

folder_path = 'my_folder_path'
file_name = 'product_prices_'+timeanddate+'.csv'
full_name = os.path.join(folder_path, file_name)

with io.open(full_name, 'w', newline='', encoding="utf-8") as file:
    writer = csv.writer(file)
    writer.writerow(["ProductTitle", "Website1", "Website2", "Website3"])

    for product in products:
        price1, price2, price3 = get_product_prices(product)
        # writerow takes a single list; the with block closes the file
        writer.writerow([product['name'], price1, price2, price3])
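Since the three sites never change, a further step is to describe each site's price element as data instead of code, so that supporting a new site means adding one entry rather than another copy of the request/parse lines. A sketch, assuming the selectors from the question (`SELECTORS` and `get_prices` are hypothetical names). The download function `fetch` is passed in so the parsing can be tried without network access; in real use it could be `lambda url: requests.get(url, timeout=10).text`:

```python
from bs4 import BeautifulSoup

# One entry per site, in CSV column order: (dict key, tag name, attributes).
SELECTORS = [
    ("url1", "div", {"class": "price-final"}),
    ("url2", "div", {"class": "price_card"}),
    ("url3", "strong", {"itemprop": "price"}),
]

def get_prices(product, fetch):
    """Return the prices for one product dict, in column order."""
    prices = []
    for key, tag, attrs in SELECTORS:
        soup = BeautifulSoup(fetch(product[key]), "html.parser")
        element = soup.find(tag, attrs=attrs)
        prices.append(element.text.strip())
    return prices

# Offline demo: a dict lookup plays the role of fetch, returning canned HTML.
pages = {
    "https://url1": '<div class="price-final"> $23 </div>',
    "https://url2": '<div class="price_card">$24</div>',
    "https://url3": '<strong itemprop="price">$52</strong>',
}
product = {"name": "product1", "url1": "https://url1",
           "url2": "https://url2", "url3": "https://url3"}
row = [product["name"], *get_prices(product, pages.get)]
print(row)
```

Inside the `with` block, the loop body then reduces to `writer.writerow([product['name'], *get_prices(product, fetch)])`, and the CSV layout stays the same.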

You can create a function and pass everything in as parameters: the URL, tag name, attribute name and attribute value. See if this helps:

import requests
from bs4 import BeautifulSoup

def price_text(url_text, ele_tag, ele_attr, attrval):
    # Download the page and parse it
    response = requests.get(url_text)
    page_data = BeautifulSoup(response.text, 'html.parser')
    # Pass the tag name and an attribute dict directly, not as quoted strings
    price = page_data.find(ele_tag, attrs={ele_attr: attrval}).text.strip()
    print(price)
    return price

website1Price = price_text("url", "div", "class", "price-final")
website2Price = price_text("url", "div", "class", "price_card")
website3Price = price_text("url", "strong", "itemprop", "price")
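If you would rather pass a single parameter per site, BeautifulSoup also accepts CSS selectors through `select_one`, which returns the first matching element or None. The sketch below (the `price_from_html` name and the "N/A" placeholder are assumptions) also avoids the AttributeError that `.text` raises when `find` returns None because a page has changed. In real use, `html` would be `requests.get(url, timeout=10).text`:

```python
from bs4 import BeautifulSoup

def price_from_html(html, css_selector, default="N/A"):
    """Return the stripped text of the first element matching css_selector,
    or `default` when the page has no such element."""
    element = BeautifulSoup(html, "html.parser").select_one(css_selector)
    return element.get_text(strip=True) if element is not None else default

# The three lookups from the question, written as CSS selectors.
print(price_from_html('<div class="price-final"> $23 </div>', "div.price-final"))
print(price_from_html('<strong itemprop="price">$52</strong>', 'strong[itemprop="price"]'))
# A page without the element yields the placeholder instead of crashing.
print(price_from_html("<p>sold out</p>", "div.price_card"))
```

Writing a placeholder keeps every CSV row aligned with the header even when one site fails to return a price.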
