I'm trying to parse information from a list of URLs, but my code parses the same page every time

Posted 2024-09-25 00:23:09


I am trying to parse a list of URLs saved in UTF-8 format, in a file named links.txt in my Python IDLE folder. One example is:

'https://www.safirstores.com/%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C#/availability=1/sort=p.sort_order/order=ASC/limit=32/page=44'

But when I run my code, it parses the information from the first part of the URL every time, as if it were actually fetching this URL:

'https://www.safirstores.com/%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C#'

Note the # at the end of that URL. I suspect it is what prevents my URL from changing between pages, but I don't know why.
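That suspicion can be checked with the standard library: everything after `#` is the URL *fragment*, which clients keep to themselves and never send to the server. A minimal sketch, using one of the URLs above:

```python
from urllib.parse import urlparse

url = ('https://www.safirstores.com/%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C'
       '#/availability=1/sort=p.sort_order/order=ASC/limit=32/page=44')
parts = urlparse(url)

# The server only ever sees the path; the fragment stays client-side,
# so page=1 and page=44 end up requesting the exact same resource.
print(parts.path)      # /%D8%A2%D8%B1%D8%A7%DB%8C%D8%B4%DB%8C
print(parts.fragment)  # /availability=1/sort=p.sort_order/order=ASC/limit=32/page=44
```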

Here is my full code:

import requests
import csv
from bs4 import BeautifulSoup

# Read the list of URLs, one per line
with open('links.txt', 'r', encoding='utf8') as f:
    urls = f.read().split()

with open('promo.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['name', 'links', 'price'])

    for url in urls:
        try:
            print(url)
            source = requests.get(url).text
            soup = BeautifulSoup(source, 'lxml')
            divs = soup.find_all('div', class_='caption')
            if divs:
                for div in divs:
                    price = div.find('p', {'class': 'price'}).text.strip()
                    print(price)
                    name = div.find('h4', {'class': 'name'}).text.strip()
                    print(name)
                    links = div.find('a')['href']
                    print(links)
                    print()
                    csv_output.writerow([name, links, price])
            else:
                print("Finished")
                break
        except Exception as e:
            print(e)

My URL list looks like this:

https://www.safirstores.com/آرایشی#/availability=1/sort=p.sort_order/order=ASC/limit=32/page=1
https://www.safirstores.com/آرایشی#/availability=1/sort=p.sort_order/order=ASC/limit=32/page=2
https://www.safirstores.com/آرایشی#/availability=1/sort=p.sort_order/order=ASC/limit=32/page=3

What can I do to avoid this problem?

Thanks in advance for your time.


1 Answer

The part of the URL after `#` is a fragment: browsers handle it client-side with JavaScript, and `requests` never sends it to the server, which is why every request returned the same first page. The site loads each page of results through a POST request to its filter endpoint, so you can send the filter string there directly. Try this:

import requests
import csv
from bs4 import BeautifulSoup

url = 'https://www.safirstores.com/index.php'

payload = {
    'route': 'module/journal2_super_filter/products',
    'module_id': '54'}

with open('promo.csv', 'w', newline='', encoding='utf-8') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['name', 'links', 'price'])

    for page in range(1, 100):
        try:
            # The same filter string the site keeps after '#', but sent
            # as form data so the server actually receives the page number
            form = {
                'filters': '/availability=1/sort=p.sort_order/order=ASC/limit=32/page=%s' % page,
                'route': 'product/category',
                'path': '238',
                'manufacturer_id': '',
                'search': '',
                'tag': ''}

            source = requests.post(url, params=payload, data=form)

            soup = BeautifulSoup(source.text, 'html.parser')
            divs = soup.find_all('div', class_='caption')
            if divs:
                for div in divs:
                    price = div.find('p', {'class': 'price'}).text.strip()
                    print(price)
                    name = div.find('h4', {'class': 'name'}).text.strip()
                    print(name)
                    links = div.find('a')['href']
                    print(links)
                    print()
                    csv_output.writerow([name, links, price])
            else:
                print("Finished")
                break
        except Exception as e:
            print(e)
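The only thing that changes between pages is the `filters` string, so it can be factored into a small helper. This is just a sketch; `make_filters` is a hypothetical name, not part of the site's API:

```python
def make_filters(page, limit=32):
    """Build the filter string the site's filter endpoint expects
    for a given result page (hypothetical helper)."""
    return '/availability=1/sort=p.sort_order/order=ASC/limit=%d/page=%d' % (limit, page)

print(make_filters(1))
# /availability=1/sort=p.sort_order/order=ASC/limit=32/page=1
```

Using the helper, the loop body becomes `form['filters'] = make_filters(page)`, which keeps the paging logic in one place if the site's sort or limit values ever change.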
