Scraping all links from a website with Scrapy doesn't work

# -*- coding: utf-8 -*-
import scrapy


class DummySpider(scrapy.Spider):
    name = 'dummyspider'
    allowed_domains = ['alibaba.com']
    start_urls = ['https://www.alibaba.com/countrysearch/CN/China/products/A.html']

    def parse(self, response):
        # Every link inside the "column one3" blocks
        for href in response.xpath('//*[@class="column one3"]/a/@href').getall():
            yield {'link': href}

        # Follow the "next page" button, if there is one
        next_page_url = response.xpath('//*[@class="page_btn"]/@href').get()
        if next_page_url:
            next_page_url = response.urljoin(next_page_url)
            yield scrapy.Request(url=next_page_url, callback=self.parse)
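The spider passes the page_btn href through response.urljoin because that href may be relative. For HTML responses Scrapy resolves it against the page URL (or a <base href> tag, if the page has one), much like the standard library's urllib.parse.urljoin. The minimal sketch below shows that resolution with the standard library only; the hrefs "A_2.html" and "/countrysearch/CN/China/products/B.html" are hypothetical examples, not values taken from the real page.

```python
from urllib.parse import urljoin

# URL of the page being parsed (the question's start URL)
page_url = "https://www.alibaba.com/countrysearch/CN/China/products/A.html"

# Hypothetical hrefs, as they might appear in a page_btn element
relative_href = "A_2.html"
root_relative_href = "/countrysearch/CN/China/products/B.html"

# A relative href replaces the last path segment of the page URL;
# a root-relative href replaces the whole path but keeps scheme and host.
next_url = urljoin(page_url, relative_href)
other_url = urljoin(page_url, root_relative_href)

print(next_url)   # https://www.alibaba.com/countrysearch/CN/China/products/A_2.html
print(other_url)  # https://www.alibaba.com/countrysearch/CN/China/products/B.html
```

Without this step, a relative href passed straight to scrapy.Request would fail, since Request requires an absolute URL.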

1 answer

User

#1 · Posted 2024-10-03 13:18:33

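You can solve this by setting the start URLs correctly: the spider only starts from the "A" page, so it never visits the other letters.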

The string module provides constants for the letters of the alphabet:

>>> import string
>>> string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

You can use it to build the start URLs programmatically, one per letter:

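import string
from scrapy import Spider


class MySpider(Spider):
    name = 'alibaba'
    allowed_domains = ['alibaba.com']
    # One start URL per letter: .../products/A.html through .../products/Z.html
    start_urls = [
        f'https://www.alibaba.com/countrysearch/CN/China/products/{char}.html'
        for char in string.ascii_uppercase
    ]

    def parse(self, response):
        # Same link extraction as in the question's spider
        for href in response.xpath('//*[@class="column one3"]/a/@href').getall():
            yield {'link': href}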
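To check the generated start URLs before launching a crawl, the same list can be built with plain Python. This is a minimal sketch that assumes every letter page follows the A.html naming seen in the question's start URL; the helper name letter_urls is hypothetical, not part of Scrapy.

```python
import string

# URL pattern taken from the question's start URL; assumes every
# letter page uses the same <LETTER>.html naming.
BASE = "https://www.alibaba.com/countrysearch/CN/China/products/{}.html"


def letter_urls(letters=string.ascii_uppercase):
    """Return one listing URL per letter, in the order given."""
    return [BASE.format(letter) for letter in letters]


if __name__ == "__main__":
    urls = letter_urls()
    print(len(urls))  # 26
    print(urls[0])    # https://www.alibaba.com/countrysearch/CN/China/products/A.html
    print(urls[-1])   # https://www.alibaba.com/countrysearch/CN/China/products/Z.html
```

Passing a shorter string, such as letter_urls("AB"), is a cheap way to try the spider on a couple of pages first.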
