Running a spider at scheduled times

Posted 2024-09-28 05:45:11


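I want to run a spider several times at scheduled times. The time of the next crawl is only determined once the previous crawl has finished. Below is the code I wrote to do this, but it blocks at the first call to process.start():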

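import collections
import datetime
import time

import lxml.html
import scrapy
from scrapy.crawler import CrawlerProcess

# Maps the time a spider is due to run to the spider class to run
spidersQ = collections.OrderedDict()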

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    global spidersQ
    start_urls = [
        "https://www.amazon.com",
    ]

    def parse(self, response):
        root = lxml.html.fromstring(response.body)
        lxml_result = root.xpath("(//div[contains(@class,'a-section')]/div[contains(@class,'olpOffer')])[1]")

        price = lxml_result[0].text.strip()
        # Now schedule this spider to run again after 5 seconds
        spidersQ[datetime.datetime.now() + datetime.timedelta(seconds=5)] = QuotesSpider


def main():
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })

    process.crawl(QuotesSpider)
    process.start(stop_after_crawl=False)  # the script will block here forever

    while True:
        # next(iter(...)) yields the earliest-inserted key; skip when the queue is empty
        if spidersQ and datetime.datetime.now() > next(iter(spidersQ)):
            schedTime, spider = spidersQ.popitem(last=False)
            process.crawl(spider)
            process.start(stop_after_crawl=False)
        else:
            time.sleep(1)


1 answer

User

#1 · Posted 2024-09-28 05:45:11

You could try the external schedule module:

Python job scheduling for humans
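
Some background on why the original script hangs: process.start() runs the Twisted reactor, and with stop_after_crawl=False the reactor never stops, so the while loop after it is never reached. A stopped reactor also cannot be started again in the same process. One way around both problems is to run each crawl in its own child process and let an outer loop decide when the next one starts. The sketch below shows that pattern with the standard library's sched module instead of the third-party schedule package, purely so it has no extra dependencies. The names crawl_once and schedule_repeating, the max_runs parameter, and the assumption that the spider lives in a Scrapy project where `scrapy crawl quotes` works are all illustrative, not something taken from the question.

```python
import sched
import subprocess
import time


def crawl_once(spider_name):
    """Run one crawl in a child process and return its exit code.

    Assumption: the current directory is a Scrapy project in which
    `scrapy crawl <spider_name>` works. A fresh process per crawl means
    a fresh Twisted reactor every time.
    """
    return subprocess.call(["scrapy", "crawl", spider_name])


def schedule_repeating(scheduler, job, delay, max_runs=None):
    """Run `job` immediately, then again `delay` seconds after each run ends.

    `max_runs` (hypothetical helper parameter) caps the number of runs;
    None means repeat until the process is interrupted.
    """
    state = {"runs": 0}

    def run_and_reschedule():
        job()
        state["runs"] += 1
        # The next run is queued only after this one has finished,
        # so two crawls never overlap.
        if max_runs is None or state["runs"] < max_runs:
            scheduler.enter(delay, 1, run_and_reschedule)

    scheduler.enter(0, 1, run_and_reschedule)
    return state
```

To use it, create a scheduler with `sched.scheduler(time.monotonic, time.sleep)`, call `schedule_repeating(scheduler, lambda: crawl_once("quotes"), delay=5)`, and then call `scheduler.run()`. That call still blocks, but each crawl starts and finishes inside its own process, and the 5-second delay is counted from the end of the previous crawl, which matches the behaviour the question asks for.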
