I'm scraping data from multiple URLs like this:
import scrapy

from pogba.items import PogbaItem


class DmozSpider(scrapy.Spider):
    name = "pogba"
    allowed_domains = ["fourfourtwo.com"]
    start_urls = [
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459525/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459571/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459585/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459614/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459635/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459644/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459662/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459674/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459686/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459694/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459705/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459710/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459737/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459744/player-stats/74208/OVERALL_02",
        "http://www.fourfourtwo.com/statszone/21-2012/matches/459765/player-stats/74208/OVERALL_02",
    ]

    def parse(self, response):
        coords = []
        # Select every element inside the pitch SVG whose class contains "success"
        for sel in response.xpath('//*[@id="pitch"]/*[contains(@class,"success")]'):
            item = PogbaItem()
            # The coordinate attribute is either x/y or x1/y1 depending on the shape
            item['x'] = sel.xpath('(@x|@x1)').extract()
            item['y'] = sel.xpath('(@y|@y1)').extract()
            coords.append(item)
        return coords
The problem is that with this setup the resulting CSV only has about 200 rows, while each URL should yield about 50 rows (so roughly 750 in total for 15 URLs). Scraping one URL at a time works fine, so why do I get different results when I list multiple URLs?
I would try tuning the crawl speed: slow it down a bit by increasing the delay between requests (the DOWNLOAD_DELAY setting) and reducing the number of concurrent requests (the CONCURRENT_REQUESTS setting), for example:
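A minimal sketch of what that might look like in the project's settings.py, using Scrapy's DOWNLOAD_DELAY and CONCURRENT_REQUESTS settings (the concrete values here are illustrative, not from the original answer):

```python
# settings.py -- illustrative throttling values, tune for the target site

DOWNLOAD_DELAY = 2        # wait 2 seconds between consecutive requests
CONCURRENT_REQUESTS = 1   # issue one request at a time instead of the default 16

# Optionally let Scrapy adapt the delay to server load automatically
AUTOTHROTTLE_ENABLED = True
```

The same values can instead be scoped to a single spider via its `custom_settings` class attribute, which is handy when only this crawl needs to be slowed down.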