Python Scrapy: requests in a loop

Published 2024-10-02 04:22:33


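I have a small problem: I need to loop a predetermined number of times. The reason is that I submit a POST request and scrape the result. However, the results do not fit on one page, so I have to POST again with "cpipage" incremented each time; cpipage is the page number. Here is my spider code. I changed the URL to nourl.com because the site I am scraping is not mine.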

import scrapy
from scrapy.http import FormRequest
from scrapy.selector import Selector


class EtmdSpider(scrapy.Spider):
    name = "etmd"
    start_urls = ["http://b2.nourl.com/dp.asp"]

    def parse(self, response):
        url = "http://b2.nourl.com/dp.asp"
        # Form fields for the search; "cpipage" is the page number.
        payload = {"AppKey": "94921000e1999f84a518725", "ComparisonType1_1": "LIKE", "Value1_1": "", "MatchNull1_1" : "N", "ComparisonType2_1" : "LIKE", "MatchNull2_1" : "N", "Value2_1" : "", "ComparisonType3_1": "=", "MatchNull3_1" : "N", "Value3_1" : "", "x":"69", "y":"27", "FieldName1" : "County", "Operator1": "OR", "NumCriteriaDetails1": "1", "Operator2" : "OR", "NumCriteriaDetails2" : "1", "FieldName3": "Year", "Operator3" : "OR", "NumCriteriaDetails3": "1", "PageID" : "2", "GlobalOperator": "AND", "NumCriteria" : "3", "Search" : "1", "cpipage": "4"}
        return FormRequest(url, formdata=payload, callback=self.parse_data)

    def parse_data(self, response):
        items = []
        sel = Selector(response)
        # Collect the raw HTML of every table cell on the page.
        items.append(sel.xpath('//td').extract())

        # Append to the export file; the with-block closes it afterwards.
        with open("exported.txt", "a") as exportfile:
            exportfile.write(str(items))

        print(items)

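So in the payload dictionary I have cpipage, which is "4" in this example, but I need it to increment all the way up to 175. Is there a way to do this in the code I currently have, or by running the Scrapy spider without using the shell?

I have already tried a for loop: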

[the code for this attempt is missing from the source]

1 answer
User
#1 · posted 2024-10-02 04:22:33

The return statement exits the method immediately, so a loop that contains it stops after the first iteration.
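The difference can be seen in a small sketch that does not need Scrapy at all. The function names `first_only` and `all_pages` are made up for illustration: a `return` inside a loop ends the function on the first pass, while a `yield` hands back one value per pass.

```python
def first_only(pages):
    """Return inside the loop: the function exits on the first iteration."""
    for page in pages:
        return page  # the loop never reaches the second element


def all_pages(pages):
    """Yield inside the loop: the caller receives every element in turn."""
    for page in pages:
        yield page


if __name__ == "__main__":
    print(first_only([1, 2, 3]))       # 1
    print(list(all_pages([1, 2, 3])))  # [1, 2, 3]
```

A spider callback behaves the same way: whatever it returns or yields is all that Scrapy gets to schedule.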

You should return a list of all the requests instead:

def parse(self, response):
    url = "http://b2.nourl.com/dp.asp"
    requests = []
    # Pages 1 through 175, assuming numbering starts at 1; range() excludes its upper bound.
    for i in range(1, 176):
        payload = {"AppKey": "94921000e1999f84a518725", "ComparisonType1_1": "LIKE", "Value1_1": "", "MatchNull1_1" : "N", "ComparisonType2_1" : "LIKE", "MatchNull2_1" : "N", "Value2_1" : "", "ComparisonType3_1": "=", "MatchNull3_1" : "N", "Value3_1" : "", "x":"69", "y":"27", "FieldName1" : "County", "Operator1": "OR", "NumCriteriaDetails1": "1", "Operator2" : "OR", "NumCriteriaDetails2" : "1", "FieldName3": "Year", "Operator3" : "OR", "NumCriteriaDetails3": "1", "PageID" : "2", "GlobalOperator": "AND", "NumCriteria" : "3", "Search" : "1", "cpipage": str(i)}
        requests.append(FormRequest(url, formdata=payload, callback=self.parse_data))
    return requests
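One detail to check in the loop is the page range: `range` excludes its upper bound, so covering pages up to 175 needs `range(1, 176)`. The sketch below builds one payload per page without Scrapy. `BASE_PAYLOAD` (shortened to two fields), `build_payload`, `build_payloads` and the assumption that pages start at 1 are illustrative and not taken from the site.

```python
# Shared form fields, abbreviated for illustration; the real form has more keys.
BASE_PAYLOAD = {"PageID": "2", "Search": "1"}


def build_payload(page):
    """Return a copy of the base form data with the page number set."""
    payload = dict(BASE_PAYLOAD)
    # The original code passes the page number as a string, so do the same here.
    payload["cpipage"] = str(page)
    return payload


def build_payloads(first_page=1, last_page=175):
    """Build one payload per page, including last_page itself."""
    return [build_payload(p) for p in range(first_page, last_page + 1)]


if __name__ == "__main__":
    payloads = build_payloads()
    print(len(payloads))
    print(payloads[0]["cpipage"], payloads[-1]["cpipage"])
```

Copying the base dictionary for every page keeps the payloads independent, so changing one request's page number cannot affect another.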

Or yield them one at a time:

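def parse(self, response):
    url = "http://b2.nourl.com/dp.asp"
    # Pages 1 through 175, assuming numbering starts at 1; range() excludes its upper bound.
    for i in range(1, 176):
        payload = {"AppKey": "94921000e1999f84a518725", "ComparisonType1_1": "LIKE", "Value1_1": "", "MatchNull1_1" : "N", "ComparisonType2_1" : "LIKE", "MatchNull2_1" : "N", "Value2_1" : "", "ComparisonType3_1": "=", "MatchNull3_1" : "N", "Value3_1" : "", "x":"69", "y":"27", "FieldName1" : "County", "Operator1": "OR", "NumCriteriaDetails1": "1", "Operator2" : "OR", "NumCriteriaDetails2" : "1", "FieldName3": "Year", "Operator3" : "OR", "NumCriteriaDetails3": "1", "PageID" : "2", "GlobalOperator": "AND", "NumCriteria" : "3", "Search" : "1", "cpipage": str(i)}
        # Each yielded request is handed to Scrapy without ending the loop.
        yield FormRequest(url, formdata=payload, callback=self.parse_data)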
