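<p>Both requests send the same thing (Scrapy's <code>FormRequest</code> simply URL-encodes the form data). I think the real issue is that the site needs to set a cookie when you first land on <code>http://www.istic.ac.cn/suoguan/QiKan_ShouYe.htm?lan=en&journalId=IELEP0229&yp=2018</code>. If the spider starts on that page, Scrapy's cookie handling (enabled by default) should store the cookie and send it with the later POST. Try the following:</p>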
<pre><code># -*- coding: utf-8 -*-
import json
import re

import scrapy
from scrapy import FormRequest


class IsticSpider(scrapy.Spider):
    name = "istic"
    allowed_domains = ["istic.ac.cn"]
    # Start on the journal page so the site can set its cookie first.
    start_urls = ['http://www.istic.ac.cn/suoguan/QiKan_ShouYe.htm?lan=en&journalId=IELEP0229&yp=2018']

    def parse(self, response):
        posturl = 'http://www.istic.ac.cn/suoguan/essearch.ashx'
        # Pull the journal ID and the year out of the page URL.
        journalId = re.search(r'journalId=(.*?)&', response.url).group(1)
        yearNum = re.search(r'&yp=(\d+)', response.url).group(1)
        postdata = {
            "indexname": "xw_qk",
            "search": "{0}/F(F_ReqNum)*{1}/F(F_YEAR)".format(journalId, yearNum),
            "page": "0",
            "pagenum": "20",
            "sort": "",
            "type": "content",
        }
        # The cookie received for the start URL is sent along with this POST.
        yield FormRequest(posturl, formdata=postdata, callback=self.parse_item)

    def parse_item(self, response):
        # response.text replaces the deprecated body_as_unicode().
        data = json.loads(response.text)
        self.logger.debug('%s', list(data.keys()))
</code></pre>
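<p>It should log the top-level keys of the JSON response: <code>[u'facets', u'hits', u'took']</code> (on Python 3 the keys are shown without the <code>u</code> prefix).</p>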
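<p>As a side note, the two regular expressions in <code>parse</code> depend on the order of the query parameters (<code>journalId</code> has to be followed by <code>&</code>, and <code>yp</code> has to be preceded by one). Below is a minimal sketch of an order-independent alternative that uses the standard library's <code>urllib.parse</code> (Python 3). The helper name <code>build_postdata</code> and its <code>page</code> and <code>pagenum</code> parameters are my own illustration and not part of Scrapy or the site; the form fields are copied unchanged from the spider above.</p>

```python
from urllib.parse import urlparse, parse_qs


def build_postdata(page_url, page=0, pagenum=20):
    """Build the essearch.ashx form fields from a journal page URL.

    Hypothetical helper: the field names and the search syntax are taken
    from the spider above, not from any documented API of the site.
    """
    # parse_qs maps each query parameter to a list of its values.
    query = parse_qs(urlparse(page_url).query)
    journal_id = query["journalId"][0]
    year = query["yp"][0]
    return {
        "indexname": "xw_qk",
        "search": "{0}/F(F_ReqNum)*{1}/F(F_YEAR)".format(journal_id, year),
        # FormRequest expects string values, so convert the numbers.
        "page": str(page),
        "pagenum": str(pagenum),
        "sort": "",
        "type": "content",
    }


postdata = build_postdata(
    "http://www.istic.ac.cn/suoguan/QiKan_ShouYe.htm?lan=en&journalId=IELEP0229&yp=2018"
)
print(postdata["search"])  # IELEP0229/F(F_ReqNum)*2018/F(F_YEAR)
```

<p>Inside the spider, <code>parse</code> could then call <code>build_postdata(response.url)</code> and pass the result as <code>formdata</code>. A missing <code>journalId</code> or <code>yp</code> parameter raises a <code>KeyError</code> here, rather than the <code>AttributeError</code> you get when <code>re.search</code> returns <code>None</code>.</p>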