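<p>I think you will only get a valid response if the payload is sent as <code>body=json.dumps(self.formdata)</code> instead of <code>formdata=self.formdata</code>. The endpoint expects a JSON body, and <code>formdata=</code> makes Scrapy send form-encoded fields instead. The suggested change:</p>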
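<pre><code>def start_requests(self):
    # Requires `import json` and `import scrapy` at module level
    for url in self.start_urls:
        # Send the payload as a raw JSON string; formdata= would form-encode it instead
        yield scrapy.Request(
            url=url,
            method='POST',
            headers=self.headers,
            body=json.dumps(self.formdata),
            callback=self.parse_page,
        )
</code></pre>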
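<p>To see why the two options differ, here is a small stdlib-only sketch (no Scrapy needed) using a trimmed-down, hypothetical payload with the same nested shape as <code>self.formdata</code>. <code>urllib.parse.urlencode</code> illustrates form encoding in general; it is not Scrapy's exact code path. Form encoding flattens the nested list into its Python <code>repr</code>, which the server cannot parse as JSON, while <code>json.dumps</code> keeps the structure intact:</p>
<pre><code>import json
from urllib.parse import parse_qs, urlencode

# Trimmed-down, hypothetical payload with the same nested shape as self.formdata
formdata = {'requests': [{'indexName': 'listings', 'params': 'hitsPerPage=24'}]}

# Form encoding: the nested list is turned into its Python repr, then percent-encoded
form_body = urlencode(formdata)
# JSON encoding: the nested structure survives as real JSON
json_body = json.dumps(formdata)

# What a server would recover after decoding the form body
decoded_value = parse_qs(form_body)['requests'][0]
print(decoded_value)  # single-quoted Python repr, not valid JSON

try:
    json.loads(decoded_value)
    form_is_json = True
except json.JSONDecodeError:
    form_is_json = False

print(form_is_json)                       # False
print(json.loads(json_body) == formdata)  # True
</code></pre>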
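<p>If you want to use the <code>parse()</code> method, note that by default Scrapy sends a <code>GET</code> request to every URL in <code>start_urls</code> and passes the responses to <code>parse()</code>. In this case a <code>GET</code> to the API URL returns status 400 (or another error), and Scrapy drops non-2xx responses by default, so <code>parse()</code> is never called for it. To use <code>parse()</code> the way you tried, make sure the URL in <code>start_urls</code> returns the expected status (200). It can even be a different URL: once <code>parse()</code> runs, it sends the <code>POST</code> request to the right URL, and the response to that request is the one you want:</p>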
<pre><code>import json

import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'
    # A different URL that returns 200 to a plain GET, so parse() gets called
    start_urls = ['https://stackoverflow.com/questions/tagged/web-scraping']
    # The Algolia endpoint that serves the data; it expects a POST with a JSON body
    url = 'https://277kmabdt6-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%20(lite)%203.27.1%3BJS%20Helper%202.26.0%3Bvue-instantsearch%201.7.0&x-algolia-application-id=277KMABDT6&x-algolia-api-key=bf8b92303c2418c9aed3c2f29f6cbdab'
    formdata = {
        'requests': [{'indexName': 'listings',
                      'params': 'query=&hitsPerPage=24&page=0&highlightPreTag=__ais-highlight__&highlightPostTag=__%2Fais-highlight__&filters=announce_type%3Aproperty-announces%20AND%20language_code%3Apt%20AND%20listing_id%3A%205&facets=%5B%22announce_type%22%5D&tagFilters='}]
    }
    headers = {
        'accept': 'application/json',
        'content-type': 'application/x-www-form-urlencoded',
        'Origin': 'https://www.flat.com.br',
        'Referer': 'https://www.flat.com.br/search?query=',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
    }

    def parse(self, response):
        # Ignore the placeholder page and send the real POST with a JSON body
        yield scrapy.Request(
            url=self.url,
            method='POST',
            headers=self.headers,
            body=json.dumps(self.formdata),
            callback=self.parse_page,
        )

    def parse_page(self, response):
        # The API answers with JSON, so decode it and print the result
        print(json.loads(response.text))
</code></pre>
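<p>The <code>params</code> entry in <code>formdata</code> is itself a URL-encoded query string nested inside the JSON body, so editing it by hand (for example changing <code>page=0</code> to <code>page=1</code>) is error-prone. One option is to generate it with <code>urlencode</code>, as in the sketch below. The helper <code>build_formdata</code> and its <code>page</code> argument are hypothetical. The encoding (<code>quote_via=quote</code>, so spaces become <code>%20</code>) was chosen to reproduce the hand-written string from the spider above for page 0. Whether the API accepts other pages the same way is an assumption:</p>
<pre><code>from urllib.parse import quote, urlencode


def build_formdata(page=0):
    """Hypothetical helper: build the Algolia payload used by MySpider for one page."""
    params = {
        'query': '',
        'hitsPerPage': 24,
        'page': page,
        'highlightPreTag': '__ais-highlight__',
        'highlightPostTag': '__/ais-highlight__',
        'filters': 'announce_type:property-announces AND language_code:pt AND listing_id: 5',
        'facets': '["announce_type"]',
        'tagFilters': '',
    }
    # quote (instead of the default quote_plus) encodes spaces as %20;
    # urlencode's default safe='' means '/' and ':' are percent-encoded too
    encoded = urlencode(params, quote_via=quote)
    return {'requests': [{'indexName': 'listings', 'params': encoded}]}


print(build_formdata(1)['requests'][0]['params'])
</code></pre>
<p>Inside the spider, <code>body=json.dumps(build_formdata(page))</code> would then take the place of <code>body=json.dumps(self.formdata)</code>.</p>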