Scrapy FormRequest sends a GET request when the method is POST



<a href="https://www.flat.com.br/search?query=" rel="nofollow noreferrer">This</a> is the page I want to crawl. The data on the page comes from this <a href="https://277kmabdt6-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%20(lite)%203.27.1%3BJS%20Helper%202.26.0%3Bvue-instantsearch%201.7.0&x-algolia-application-id=277KMABDT6&x-algolia-api-key=bf8b92303c2418c9aed3c2f29f6cbdab" rel="nofollow noreferrer">URL</a>. Here is my spider code. I have checked the headers and form data at least five times, and I believe they are correct. The problem is that Scrapy sends a <code>GET</code> request to the <code>start_url</code>, even though I overrode the default behavior of the <code>parse</code> method.
<pre><code>import json

import scrapy
from scrapy.spiders import CrawlSpider


class MySpider(CrawlSpider):
    name = 'myspider'
    start_urls = [
        'https://277kmabdt6-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%20(lite)%203.27.1%3BJS%20Helper%202.26.0%3Bvue-instantsearch%201.7.0&x-algolia-application-id=277KMABDT6&x-algolia-api-key=bf8b92303c2418c9aed3c2f29f6cbdab',
    ]
    formdata = {
        'requests': [{'indexName': 'listings',
                      'params': 'query=&hitsPerPage=24&page=0&highlightPreTag=__ais-highlight__&highlightPostTag=__%2Fais-highlight__&filters=announce_type%3Aproperty-announces%20AND%20language_code%3Apt%20AND%20listing_id%3A%205&facets=%5B%22announce_type%22%5D&tagFilters='}]
    }
    headers = {
        'accept': 'application/json',
        'content-type': 'application/x-www-form-urlencoded',
        'Origin': 'https://www.flat.com.br',
        'Referer': 'https://www.flat.com.br/search?query=',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
    }

    def parse(self, response):
        for url in self.start_urls:
            yield scrapy.FormRequest(
                url=url,
                method='POST',
                headers=self.headers,
                formdata=self.formdata,
                callback=self.parse_page,
            )

    def parse_page(self, response):
        print(json.loads(response.text))
</code></pre>
This is what I get when I run the spider. My question is: why does Scrapy send a <code>GET</code> request to the URL, and what am I missing? Could there be another reason why my request fails?
<pre><code>2019-07-01 11:45:58 [scrapy] DEBUG: Crawled (400) <GET https://277kmabdt6-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%20(lite)%203.27.1%3BJS%20Helper%202.26.0%3Bvue-instantsearch%201.7.0&x-algolia-application-id=277KMABDT6&x-algolia-api-key=bf8b92303c2418c9aed3c2f29f6cbdab> (referer: None)
2019-07-01 11:45:58 [scrapy] DEBUG: Ignoring response <400 https://277kmabdt6-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%20(lite)%203.27.1%3BJS%20Helper%202.26.0%3Bvue-instantsearch%201.7.0&x-algolia-application-id=277KMABDT6&x-algolia-api-key=bf8b92303c2418c9aed3c2f29f6cbdab>: HTTP status code is not handled or not allowed
</code></pre>


1 Answer

Anonymous, 1 day ago


I think you will only get a valid response if the payload is sent as <code>body=json.dumps(self.formdata)</code> rather than <code>formdata=self.formdata</code>, because the payload is JSON. The suggested part looks like this:
<pre><code>def start_requests(self):
    for url in self.start_urls:
        yield scrapy.FormRequest(
            url=url, method='POST',
            headers=self.headers, body=json.dumps(self.formdata),
            callback=self.parse_page,
        )
</code></pre>
By default, <code>parse()</code> receives the responses to the <code>GET</code> requests that Scrapy sends to the <code>start_urls</code>. In this case, however, the URL in <code>start_urls</code> never reaches <code>parse()</code>, because the <code>GET</code> request fails with status 400 (or some other error). So, to use <code>parse()</code> the way you tried, make sure the URL in <code>start_urls</code> returns an acceptable status. In other words, if you put a different <code>url</code> that returns status 200 in <code>start_urls</code> and then send the POST request to the right <code>url</code> from <code>parse()</code>, you also get the desired response.
<pre><code>import json

import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'

    # A different URL that answers a plain GET with status 200
    start_urls = ['https://stackoverflow.com/questions/tagged/web-scraping']

    # The URL that actually receives the POST request
    url = 'https://277kmabdt6-dsn.algolia.net/1/indexes/*/queries?x-algolia-agent=Algolia%20for%20vanilla%20JavaScript%20(lite)%203.27.1%3BJS%20Helper%202.26.0%3Bvue-instantsearch%201.7.0&x-algolia-application-id=277KMABDT6&x-algolia-api-key=bf8b92303c2418c9aed3c2f29f6cbdab'

    formdata = {
        'requests': [{'indexName': 'listings',
                      'params': 'query=&hitsPerPage=24&page=0&highlightPreTag=__ais-highlight__&highlightPostTag=__%2Fais-highlight__&filters=announce_type%3Aproperty-announces%20AND%20language_code%3Apt%20AND%20listing_id%3A%205&facets=%5B%22announce_type%22%5D&tagFilters='}]
    }
    headers = {
        'accept': 'application/json',
        'content-type': 'application/x-www-form-urlencoded',
        'Origin': 'https://www.flat.com.br',
        'Referer': 'https://www.flat.com.br/search?query=',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.100 Safari/537.36',
    }

    def parse(self, response):
        # Called with the response to the GET on start_urls; send the POST from here
        yield scrapy.Request(
            url=self.url, method='POST',
            headers=self.headers, body=json.dumps(self.formdata),
            callback=self.parse_page,
        )

    def parse_page(self, response):
        print(json.loads(response.text))
</code></pre>
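To illustrate why a JSON body and a form-encoded body are not interchangeable here, the following is a minimal sketch that uses only the Python standard library. It does not call Scrapy, and <code>urlencode</code> is a stand-in for form encoding in general, not Scrapy's own encoder. The <code>payload</code> dict is a shortened, hypothetical version of the <code>formdata</code> above, and <code>is_valid_json</code> is a helper introduced only for this example.

```python
import json
from urllib.parse import parse_qs, urlencode

# Shortened, hypothetical stand-in for the spider's `formdata` dict.
payload = {"requests": [{"indexName": "listings", "params": "query=&page=0"}]}

# Form encoding: the nested list is turned into text with str(), not JSON.
form_body = urlencode(payload)

# JSON encoding: the kind of body produced by body=json.dumps(...).
json_body = json.dumps(payload)


def is_valid_json(text):
    """Return True if `text` parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:
        return False
    return True


# What a server would see after decoding the form field "requests".
form_value = parse_qs(form_body)["requests"][0]

print(is_valid_json(form_value))  # False: it is a Python repr with single quotes
print(is_valid_json(json_body))   # True
```

The form-decoded value is the Python representation of the list, so a JSON parser rejects it, while the <code>json.dumps</code> output round-trips to the original structure.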
