Python Scrapy: 400 response from a form request

Posted 2024-09-30 14:29:28

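I have been trying to scrape the website https://fbschedules.com/new-england-patriots-schedule/

The site uses a hidden form to submit an AJAX request to a PHP file: https://fbschedules.com/wp-admin/admin-ajax.php

When I try to simulate the AJAX request, Scrapy returns a 400 response: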

def parse(self, response):
    headers = {
        'User_Agent': user_agent,
        'Accept': 'application/json, text/javascript, */*; q=0.01',
        'Accept-Language': 'en-US,en;q=0.5',
        'Accept-Encoding': 'gzip, deflate, br',
        'Referer': 'https://fbschedules.com/new-england-patriots-schedule/',
        'Content-Type': 'application/x-www-form-urlencoded; charset=UTF-8',
        'X-Requested-With': 'XMLHttpRequest',
        'Cookie': cookie,
        'DNT': '1',
        'Connection': 'keep-alive',
        'Cache-Control': 'max-age=0'
    }

    data = {
        'action': 'load_fbschedules_ajax',
        'type': 'NFL',
        'display': 'Season',
        'team': 'New+England+Patriots',
        'current_season': '2018',
        'view': '',
        'conference': '',
        'conference-division': '',
        'ncaa-subdivision': '',
        'ispreseason': '',
        'schedule-week': '',
    }

    yield scrapy.FormRequest.from_response('https://fbschedules.com/wp-admin/admin-ajax.php',
                                           headers=headers,
                                           formdata=data,
                                           method='POST',
                                           callback=self.schedule_parse)
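
One detail worth checking in the form data above: Scrapy's FormRequest URL-encodes the values passed in formdata, so a value that is already encoded (such as 'New+England+Patriots') gets encoded a second time. The sketch below uses the standard library's urlencode as a stand-in for that step to show the difference. Whether this is what triggers the 400 on this particular site is not confirmed anywhere in this thread; treat it as something to rule out, not as the diagnosis.

```python
from urllib.parse import urlencode, parse_qs

# Value exactly as written in the question: the '+' characters are literal here.
pre_encoded = urlencode({'team': 'New+England+Patriots'})

# Plain-text value: the encoder turns each space into '+' on its own.
plain = urlencode({'team': 'New England Patriots'})

print(pre_encoded)  # team=New%2BEngland%2BPatriots
print(plain)        # team=New+England+Patriots

# What a server would decode from each request body.
print(parse_qs(pre_encoded)['team'][0])  # New+England+Patriots
print(parse_qs(plain)['team'][0])        # New England Patriots
```

If the endpoint expects the team name with spaces, passing the plain-text value in formdata and letting the request class do the encoding avoids the double encoding.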

Any help in the right direction would be appreciated!

Edit: I should also mention that I run this spider as a standalone script, using:

^{pr2}$

to start crawling the page. The console output is as follows:

2018-09-02 18:20:33 [scrapy.core.engine] INFO: Spider opened
2018-09-02 18:20:33 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2018-09-02 18:20:33 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2018-09-02 18:20:33 [scrapy.core.engine] DEBUG: Crawled (400) https://fbschedules.com/wp-admin/admin-ajax.php> (referer: None)
2018-09-02 18:20:33 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <400 https://fbschedules.com/wp-admin/admin-ajax.php>: HTTP status code is not handled or not allowed
2018-09-02 18:20:33 [scrapy.core.engine] INFO: Closing spider (finished)

1 answer

Answer 1 · Posted 2024-09-30 14:29:28

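I had the same problem and handled it by adding a meta argument to the FormRequest call.

Try using scrapy.FormRequest instead of scrapy.FormRequest.from_response. from_response expects a Response object containing the form as its first argument, not a URL string: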

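from scrapy import FormRequest

# Let non-2xx responses (such as the 400) reach the callback
# instead of being dropped by the HttpError spider middleware.
meta = {'handle_httpstatus_all': True}

# Inside the spider's parse() method; headers and data are the dicts from the question.
yield FormRequest('https://fbschedules.com/wp-admin/admin-ajax.php',
                  headers=headers,
                  formdata=data,
                  method='POST',
                  meta=meta,
                  callback=self.schedule_parse)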
