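I am trying to scrape an ASP.NET WebForms site. Viewing the page source shows that the form receives a VarsSessionID from the server every time the page loads. When the Continue button is clicked, the form sends an AJAX request to an ASMX web service, which then redirects to a URL showing the new search results.
I have implemented my Scrapy spider to submit the AJAX POST request as shown: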
import scrapy
from scrapy.http import FormRequest, Request
from scrapy.selector import Selector
import json
from scrapy.utils.response import open_in_browser


class TestSpider(scrapy.Spider):
    name = "test"
    allowed_domains = ['customer2.videcom.com']
    start_urls = ['http://customer2.videcom.com/med-view/VARS/Public/CustomerPanels/requirements.aspx?country=ng&lang=en']

    def parse(self, response):
        # Form fields submitted with the first POST; __VIEWSTATE is hardcoded here.
        form_data = {
            'VarsSessionID': '',
            '__VIEWSTATE': '/wEPDwULLTE4MTk4NDM5NjEPZBYCAgMPZBYCAgMPFgIeB1Zpc2libGVoZGSNuC4VK36MoPTmce49gcH1j2nxAPDYsLXii0G/syddwQ=='}
        yield FormRequest.from_response(response,
                                        formid='frmChangePage',
                                        formdata=form_data,
                                        method='POST',
                                        callback=self.after_parse,
                                        url='http://customer2.videcom.com/med-view/VARS/Public/CustomerPanels/requirements.aspx?country=ng&lang=en',
                                        )

    def after_parse(self, response):
        print("====RESPONSE===")
        print(response.headers)
        print("==========")
        print(response.request.headers)
        print("==========")
        # Read the session id and view state from the returned page.
        VarsSessionID = Selector(response=response).xpath("//*[@id='VarsSessionID']/@value").extract()[0]
        viewstate = Selector(response=response).xpath("//*[@id='__VIEWSTATE']/@value").extract()[0]
        print("VarsSessionID: " + VarsSessionID)
        print("__VIEWSTATE: " + viewstate)
        url = "http://customer2.videcom.com/med-view/VARS/Public/WebServices/AvailabilityWS.asmx/GetFlightAvailability?VarsSessionID=" + VarsSessionID
        # JSON payload for the ASMX web service.
        payload = {
            "FormData":
                {
                    'Origin': ['LOS'],
                    'VarsSessionID': VarsSessionID,
                    'Destination': ['ABV'],
                    'DepartureDate': ['05-May-2017'],
                    'ReturnDate': '',
                    'Adults': '1',
                    'Children': '0',
                    'SmallChildren': '0',
                    "Seniors": '0',
                    "Students": '0',
                    "Infants": '0',
                    "Youths": '0',
                    "Teachers": '0',
                    "SeatedInfants": '0',
                    "EVoucher": '',
                    "recaptcha": 'SHOW',
                    "SearchUser": 'PUBLIC',
                    "SearchSource": "requirements"
                }, "IsMMBChangeFlightMode": 'false'
        }
        headers = {
            'Accept': 'application/json, text/javascript, */*',
            'Accept-Encoding': 'gzip, deflate, br',
            'accept-language': 'en_US',
            'Connection': 'keep-alive',
            'content-type': 'application/json',
            'Cookie': {'VarsSessionID': ''},
            'User-Agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36"
        }
        yield Request(url,
                      callback=self.after_search,
                      method='POST',
                      body=json.dumps(payload),
                      headers=headers)

    def after_search(self, response):
        print("========SEARCH HEADERS========")
        print(response.headers)
        print(response.request.headers)
        open_in_browser(response)
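I used Chrome DevTools to inspect the request and response headers to determine the cookies and other header information.
When I run the code above, I keep getting an Internal Server Error 500.
I need help figuring out how to post the data and receive the search results, the same way I get them when I search with a browser. Thanks.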
Replace the hardcoded __VIEWSTATE parameter in your request with a "fresh" one. The view state becomes invalid some time after it is obtained.
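Sometimes on ASP sites, FormRequest.from_response does not capture this parameter correctly, so you may need to inspect response.body to work out how to extract __VIEWSTATE. Here is a good example: https://blog.scrapinghub.com/2016/04/20/scrapy-tips-from-the-pros-april-2016-edition/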
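As a rough illustration of pulling a fresh __VIEWSTATE out of the page body yourself, here is a minimal sketch that uses only the Python 3 standard library. The class HiddenInputParser, the function extract_hidden_fields, and the sample HTML (including its field values) are hypothetical names and data made up for this example. The real page may lay out its hidden inputs differently, so treat this as a starting point rather than a fix tested against the site.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the name/value pairs of every <input type="hidden"> tag."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            # A missing value attribute is stored as an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of the hidden form fields found in an HTML string."""
    parser = HiddenInputParser()
    parser.feed(html)
    return parser.fields


# Hypothetical page fragment; the values are placeholders, not real tokens.
sample_html = """
<form id="frmChangePage" method="post">
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="FRESH_VIEWSTATE_TOKEN" />
  <input type="hidden" name="VarsSessionID" id="VarsSessionID" value="abc123" />
  <input type="text" name="Origin" value="LOS" />
</form>
"""

fields = extract_hidden_fields(sample_html)

# Build the form data from the values just read, instead of a hardcoded string.
form_data = {
    "__VIEWSTATE": fields["__VIEWSTATE"],
    "VarsSessionID": fields["VarsSessionID"],
}
print(form_data)
```

In a spider callback, the same helper could presumably be given response.text, with the resulting dict passed as formdata to FormRequest, so that every POST carries the view state from the page that was just downloaded.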