I created two scripts: one uses the requests module, the other uses scrapy. Both of them work flawlessly. Here is how to generate the result manually on that site: put 2220 CLOVE TERR next to Property Address, then hit the search button to get the value of Block, which is 4759.

Since __VIEWSTATE is one of the most important parameters sent with a POST request to populate results on any site ending with .aspx, I had to use it in my first script to get the results. However, when I went for scrapy, I could still get the same results without using __VIEWSTATE explicitly.
Using requests:
import requests
from bs4 import BeautifulSoup

link = 'https://cityservices.baltimorecity.gov/realproperty/default.aspx'
search_address = '2220 CLOVE TERR'

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
    key = 'ctl00$ctl00$rootMasterContent$LocalContentPlaceHolder${}'
    payload = {}
    r = s.get(link)
    soup = BeautifulSoup(r.text, "lxml")
    payload['__VIEWSTATE'] = soup.select_one("input[id='__VIEWSTATE']")['value']
    payload[key.format('txtAddress')] = search_address
    payload[key.format('btnSearch')] = 'Search'
    res = s.post(link, data=payload)
    soup = BeautifulSoup(res.text, "lxml")
    block = soup.select_one("[id$='_DataGrid1'] > tr:not(th) > td").get_text(strip=True)
    print(block)
Using scrapy:
from scrapy import Spider, Request, FormRequest

class RealpropertySpider(Spider):
    name = 'companies'
    start_url = 'https://cityservices.baltimorecity.gov/realproperty/default.aspx'
    search_address = '2220 CLOVE TERR'

    def start_requests(self):
        yield Request(self.start_url)

    def parse(self, response):
        key = 'ctl00$ctl00$rootMasterContent$LocalContentPlaceHolder${}'
        formdata = {
            key.format('txtAddress'): self.search_address,
            key.format('btnSearch'): 'Search'
        }
        yield FormRequest.from_response(
            response,
            formdata=formdata,
            callback=self.parse_content
        )

    def parse_content(self, response):
        block = response.xpath("//*[contains(@id,'_DataGrid1')]/tr[not(th)]/td/text()").get()
        yield {"Block": block}
Question: Is there any way I can mimic FormRequest.from_response while using requests, so that I don't need to supply __VIEWSTATE within the payload to fetch the required content?
Your scrapy solution works because FormRequest.from_response() already loads the form fields that carry the viewstate:

- It checks whether the response contains a form tag (_get_form); in your case, that is the form holding __VIEWSTATE.
- The result of this step includes that field's data.
- It then applies the fields from the formdata argument on top of them to build the new payload.

As far as I know, the requests library has no similar implementation. If for some reason you cannot use scrapy and you need this functionality, you will probably have to replicate all of the steps mentioned above yourself (see the relevant parts of the scrapy source code).
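To replicate those steps with requests, one option is to collect every named input field from the form (hidden fields such as __VIEWSTATE included) and then overlay your own values, roughly what from_response does before posting. The sketch below assumes BeautifulSoup is available; build_form_payload is a hypothetical helper name, not part of requests, bs4, or scrapy.

```python
from bs4 import BeautifulSoup

def build_form_payload(html, formdata):
    """Collect every named <input> in the first <form> (hidden fields
    such as __VIEWSTATE included), then let formdata override or extend
    them -- a rough imitation of FormRequest.from_response."""
    soup = BeautifulSoup(html, "html.parser")
    form = soup.find("form")
    payload = {
        inp["name"]: inp.get("value", "")
        for inp in form.find_all("input")
        if inp.get("name")
    }
    payload.update(formdata)  # caller-supplied fields win, like formdata in scrapy
    return payload
```

With something like this in place, the first script could build its payload from r.text and never mention __VIEWSTATE by name; the hidden field is picked up along with every other input in the form.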