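<p>I have an ASPX page at <a href="https://searchlight.cluen.com/E5/CandidateSearch.aspx" rel="nofollow noreferrer">https://searchlight.cluen.com/E5/CandidateSearch.aspx</a> with a form on it that I would like to submit, then parse the response for information.</p>
<p>Using Python's urllib and urllib2, I created a POST request with what I believe are the appropriate headers and user agent, but the response does not contain the expected results. Am I misunderstanding something, or missing an obvious detail?</p>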
<pre><code>import urllib
import urllib2

headers = {
    'HTTP_USER_AGENT': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) Gecko/2009073022 Firefox/3.0.13',
    'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml; q=0.9,*/*; q=0.8',
    'Content-Type': 'application/x-www-form-urlencoded'
}

# obtained these values from viewing the source of https://searchlight.cluen.com/E5/CandidateSearch.aspx
viewstate = '/wEPDwULLTE3NTc4MzQwNDIPZBYCAg ... uJRWDs/6Ks1FECco='
eventvalidation = '/wEWjQMC8pat6g4C77jgxg0CzoqI8wgC3uWinQQCwr/ ... oPKYVeb74='

url = 'https://searchlight.cluen.com/E5/CandidateSearch.aspx'
formData = (
    ('__VIEWSTATE', viewstate),
    ('__EVENTVALIDATION', eventvalidation),
    ('__EVENTTARGET', ''),
    ('__EVENTARGUMENT', ''),
    ('textcity', ''),
    ('dropdownlistposition', ''),
    ('dropdownlistdepartment', ''),
    ('dropdownlistorderby', ''),
    ('textsearch', ''),
)

# change user agent
from urllib import FancyURLopener

class MyOpener(FancyURLopener):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

myopener = MyOpener()

# encode form data in post-request format
encodedFields = urllib.urlencode(formData)

f = myopener.open(url, encodedFields)
print f.info()

try:
    fout = open('tmp.htm', 'w')
except:
    print('Could not open output file\n')

fout.writelines(f.readlines())
fout.close()
</code></pre>
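<p>For reference, the <code>__VIEWSTATE</code> and <code>__EVENTVALIDATION</code> values above were copied by hand from the page source. A minimal sketch of reading them from freshly fetched HTML instead, assuming they appear as ordinary hidden <code>input</code> elements, might look like the following. It is written against the Python 3 standard library rather than the urllib2 used above, and <code>HiddenFieldParser</code> and <code>extract_hidden_fields</code> are hypothetical helper names, not part of any library.</p>

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # Also invoked for self-closing tags such as <input ... />.
        if tag != "input":
            return
        attr = dict(attrs)
        is_hidden = (attr.get("type") or "").lower() == "hidden"
        if is_hidden and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def extract_hidden_fields(html):
    """Return a dict of hidden form fields found in an HTML string."""
    parser = HiddenFieldParser()
    parser.feed(html)
    return parser.fields
```

<p>The returned dict could then be merged with the visible fields (<code>textcity</code>, <code>textsearch</code>, and so on) before encoding, so the posted state always matches the page that was just served.</p>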
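<p>Several existing questions on this topic have been helpful (for example <a href="https://stackoverflow.com/questions/1480356/how-to-submit-query-to-aspx-page-in-python">how to submit query to .aspx page in python</a>), but I am still stuck on this one and would appreciate any further help.</p>
<p>The resulting HTML page indicates that I may need to log in, yet the ASPX page displays in my browser without any login.</p>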
<p>Here are the results from <code>info()</code>:</p>
<blockquote>
<p>Connection: close
Date: Tue, 07 Jun 2011 17:05:26 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Cache-Control: private
Content-Type: text/html; charset=utf-8
Content-Length: 1944</p>
</blockquote>
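<p>Two details of the request may be worth checking. The <code>headers</code> dict in the script is never passed to the opener, and <code>HTTP_USER_AGENT</code> and <code>HTTP_ACCEPT</code> are CGI environment variable names; the header names sent over the wire are <code>User-Agent</code> and <code>Accept</code>. The script also does not keep cookies between requests. Whether either of these explains the login page cannot be confirmed from the output above, so the following is only a sketch, assuming the server expects a session cookie from an initial GET. It uses the Python 3 standard library, and <code>make_opener</code> and <code>build_post</code> are hypothetical helper names.</p>

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar

URL = "https://searchlight.cluen.com/E5/CandidateSearch.aspx"
USER_AGENT = (
    "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.13) "
    "Gecko/2009073022 Firefox/3.0.13"
)


def make_opener():
    """Build an opener that stores response cookies and sends them back."""
    return urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )


def build_post(url, fields):
    """Build (but do not send) a form POST with browser-like headers."""
    data = urllib.parse.urlencode(fields).encode("ascii")
    return urllib.request.Request(url, data=data, headers={
        "User-Agent": USER_AGENT,
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
        "Content-Type": "application/x-www-form-urlencoded",
    })
```

<p>In use, the same opener would first fetch the page with <code>opener.open(URL)</code>, so that any cookies and a current view state are obtained, and then submit the form with <code>opener.open(build_post(URL, fields))</code>.</p>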