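<p>AJAX calls are made by JavaScript, and mechanize has no way to run JavaScript. Mechanize only looks at the form fields in a static HTML page and lets you fill in and submit them. That is why your research pointed you to tools like Selenium or <a href="http://jeanphix.me/Ghost.py/" rel="nofollow noreferrer">Ghost</a>, which run on top of a real browser that can execute JavaScript.</p>
<p>There is a simpler way to do this, though! If you open the developer tools in your browser (for example the "Network" tab in Firefox or Chrome) and fill out the form, you can see the requests your browser makes behind the scenes, even the AJAX ones:</p>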
<p><a href="https://i.stack.imgur.com/MT2CS.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/MT2CS.png" alt="Network tab in Firefox"/></a></p>
<p>This tells you:</p>
<ul>
<li>The browser made a <code>POST</code> request</li>
<li>To this URL: <code>https://interactive.web.insurance.ca.gov/survey/survey?type=homeownerSurvey&event=HOMEOWNERS</code></li>
<li>With the following form parameters:
<ul>
<li>location=ALAMEDA+ALAMEDA</li>
<li>coverageType=HOMEOWNERS</li>
<li>coverageAmount=150000</li>
<li>homeAge=New</li>
</ul></li>
</ul>
<p>You can use this information to make the same POST request in Python:</p>
<pre><code>import urllib.parse
import urllib.request

# The endpoint and form fields observed in the browser's Network tab.
url = "https://interactive.web.insurance.ca.gov/survey/survey?type=homeownerSurvey&event=HOMEOWNERS"
data = urllib.parse.urlencode(dict(
    location="ALAMEDA ALAMEDA",
    coverageType="HOMEOWNERS",
    coverageAmount="150000",
    homeAge="New",
))
# urlopen sends a POST when data is given; the body must be bytes.
res = urllib.request.urlopen(url, data.encode("utf8"))
print(res.read())
</code></pre>
<p>This is Python 3. The <a href="http://docs.python-requests.org/en/latest/" rel="nofollow noreferrer">requests</a> library provides a nicer API for making HTTP requests.</p>
<hr/>
<p><strong>Edit</strong>: In response to your three questions:</p>
<blockquote>
<p>is it possible for the dictionary that you've created to have more than 1 location and cycle through them using a for loop?</p>
</blockquote>
<p>Yes, just add a loop around the code and pass a different value for <code>location</code> each time. I would put this code into a function to make it cleaner, like this:</p>
<p><a href="https://gist.github.com/lost-theory/08786e3a27c8d8ce3839" rel="nofollow noreferrer">https://gist.github.com/lost-theory/08786e3a27c8d8ce3839</a></p>
<blockquote>
<p>the results are in a lot of jibberish, so I'd have to find a way to sift through it huh. Like pick out which is which</p>
</blockquote>
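<p>Yes, the gibberish is HTML, and you will need to parse it to collect the data you are looking for. Take a look at <a href="https://docs.python.org/3/library/html.parser.html" rel="nofollow noreferrer">HTMLParser</a> in the Python standard library, or install a library like <a href="http://lxml.de/" rel="nofollow noreferrer">lxml</a> or {a7}, which have a nicer API. You can also try parsing the text by hand with <code>str.split</code>.</p>
<p>If you want to turn the rows of the table into a Python <code>list</code>, you need to find all the rows, like this:</p>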
^{pr2}$
<p>Then, within each row, scrape out all the elements ({cd6}), and then clean those elements up.</p>
<p>There are lots of questions on StackOverflow, as well as tutorials, about how to parse or scrape HTML with Python, like <a href="http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html#The%20basic%20find%20method:%20findAll(name,%20attrs,%20recursive,%20text,%20limit,%20**kwargs)" rel="nofollow noreferrer">this</a> or {a9}.</p>
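<p>As a rough illustration of the rows-then-cells approach, here is a sketch that uses only the standard library's <code>HTMLParser</code>. The class name <code>TableRowParser</code> and the sample HTML are assumptions made up for this example; the real survey page will have a different structure, so treat this as a starting point rather than a drop-in solution.</p>

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Clean up the cell: join fragments and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up sample markup, standing in for the HTML returned by the server.
sample_html = """
<table>
  <tr><th>Company</th><th>Premium</th></tr>
  <tr><td> Example Insurance Co. </td><td>$1,234</td></tr>
</table>
"""

parser = TableRowParser()
parser.feed(sample_html)
print(parser.rows)
```

<p>With a real response you would pass <code>res.read().decode("utf8")</code> to <code>parser.feed</code>, assuming the page is UTF-8 encoded.</p>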
<blockquote>
<p>could you explain why we had to do the data.encode line</p>
</blockquote>
<p>Sure! The <a href="https://docs.python.org/3/library/urllib.request.html#urllib.request.urlopen" rel="nofollow noreferrer">documentation for <code>urlopen</code></a> says:</p>
<blockquote>
<p>data must be a bytes object specifying additional data to be sent to the server, or None if no such data is needed.</p>
</blockquote>
<p>The <code>urlencode</code> function returns a unicode string, and if we try to pass that to <code>urlopen</code>, we get this error:</p>
^{3}$
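<p>So we use <code>data.encode('utf8')</code> to convert the unicode string to bytes. You generally need to use bytes for input and output, such as reading or writing files on disk or sending and receiving data over the network, as with HTTP requests. <a href="http://nedbatchelder.com/text/unipain.html" rel="nofollow noreferrer">This presentation</a> has a good explanation of bytes vs. unicode strings in Python, and why you need to decode and encode when doing I/O.</p>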
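<p>To make the distinction concrete, here is a small sketch showing the two types side by side. The variable names are only for illustration, and the parameter dictionary is a shortened version of the form data used earlier.</p>

```python
import urllib.parse

params = {"location": "ALAMEDA ALAMEDA", "homeAge": "New"}

text = urllib.parse.urlencode(params)  # a str (unicode text)
body = text.encode("utf8")             # bytes, ready to send over the network

print(type(text).__name__, text)
print(type(body).__name__, body)

# Decoding is the reverse step, e.g. for bytes read back from a response.
round_trip = body.decode("utf8")
```

<p>The same pattern applies in the other direction: <code>res.read()</code> returns bytes, so call <code>.decode()</code> on it before treating the result as text.</p>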