我需要从页面中删除一些数据,在那里我填写表单(已经用mechanize完成了这项工作)。问题是,页面返回许多页面上的数据,而我在从这些页面获取数据时遇到了麻烦。在
从第一个结果页面获取它们没有问题,因为它已经在搜索之后显示了-我只需提交表单并获得响应。在
我分析了结果页面的源代码,它似乎使用了Java脚本RichFaces(一些用于JSF和ajax的库,但是我可能错了,因为我不是web专家)。在
但是,我设法找到了如何到达剩下的结果页。我需要单击以下形式的链接(href="javascript:void(0);"
,下面是完整代码):
<td class="pageNumber"><span class="rf-ds " id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233"><span class="rf-ds-nmb-btn rf-ds-act " id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_1">1</span><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_2">2</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_3">3</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_4">4</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_5">5</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_6">6</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_7">7</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_8">8</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_9">9</a><a class="rf-ds-nmb-btn " href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_10">10</a><a class="rf-ds-btn rf-ds-btn-next" href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_next">»</a><a class="rf-ds-btn rf-ds-btn-last" href="javascript:void(0);" id="SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_l">»»»»</a>
<script type="text/javascript">new RichFaces.ui.DataScroller("SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233",function(event,element,data){RichFaces.ajax("SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233",event,{"parameters":{"SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233:page":data.page} ,"incId":"1"} )},{"digitals":{"SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_9":"9","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_8":"8","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_7":"7","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_6":"6","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_5":"5","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_4":"4","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_3":"3","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_1":"1","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_10":"10","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_2":"2"} ,"buttons":{"right":{"SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_next":"next","SomeSimpleForm:SomeSimpleTable:j_idt211:j_idt233_ds_l":"last"} } ,"currentPage":1} )</script></span></td>
<td class="pageExport"><script type="text/javascript" src="/opi/javax.faces.resource/download.js?ln=js/component&b="></script><script type="text/javascript">
所以我想问一下,是否有一种方法可以单击所有链接并使用mechanize获得所有页面(注意,在»
符号之后还有更多的页面可用)?我问一些有网络知识的傻瓜的答案:)
首先,我还是坚持使用selenium,因为这是一个相当“javascript重”的网站。请注意,如果需要,可以使用无头浏览器(^{} 或与virtual display)一起使用。在
这里的想法是按每页100行分页,单击“>>;”链接,直到它不在页面上,这意味着我们已经到达最后一页,没有更多的结果要处理。为了使解决方案可靠,我们需要使用Explicit Waits:每次我们进入下一页-等待加载微调器不可见。在
工作实施:
这对我有用:似乎所有的html都可以在
page
中找到您不必首先收集和存储所有html,还可以只查询您想要的内容,但您需要}。在
lxml
或{编辑:在运行之后,我确实注意到我们出错了。只需捕获异常并重试就可以了。在
相关问题 更多 >
编程相关推荐