我试图用Python从页面上抓取一些元素及其值;但是,要获得更多元素,我需要模拟单击下一步按钮。这些纽扣上绑着一个贴纸,所以我想打电话给它。不幸的是,Python只是反复地打印相同的值[这意味着不调用next按钮的post-back]。我正在使用requests
来完成我的POST/GET。在
import re
import time
import requests
TARGET_GROUP_ID = 778092
SESSION = requests.Session()
REQUEST_HEADERS = {"Accept-Encoding": "gzip,deflate"}
GROUP_URL = "http://roblox.com/groups/group.aspx?gid=%d"%(TARGET_GROUP_ID)
POST_BUTTON_HTML = 'pagerbtns next'
EVENTVALIDATION_REGEX = re.compile(r'id="__EVENTVALIDATION" value="(.+)"').search
VIEWSTATE_REGEX = re.compile(r'id="__VIEWSTATE" value="(.+)"').search
VIEWSTATEGENERATOR_REGEX = re.compile(r'id="__VIEWSTATEGENERATOR" value="(.+)"').search
TITLE_REGEX = re.compile(r'<a id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_ctrl\d+_hlAvatar".*?title="(\w+)".*?ID=(\d+)"')
page = SESSION.get(GROUP_URL, headers = REQUEST_HEADERS).text
while 1:
if POST_BUTTON_HTML in page:
for (ids,names) in re.findall(TITLE_REGEX, page):
print ids,names
postData = {
"__EVENTVALIDATION": EVENTVALIDATION_REGEX(page).group(1),
"__VIEWSTATE": VIEWSTATE_REGEX(page).group(1),
"__VIEWSTATEGENERATOR": VIEWSTATEGENERATOR_REGEX(page).group(1),
"__ASYNCPOST": True,
"ct1000_cphRoblox_rbxGroupRoleSetMembersPane_currentRoleSetID": "4725789",
"ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl02$ctl00": "",
"ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$HiddenInputButton": "",
"ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$PageTextBox": "3"
}
page=SESSION.post(GROUP_URL, data = postData, stream = True).text
time.sleep(2)
我怎样才能正确地把邮局叫回来ASP.NET从Python来解决这个问题?如前所述,每次只打印相同的值。在
这是按钮的HTML元素
^{pr2}$这是它所在的分区:
<div id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_MembersPagerPanel" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_HiddenInputButton')">
<div id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_Div1" class="paging_wrapper">
Page <input name="ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$PageTextBox" type="text" value="1" id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_PageTextBox" class="paging_input"> of
<div class="paging_pagenums_container">125</div>
<input type="submit" name="ctl00$cphRoblox$rbxGroupRoleSetMembersPane$dlUsers_Footer$ctl01$HiddenInputButton" value="" onclick="loading('members');" id="ctl00_cphRoblox_rbxGroupRoleSetMembersPane_dlUsers_Footer_ctl01_HiddenInputButton" class="pagerbtns translate" style="display:none;">
</div>
</div>
我在考虑使用JS库并执行JS__postback
方法,但是,我想先看看这是否可以用纯Python实现。在
是的,你只要在正确的字段上提交正确的值就可以了。但我假设你正在尝试解析的网页asp.netweb表单,因此查找值等应该非常耗时。我建议您研究一下selenium,这样您就可以轻松地调用网页上的单击和事件,而无需编写太多代码。在
相关问题 更多 >
编程相关推荐