Building POST data for an .aspx website in Python

Published 2024-10-03 06:21:07


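I'm new to both .NET and Python, but I want to write a program that scrapes an .aspx site and processes its content (the HTML source is enough). I tried a few Python libraries, but all I get back is the first page of the site. It seems I'm building the POST data incorrectly: I don't know the correct form of the data, or what should and shouldn't be included.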

http://nastenka.lesy.sk/EZOZV/Publish/ObjednavkyZverejnenie.aspx?YR=2018

import requests, urllib, urllib2

r = requests.get("http://nastenka.lesy.sk/EZOZV/Publish/ObjednavkyZverejnenie.aspx?YR=2018")
content = r.text
print content

start_index = content.find('id="__VIEWSTATE"') + 24
sliced_vs = content[start_index:content.find('"',start_index)]

start_index = content.find('id="__VIEWSTATEGENERATOR"') + 33
sliced_vsg = content[start_index:content.find('"',start_index)]

start_index = content.find('id="__VIEWSTATEENCRYPTED"') + 33
sliced_vse = content[start_index:content.find('"',start_index)]

start_index = content.find('id="__EVENTVALIDATION"') + 30
sliced_EV = content[start_index:content.find('"',start_index)]

form_data = {'__EVENTTARGET': 'gvZverejnenie',
      '__EVENTARGUMENT': 'Page$2',
      '__VIEWSTATE': sliced_vs,
      '__VIEWSTATEGENERATOR': sliced_vsg,
      '__VIEWSTATEENCRYPTED': sliced_vse,
      '__EVENTVALIDATION': sliced_EV}

data_encoded = urllib.urlencode(form_data)


r = requests.post('http://nastenka.lesy.sk/EZOZV/Publish/ObjednavkyZverejnenie.aspx?YR=2018',data=data_encoded)
content = r.text
print content

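For example, in the code above I want to get the second page ('Page$2'). I always get the same first page back, only with different ViewState and EventValidation values. Where is the problem?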


1 answer
User
#1 · Posted 2024-10-03 06:21:07

This code requires Selenium and ChromeDriver to control Google Chrome. For the URL you provided, the result has 476 pages in total.

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By

# Run Chrome without opening a visible window.
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')

driver = webdriver.Chrome(options=chrome_options)
driver.get('http://nastenka.lesy.sk/EZOZV/Publish/ObjednavkyZverejnenie.aspx?YR=2018')

with open('page_1.html', 'w', encoding='utf-8') as f:
    f.write(driver.page_source)

page_num = 2
while True:
    try:
        # The pager normally shows a link whose text is the page number.
        element = driver.find_element(By.LINK_TEXT, str(page_num))
    except NoSuchElementException:
        # No direct link: fall back to the '...' link that opens the next block of pages.
        elements = driver.find_elements(By.LINK_TEXT, '...')
        if len(elements) == 0:
            break  # fewer than 11 pages total
        elif len(elements) == 1 and page_num > 12:
            break  # last page
        element = elements[-1]

    element.click()

    with open('page_{}.html'.format(page_num), 'w', encoding='utf-8') as f:
        f.write(driver.page_source)

    page_num += 1

driver.quit()
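
If you would rather stay with requests instead of driving a browser, here is a minimal sketch of how the postback data could be built. It is untested against this site and rests on a few assumptions: that the page is a standard ASP.NET WebForms page whose state lives in hidden input fields, and that 'gvZverejnenie' (taken from the question) is the correct event target. The names HiddenInputParser, build_postback_data and fetch_page are made up for this example. One likely cause of the original problem is that the form data was passed to requests.post as an already-encoded string; in that case requests does not set the form Content-Type header, so the server may ignore the fields. Passing a dict avoids that.

```python
from html.parser import HTMLParser


class HiddenInputParser(HTMLParser):
    """Collect the name/value pair of every <input type="hidden"> on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        # Keep only hidden inputs that have a name (that is what a form submits).
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_postback_data(html, event_target, event_argument):
    """Copy every hidden field from the page, then set the postback target."""
    parser = HiddenInputParser()
    parser.feed(html)
    data = dict(parser.fields)
    data["__EVENTTARGET"] = event_target
    data["__EVENTARGUMENT"] = event_argument
    return data


def fetch_page(url, page_num, target="gvZverejnenie"):
    """Hypothetical helper: GET the first page, then POST back for page_num."""
    import requests  # third-party; imported here so the parsing code has no dependency

    with requests.Session() as session:  # the session keeps cookies between requests
        first = session.get(url)
        data = build_postback_data(first.text, target, "Page${}".format(page_num))
        # Pass the dict itself so requests encodes it and sets the Content-Type.
        return session.post(url, data=data).text
```

Copying all hidden fields, rather than slicing each value out at a fixed offset, keeps the request valid even if the page adds or reorders its state fields. Note that the server may only accept page numbers that were visible in the pager of the page you are posting back from, so reaching later pages may require walking through the '...' blocks step by step, as the Selenium loop does.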
