Python scraping: building the POST payload for cnmv.es and rendering JavaScript

Posted 2024-09-29 21:42:14


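I send a request with a payload and the search text aaa to https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx, but I get a JavaScript response back. So I think I need to render the JavaScript, but I don't want to use Selenium. I'm also not sure whether my payload is correct.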

    import requests
    from bs4 import BeautifulSoup

    url = 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx'
    search_text = 'aaa'
    r = requests.get('https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx')
    soup = BeautifulSoup(r.content, 'html.parser')

    VIEWSTATE  = soup.find(id="__VIEWSTATE")['value'] + '%3D&'
    VIEWSTATEGENERATOR = '__VIEWSTATEGENERATOR=' + soup.find(id="__VIEWSTATEGENERATOR")['value']

    EVENTVALIDATION = '&__EVENTVALIDATION' + soup.find(id="__EVENTVALIDATION")['value']
    SEARCH = "&ctl00%24wBusqueda%24txtBusqueda=&ctl00%24ContentPrincipal%24txtBusqueda={0}&ctl00%24ContentPrincipal%24btnBuscar=Search".format(search_text)
    

    payload = '__EVENTTARGET=&__EVENTARGUMENT=&__VIEWSTATE=' + VIEWSTATE + VIEWSTATEGENERATOR + EVENTVALIDATION + SEARCH


    headers = {
    'Connection': 'keep-alive',
    'Cache-Control': 'max-age=0',
    'Upgrade-Insecure-Requests': '1',
    'Origin': 'https://www.cnmv.es',
    'Content-Type': 'application/x-www-form-urlencoded',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Sec-Fetch-Site': 'same-origin',
    'Sec-Fetch-Mode': 'navigate',
    'Sec-Fetch-User': '?1',
    'Sec-Fetch-Dest': 'document',
    'Referer': 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx',
    'Accept-Language': 'en-US,en;q=0.9',
    }

    response = requests.request("POST", url, headers=headers, data = payload)

    print(response.text.encode('utf8'))

1 answer
User
#1 · Posted 2024-09-29 21:42:14

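I didn't test your payload, but I don't understand why you append %3D to __VIEWSTATE. %3D is just the URL-encoded form of "=", and the value BeautifulSoup reads from the page is already complete, so there is nothing to add.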

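I use a dictionary, which requests automatically converts into a form-encoded string, so I don't have to add & manually, write %24 instead of $ in the field names, etc.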
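To see what that conversion does, here is a small offline sketch. For a dict body, requests builds the form body with the standard library's urllib.parse.urlencode, so the stdlib call is used here to avoid any network access. The VIEWSTATE value is hypothetical; real ones are long base64 strings that can contain "+", "/" and trailing "=" padding, which is exactly what manual string concatenation fails to escape.

```python
from urllib.parse import parse_qs, urlencode

# Hypothetical VIEWSTATE value (not taken from the real page)
viewstate = 'abc+/def=='

payload = {
    '__EVENTTARGET': '',
    '__VIEWSTATE': viewstate,
    'ctl00$ContentPrincipal$txtBusqueda': 'aaa',
}

# Percent-encode keys and values and join them with '&'
body = urlencode(payload)
print(body)
# __EVENTTARGET=&__VIEWSTATE=abc%2B%2Fdef%3D%3D&ctl00%24ContentPrincipal%24txtBusqueda=aaa

# Manual concatenation leaves '+' unescaped, and standard form decoding
# turns it into a space, so the VIEWSTATE no longer matches the original
naive = '__VIEWSTATE=' + viewstate
print(parse_qs(naive)['__VIEWSTATE'][0])              # abc /def==
print(parse_qs(body)['__VIEWSTATE'][0] == viewstate)  # True
```

The encoded body round-trips to the original value, while the hand-built one does not.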

payload = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': soup.find(id="__VIEWSTATE")['value'],
    '__VIEWSTATEGENERATOR': soup.find(id="__VIEWSTATEGENERATOR")['value'],
    '__EVENTVALIDATION': soup.find(id="__EVENTVALIDATION")['value'],
    'ctl00$wBusqueda$txtBusqueda': '',
    'ctl00$ContentPrincipal$txtBusqueda': search_text,
    'ctl00$ContentPrincipal$btnBuscar': 'Buscar',
}        
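A variation on the dict above, assuming you would rather not hard-code each ASP.NET field: collect every hidden input from the page and then add the search-specific fields. The helper name hidden_fields and the sample markup are hypothetical and trimmed down for illustration; this sketch has not been checked against the live cnmv.es page.

```python
from bs4 import BeautifulSoup


def hidden_fields(html: str) -> dict:
    """Collect name -> value for every <input type="hidden"> in the page."""
    soup = BeautifulSoup(html, 'html.parser')
    return {
        inp['name']: inp.get('value', '')
        for inp in soup.find_all('input', type='hidden')
        if inp.has_attr('name')
    }


# Hypothetical, trimmed-down markup; the real page has more fields
sample_html = '''
<form>
  <input type="hidden" name="__VIEWSTATE" id="__VIEWSTATE" value="abc==" />
  <input type="hidden" name="__VIEWSTATEGENERATOR" id="__VIEWSTATEGENERATOR" value="1234ABCD" />
  <input type="hidden" name="__EVENTVALIDATION" id="__EVENTVALIDATION" value="xyz" />
  <input type="text" name="ctl00$ContentPrincipal$txtBusqueda" />
</form>
'''

payload = hidden_fields(sample_html)
# Fields that are not hidden inputs in the sample (event fields, text boxes,
# the clicked button) are added explicitly, as in the answer's dict
payload.update({
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    'ctl00$wBusqueda$txtBusqueda': '',
    'ctl00$ContentPrincipal$txtBusqueda': 'aaa',
    'ctl00$ContentPrincipal$btnBuscar': 'Buscar',
})
print(payload)
```

The upside is that any extra hidden field the server renders is sent back automatically; the downside is that you may also post fields the server does not expect.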

BTW: the code works for me without headers, so I commented them out.

import requests
from bs4 import BeautifulSoup

url = 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx'
search_text = 'aaa'

# Load the search page once to read the ASP.NET hidden fields
r = requests.get(url)
soup = BeautifulSoup(r.content, 'html.parser')

payload = {
    '__EVENTTARGET': '',
    '__EVENTARGUMENT': '',
    '__VIEWSTATE': soup.find(id="__VIEWSTATE")['value'],
    '__VIEWSTATEGENERATOR': soup.find(id="__VIEWSTATEGENERATOR")['value'],
    '__EVENTVALIDATION': soup.find(id="__EVENTVALIDATION")['value'],
    'ctl00$wBusqueda$txtBusqueda': '',
    'ctl00$ContentPrincipal$txtBusqueda': search_text,
    'ctl00$ContentPrincipal$btnBuscar': 'Buscar',
}        

headers = {
#    'Connection': 'keep-alive',
#    'Cache-Control': 'max-age=0',
#    'Upgrade-Insecure-Requests': '1',
#    'Origin': 'https://www.cnmv.es',
#    'Content-Type': 'application/x-www-form-urlencoded',
#    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.105 Safari/537.36',
#    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
#    'Sec-Fetch-Site': 'same-origin',
#    'Sec-Fetch-Mode': 'navigate',
#    'Sec-Fetch-User': '?1',
#    'Sec-Fetch-Dest': 'document',
#    'Referer': 'https://www.cnmv.es/portal/Consultas/BusquedaPorEntidad.aspx',
#    'Accept-Language': 'en-US,en;q=0.9',
}

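# requests form-encodes the dict, so no manual '&' or '%24' is needed
r = requests.post(url, headers=headers, data=payload)
# print(r.text)

# The matching entities come back as <option> elements (code | name)
soup = BeautifulSoup(r.content, 'html.parser')
for item in soup.find_all('option'):
    print(item['value'], '|', item.text)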

Result:

CLP3846 | AAA TRADE LTD
V85543155 | DWS DINERO GOBIERNOS AAA, FI
V85263911 | EUROVALOR DEUDA PUBLICA EUROPEA AAA, FI
9686 | WWW.AAARATEDBOND.COM
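If you want the results as data rather than printed lines, a small sketch could turn the options into (code, name) pairs. It is tested offline against markup rebuilt from the output above; the real page's select name and surrounding layout are not shown, so the sample HTML is an assumption. Options without a value attribute are skipped to avoid a KeyError.

```python
from bs4 import BeautifulSoup


def parse_options(html: str) -> list[tuple[str, str]]:
    """Return (value, visible text) for every <option> that has a value."""
    soup = BeautifulSoup(html, 'html.parser')
    return [
        (opt['value'], opt.get_text(strip=True))
        for opt in soup.find_all('option')
        if opt.has_attr('value')
    ]


# Hypothetical markup rebuilt from the printed result
sample_html = '''
<select>
  <option value="CLP3846">AAA TRADE LTD</option>
  <option value="V85543155">DWS DINERO GOBIERNOS AAA, FI</option>
  <option value="V85263911">EUROVALOR DEUDA PUBLICA EUROPEA AAA, FI</option>
  <option value="9686">WWW.AAARATEDBOND.COM</option>
</select>
'''

results = parse_options(sample_html)
for code, name in results:
    print(code, '|', name)
```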
