How to scrape a website where I can't get past the login

Posted 2024-06-28 15:59:49


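I'm trying to scrape the website "https://laboral.pjud.cl/SITLAPORWEB/InicioAplicacionPortal.do", but every time I get the same wrong page back. I think the problem is that I first have to authenticate on this site.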
I tried creating a session object and sending a POST request, but nothing seems to change.
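A `requests.Session` only helps if every step goes through the same session object: it stores cookies from earlier responses and attaches them to later requests, whereas each bare `requests.get()` or `requests.post()` call starts with an empty cookie jar. The offline sketch below shows the difference by preparing (not sending) two requests. The cookie name `JSESSIONID` and its value are hypothetical, chosen only for illustration.

```python
import requests

PORTAL_URL = "https://laboral.pjud.cl/SITLAPORWEB/InicioAplicacionPortal.do"

session = requests.Session()
# Simulate a cookie the server might have set on an earlier response.
# "JSESSIONID" / "abc123" are hypothetical values for illustration only.
session.cookies.set("JSESSIONID", "abc123", domain="laboral.pjud.cl", path="/")

# prepare_request() builds the request the way session.get() would, without sending it
with_session = session.prepare_request(requests.Request("GET", PORTAL_URL))
print(with_session.headers.get("Cookie"))  # JSESSIONID=abc123

# A standalone request (like a one-off requests.get call) carries no stored cookies
standalone = requests.Request("GET", PORTAL_URL).prepare()
print(standalone.headers.get("Cookie"))  # None
```

In practice this means the GET that loads the login page and the POST that follows it should both be made with the same `session`.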

import requests
from bs4 import BeautifulSoup
from requests.auth import HTTPBasicAuth

username = 'user'
password = 'pass'
scrape_url = 'https://laboral.pjud.cl/SITLAPORWEB/InicioAplicacionPortal.do'
login_url = 'https://laboral.pjud.cl/SITLAPORWEB/jsp/LoginPortal/LoginPortal.jsp'
# First attempt: request the login page with HTTP Basic authentication
r = requests.get(login_url, auth=HTTPBasicAuth(username, password))
print(r.text)
Output of print(r.text):
   <form name="InicioAplicacionForm" method="POST" 
   action="/SITLAPORWEB/InicioAplicacionPortal.do"><INPUT 
   type="hidden" name="FLG_Autoconsulta" value="1"><input 
   type="hidden" name="D0E0F02E" 
   value="764C8AA111F42E621BC10BA16CD8D8B2">
   </form><script>document.InicioAplicacionForm.submit();</script>
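The response above is not the portal page itself but a small HTML form that a browser submits automatically through `document.InicioAplicacionForm.submit()`. `requests` does not execute JavaScript, so that submit never happens. One way to replicate it is to read the form's `action`, `method` and input fields and send them yourself. A minimal sketch using only the standard library `html.parser`; the names `extract_form` and `_FormParser` are hypothetical, and it assumes the page contains a single form like the one shown:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class _FormParser(HTMLParser):
    """Collects the first <form>'s action/method and its named <input> values."""

    def __init__(self):
        super().__init__()
        self.action = None
        self.method = "GET"
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <INPUT> and <input> both arrive as "input"
        attrs = dict(attrs)
        if tag == "form" and not self._done:
            self._in_form = True
            self.action = attrs.get("action")
            self.method = (attrs.get("method") or "GET").upper()
        elif tag == "input" and self._in_form and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True  # ignore any later forms


def extract_form(html, page_url):
    """Return (absolute action URL, HTTP method, field dict) for the first form."""
    parser = _FormParser()
    parser.feed(html)
    # The action here is relative ("/SITLAPORWEB/..."), so resolve it against the page URL
    action_url = urljoin(page_url, parser.action or "")
    return action_url, parser.method, parser.fields
```

With a `requests.Session`, the result could then be submitted as `session.post(action_url, data=fields)`. The hidden `D0E0F02E` value looks like a per-request token, so it is probably safer to read it from each fresh response than to hard-code it (an assumption based on its format). The same extraction can be done with BeautifulSoup via `soup.find("form")` and `form.find_all("input")`.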

# Second attempt: POST the credentials plus the hidden field, then request the target page
login_info = {'username': username, 'password': password, "D0E0F02E": "764C8AA111F42E621BC10BA16CD8D8B2"}
session = requests.Session()
session.post(url=login_url, data=login_info)
response = session.get(url=scrape_url)
soup = BeautifulSoup(response.content, 'html.parser')
print(soup)
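Compared with that attempt, a flow that mirrors what the browser does might (1) GET the login page through the same session so any cookies are kept, (2) read the auto-submitting form's action and hidden fields from that response instead of hard-coding them, and (3) POST exactly those fields to the form's action rather than back to the JSP. This is a sketch, not a verified solution for this site: the function name `open_portal` is hypothetical, and it assumes the portal needs only those hidden fields (the form shown contains no username or password inputs). It takes the session as a parameter so it can be exercised without network access.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

LOGIN_URL = "https://laboral.pjud.cl/SITLAPORWEB/jsp/LoginPortal/LoginPortal.jsp"


def open_portal(session, login_url=LOGIN_URL):
    """Replay the auto-submitting form on the login page and return the resulting HTML."""
    # Step 1: load the page containing the form (a requests.Session keeps its cookies)
    first = session.get(login_url)
    first.raise_for_status()
    soup = BeautifulSoup(first.text, "html.parser")
    form = soup.find("form")
    if form is None:
        # Nothing to replay: the server already returned a regular page
        return first.text

    # Step 2: collect every named <input> (here: FLG_Autoconsulta and D0E0F02E)
    data = {
        field["name"]: field.get("value", "")
        for field in form.find_all("input")
        if field.get("name")
    }
    # The action is relative, so resolve it against the URL that was actually served
    action_url = urljoin(first.url, form.get("action", ""))

    # Step 3: submit the fields the way document.InicioAplicacionForm.submit() would
    if form.get("method", "get").lower() == "post":
        second = session.post(action_url, data=data)
    else:
        second = session.get(action_url, params=data)
    second.raise_for_status()
    return second.text


if __name__ == "__main__":
    import requests  # imported here so open_portal can be tested with a stub session

    with requests.Session() as s:
        html = open_portal(s)
        print(BeautifulSoup(html, "html.parser").title)
```

If the page returned after this step is still not the expected one, the site may require further requests that this sketch does not cover.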
