Fetching a password-protected website without a token

from bs4 import BeautifulSoup
import requests
from lxml import html
import urllib.request
import re

username = 'myusername'
password = 'mypass'
url = "http://fantasy.trashtalk.co/?tpl=classement&t=1"
log = "http://fantasy.trashtalk.co/login.php"
values = {'email': username, 'password': password}
r = requests.post(log, data=values)

# Not sure about the code below but it works.
data = r.text
soup = BeautifulSoup(data, 'lxml')
tags = soup.find_all('a')
for link in soup.findAll('a', attrs={'href': re.compile("^https://")}):
    print(link.get('href'))

1 Answer

User

#1 · Posted 2024-09-30 06:20:49

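The problem is that you need to keep your login state in a session object rather than issuing a standalone request. A requests session stores the cookies the server sets at login and sends them with every later request, so the protected page sees you as logged in. I have modified the code below; you can now access the HTML tags on the page at scrape_url. Good luck!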

import requests
from bs4 import BeautifulSoup

username = 'email'
password = 'password'
scrape_url = 'http://fantasy.trashtalk.co/?tpl=classement&t=1'

login_url = 'http://fantasy.trashtalk.co/login.php'
login_info = {'email': username, 'password': password}

# Start a session so cookies set at login are reused on later requests.
session = requests.Session()

# Log in using your authentication information.
session.post(url=login_url, data=login_info)

# Request the page you want to scrape, using the same session.
response = session.get(url=scrape_url)

soup = BeautifulSoup(response.content, 'html.parser')

# href=True skips <a> tags without an href, which would otherwise raise KeyError.
for link in soup.find_all('a', href=True):
    print('\nLink href: ' + link['href'])
    print('Link text: ' + link.text)
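
The code above does not check whether the login actually worked, so a wrong password would silently scrape the logged-out page. Below is a minimal sketch of one way to check, not a definitive implementation. The helper name login is hypothetical, and it assumes the site sets at least one cookie when the login succeeds, which the original post does not confirm. Some sites also set a cookie on a failed login, so treat the result as a hint only.

```python
def login(session, login_url, credentials):
    """POST the credentials and report whether the login looks successful.

    `session` is expected to be a requests.Session; any object with a
    compatible post() method and a cookies container also works.
    """
    response = session.post(login_url, data=credentials)
    # With requests, this raises requests.HTTPError on 4xx/5xx responses.
    response.raise_for_status()
    # Assumption: the site sets at least one cookie on a successful login.
    return len(session.cookies) > 0


# Usage with requests (left as comments because it needs network access):
#   import requests
#   session = requests.Session()
#   if not login(session, login_url, login_info):
#       raise SystemExit('Login did not set a cookie; check the credentials.')
#   response = session.get(scrape_url)
```

If the site returns the same cookies either way, a more reliable check is to look in the fetched page for something only logged-in users see, such as a logout link.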
