Python web scrape login

Posted 2024-10-01 15:47:09


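I'm new to Python and am trying to log in here using XPath and requests, and then scrape some data from here, following the method demonstrated in this tutorial. My Python script currently looks like this: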

from lxml import html
import requests

url = "http://www.londoncoffeeguide.com/Venues/Profile/26-Grains"

session_requests = requests.session()
login_url = "http://www.londoncoffeeguide.com/signin?returnurl=%2fVenues"
result = session_requests.get(login_url)

tree = html.fromstring(result.content)
authenticity_token = list(set(tree.xpath("//input[@name='__CMSCsrfToken']/@value")))[0]

payload = {
    "p$lt$ctl01$LogonForm_SignIn$Login1$UserName": 'XXX', 
    "p$lt$ctl01$LogonForm_SignIn$Login1$Password": 'XXX', 
    "__CMSCsrfToken": authenticity_token
}

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:52.0) Gecko/20100101 Firefox/52.0'}

with requests.session() as s:
    p = s.post(login_url, data=payload, headers=headers)
    print(p.text)

Unfortunately, the text returned by the POST request shows...

<head><title>
    System error
</title>

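...followed by the rest of the HTML for the login page. I tried adding the headers line shown above and double-checked that the login details I'm using are correct, and I'm satisfied that the CMSCsrfToken is correct, but the login doesn't work. Any help with this would be greatly appreciated. I've been googling around, but none of the answers to similar questions that I've found seem to help (so far!).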


1 answer
User
Answer 1 · posted 2024-10-01 15:47:09

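You put the username and password in the wrong fields. In addition, a few extra fields such as viewstategenerator, viewstate, etc. need to be added to the payload for the script to work. The script below will log you in and then fetch the titles of the different profile items.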

from lxml.html import fromstring
import requests

login_url = "http://www.londoncoffeeguide.com/signin?returnurl=%2fVenues"

username = "" #fill this in
password = "" #fill this in as well

with requests.session() as session:
    session.headers['User-Agent'] = 'Mozilla/5.0'
    result = session.get(login_url)
    tree = fromstring(result.text)
    auth_token = tree.xpath("//input[@id='__CMSCsrfToken']/@value")[0]
    viewstate = tree.xpath("//input[@id='__VIEWSTATE']/@value")[0]
    viewgen = tree.xpath("//input[@id='__VIEWSTATEGENERATOR']/@value")[0]

    payload = {
        "__CMSCsrfToken": auth_token,
        "__VIEWSTATEGENERATOR":viewgen,
        "p$lt$ctl02$pageplaceholder$p$lt$ctl00$RowLayout_Bootstrap$RowLayout_Bootstrap_2$ColumnLayout_Bootstrap1$ColumnLayout_Bootstrap1_1$LogonForm_SignIn$Login1$UserName": username, 
        "p$lt$ctl02$pageplaceholder$p$lt$ctl00$RowLayout_Bootstrap$RowLayout_Bootstrap_2$ColumnLayout_Bootstrap1$ColumnLayout_Bootstrap1_1$LogonForm_SignIn$Login1$Password": password, 
        "__VIEWSTATE":viewstate,
        "p$lt$ctl02$pageplaceholder$p$lt$ctl00$RowLayout_Bootstrap$RowLayout_Bootstrap_2$ColumnLayout_Bootstrap1$ColumnLayout_Bootstrap1_1$LogonForm_SignIn$Login1$LoginButton": "Log on"
    }

    # The User-Agent set on the session above is sent with this request too.
    p = session.post(login_url, data=payload)
    root = fromstring(p.text)
    # cssselect() needs the separate "cssselect" package to be installed.
    for iteminfo in root.cssselect(".ProfileItem .ProfileItemTitle"):
        print(iteminfo.text)

Make sure to fill in the username and password fields in the script before running it.
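If the long field names ever change, hardcoding them will break again. One possible way around that, sketched here under some assumptions, is to read every input on the sign-in page and build the payload from whatever is there. The names `InputCollector`, `collect_form_fields`, `find_field` and the shortened `SAMPLE_HTML` are made up for illustration and are not taken from the real site. The sketch uses the standard-library `html.parser` so it runs without extra dependencies; the same idea should work with lxml.

```python
from html.parser import HTMLParser


class InputCollector(HTMLParser):
    """Collect the name/value pair of every <input> element on a page."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        if name:
            # Inputs without a value attribute are stored as empty strings.
            self.fields[name] = attributes.get("value") or ""


def collect_form_fields(page_html):
    """Return a dict mapping input names to their current values."""
    collector = InputCollector()
    collector.feed(page_html)
    return collector.fields


def find_field(fields, suffix):
    """Return the first field name that ends with the given suffix."""
    for name in fields:
        if name.endswith(suffix):
            return name
    raise KeyError(suffix)


# Hypothetical, shortened stand-in for the sign-in page markup.
SAMPLE_HTML = """
<form>
  <input type="hidden" name="__CMSCsrfToken" value="token123">
  <input type="hidden" name="__VIEWSTATE" value="state456">
  <input type="hidden" name="__VIEWSTATEGENERATOR" value="gen789">
  <input type="text" name="p$lt$ctl02$Login1$UserName">
  <input type="password" name="p$lt$ctl02$Login1$Password">
  <input type="submit" name="p$lt$ctl02$Login1$LoginButton" value="Log on">
</form>
"""

# Start from every field the page already carries, then fill in credentials.
payload = collect_form_fields(SAMPLE_HTML)
payload[find_field(payload, "$UserName")] = "my_user"
payload[find_field(payload, "$Password")] = "my_password"

for name, value in payload.items():
    print(name, "=", value)
```

In the real script you would pass `result.text` from the GET request instead of `SAMPLE_HTML`, and post the resulting `payload` with the same session. A page with several forms or several submit buttons would need extra filtering that this sketch does not attempt.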
