Scraping data from an HTML page and JavaScript

Published 2024-05-17 05:27:43


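I want to get data from an HTML page that contains JavaScript. I have read several posts suggesting Selenium or PyQt4.QtWebKit, but I may have started off on the wrong foot: I used requests.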

Can I use an external library such as PyExecJS or PyV8 to execute the JavaScript contained in the response, or should I step back and rewrite the code with Selenium?
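Before adding a JavaScript engine, it may be worth checking whether the data is already in the HTML that requests returns. Many pages embed it as a JSON literal assigned to a variable inside an inline script tag, and the scripts only render it; in that case the standard library can extract it without running any JavaScript. The sketch below assumes exactly that layout. The variable name orderBook and the sample page are hypothetical, not taken from the ADVFN site. If the data is instead fetched later by the scripts or computed by them, this approach will not find it and something that executes JavaScript is needed. Note that PyExecJS and PyV8 evaluate JavaScript but provide no browser DOM, so page scripts that use document or window will not run under them unchanged; Selenium drives a real browser and does not have that limitation.

```python
import json
import re
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so append rather than assign
        if self._in_script:
            self.scripts[-1] += data


def extract_js_object(html, var_name):
    """Return the object assigned to var_name in an inline script, or None.

    Assumption: the assignment looks like `name = {...};` and the right-hand
    side is strict JSON (double-quoted keys, no trailing commas, no functions).
    """
    collector = ScriptCollector()
    collector.feed(html)
    pattern = re.compile(
        r"\b" + re.escape(var_name) + r"\s*=\s*(\{.*?\})\s*;", re.DOTALL
    )
    for script in collector.scripts:
        match = pattern.search(script)
        if match:
            return json.loads(match.group(1))
    return None


# Hypothetical page: the numbers are only there to illustrate the shape
sample_page = """<html><body>
<div id="book"></div>
<script>
  var orderBook = {"bids": [[10.5, 200]], "asks": [[10.6, 150]]};
  render(orderBook);
</script>
</body></html>"""

print(extract_js_object(sample_page, "orderBook"))
```

With a live response, the same call would take r.text in place of sample_page. A regular expression is fragile against arbitrary JavaScript, so this is only a first check that the data really is in the static HTML.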

Here is my code:

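import requests
from bs4 import BeautifulSoup

data = {"redirect_url": "",
    "site": "uk",
    "login_username": "foo",
    "login_password": "bar"}

with requests.Session() as s:
    log = "https://secure.advfn.com/login/secure"
    # Load the home page to read the login form's redirect_url and site fields
    r = s.get("http://uk.advfn.com/")
    # Name the parser explicitly; otherwise BeautifulSoup warns and uses whichever parser is installed
    soup = BeautifulSoup(r.content, "html.parser")
    redirect_url = soup.select_one("#redirect_url")["value"]
    site = soup.select_one("#site")["value"]
    data["redirect_url"] = redirect_url
    data["site"] = site  # send the scraped value instead of the hard-coded "uk"
    # Log in; the Session keeps the cookies for the following requests
    p = s.post(log, data=data)
    print(p.content)
    # requests returns the raw HTML only; it does not run the page's JavaScript
    output = s.get('https://it.advfn.com/mercati/BIT/generaliG/ordini').content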
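The login step above copies two hidden fields by id. A slightly more general sketch, assuming the login form's extra fields are all input elements of type hidden, gathers every hidden input in the form and then merges in the credentials, so a hidden field the site adds later is not silently dropped. The helper name hidden_fields, the form id login_form and the sample markup are hypothetical; the real ADVFN login form may be structured differently. It uses the standard library's html.parser only so that it runs without extra packages.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs of hidden <input> elements inside one <form>."""

    def __init__(self, form_id):
        super().__init__()
        self.form_id = form_id
        self._in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and attrs.get("id") == self.form_id:
            self._in_form = True
        elif (
            self._in_form
            and tag == "input"
            and (attrs.get("type") or "").lower() == "hidden"
            and attrs.get("name")
        ):
            # A hidden input without a value attribute is submitted as ""
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self._in_form = False


def hidden_fields(html, form_id):
    """Hypothetical helper: return the hidden fields of the form with this id."""
    parser = HiddenFieldParser(form_id)
    parser.feed(html)
    return parser.fields


# Hypothetical login form markup, for illustration only
sample_form = """<form id="login_form" action="/login/secure" method="post">
  <input type="hidden" name="redirect_url" id="redirect_url" value="https://uk.advfn.com/">
  <input type="hidden" name="site" id="site" value="uk">
  <input type="text" name="login_username">
  <input type="password" name="login_password">
</form>"""

payload = hidden_fields(sample_form, "login_form")
payload.update({"login_username": "foo", "login_password": "bar"})
print(payload)
```

In the session above, r.text would take the place of sample_form and the result would be posted with s.post(log, data=payload).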

The full HTML output I get is posted at http://pastebin.com/bwa1hWsv.

