Web scraping while bypassing an "Important Information" page

Posted 2024-06-26 07:47:21


I want to extract the fund prices from the following link using Python or R:

http://www.mpf.invesco.com.hk/html/en/mpf/prices.html

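But every time I load the page in a browser, it redirects me to the page below, where I have to confirm that I have read the important information before I can access the fund prices.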

http://www.mpf.invesco.com.hk/html/en/mpf/information.html

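I guess the "Important Information" page is driven by JavaScript. Can I use R or Python to confirm that the important information has been read, and then retrieve the fund prices from the next page?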


2 answers

Use RSelenium with phantomjs:

# use dev version so we can run phantomjs without a selenium server
# devtools::install_github("ropensci/RSelenium")
# phantomjs must be on your PATH; if it is not,
# refer to the package vignettes

library(RSelenium)
library(XML)  # provides readHTMLTable, used below
appURL <- "http://www.mpf.invesco.com.hk/html/en/mpf/prices.html"
pJS <- phantom()
remDr <- remoteDriver(browserName = "phantomjs")
remDr$open()
remDr$navigate(appURL)
# <span onclick=\"accept();return false;\">I have read the Important Information</span>
# execute above code 
remDr$executeScript("accept();return false;")
# switch to iframe element
remDr$switchToFrame("myFrame")

# parse the second table in the frame and show its first rows
head(readHTMLTable(remDr$getPageSource()[[1]]
                   , which = 2, header = TRUE, skip.rows = 1))

                                                         Name of Constituent Fund Unit Class Currency Fund Price
1                                                 Hong Kong and China Equity Fund          A      HKD    34.5537
2                                                               Asian Equity Fund          A      HKD    10.2323
3                                                                     Growth Fund          A      HKD    19.2199
4                                                                   Balanced Fund          A      HKD    18.8244
5 RMB Bond Fund (this Constituent Fund is denominated in HKD only and not in RMB)          A      HKD     9.8299
6                                                             Capital Stable Fund          A      HKD    18.3871

When you are done, close the phantomjs instance:

pJS$stop()

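The situation is actually simpler than that. The table you need sits inside an iframe that is loaded from its own URL (the `URL` value in the code below).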
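One way to locate such a frame is to list the `src` attribute of every iframe in the page's HTML. This is a minimal sketch using only Python's standard library. The `IframeFinder` class, the `find_iframe_sources` helper, and the sample markup (including the `example.com` address) are hypothetical stand-ins, not the real page source.

```python
from html.parser import HTMLParser


class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_iframe_sources(html):
    """Return the src of each iframe found in an HTML string."""
    finder = IframeFinder()
    finder.feed(html)
    return finder.sources


# Made-up page source for illustration only, not the real site's markup
sample_html = """
<html><body>
  <span onclick="accept();return false;">I have read the Important Information</span>
  <iframe id="myFrame" src="https://example.com/fund_prices"></iframe>
</body></html>
"""

print(find_iframe_sources(sample_html))
```

If the frame's `src` is assigned by JavaScript rather than written in the markup, it will not show up in the static HTML, and the browser's network inspector is probably the easier way to find it.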

Here is how to fetch it with requests and parse it with BeautifulSoup:

from bs4 import BeautifulSoup
import requests

URL = 'https://apps.ap.invesco.com/invee/fund_info/fund_price_ns_mpf.do?version=en&haaccount=N&url=http://www.mpf.invesco.com.hk/html/pdf/factsheets/mpf'
response = requests.get(URL)

soup = BeautifulSoup(response.content, 'html.parser')  # name the parser explicitly
table = soup.find_all('table')[1]  # second table on the page

# print the first row as an example
print(table.tr.text.strip())

Prints:

Valuation Date: 10/07/2014

FYI, there is no need for selenium or a real browser here.
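To go beyond the first row, the table can be turned into one record per fund. The sketch below uses the standard library's `html.parser` instead of BeautifulSoup so that it runs without extra packages. The `TableRows` class is a hypothetical helper, it assumes a flat table with no nested tables, and the sample markup is made up (only the fund names and prices are taken from the output shown earlier).

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the cell text of every <tr> as a list of strings (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside a row
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up markup for illustration; the real frame's HTML may be laid out differently
sample_html = """
<table>
  <tr><th>Name of Constituent Fund</th><th>Unit Class</th><th>Currency</th><th>Fund Price</th></tr>
  <tr><td>Asian Equity Fund</td><td>A</td><td>HKD</td><td>10.2323</td></tr>
  <tr><td>Growth Fund</td><td>A</td><td>HKD</td><td>19.2199</td></tr>
</table>
"""

parser = TableRows()
parser.feed(sample_html)

# The first row is the header; pair it with each remaining row
header, *records = parser.rows
funds = [dict(zip(header, record)) for record in records]

for fund in funds:
    print(fund["Name of Constituent Fund"], fund["Fund Price"])
```

Prices stay as strings here; converting them with `float()` is left to the caller, since the real table may contain placeholders or footnote markers.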
