How do I extract data from an interactive chart using Python?

2 answers

User

Answer 1 · edited 2024-09-30 05:17:08

from bs4 import BeautifulSoup
import requests
import re

# First, get all the text from the URL.
url = "https://index.minfin.com.ua/ua/economy/index/svg.php?indType=1&fromYear=2010&acc=1"
response = requests.get(url)
html = response.text

# Find all the tags in which the data is stored.
soup = BeautifulSoup(html, 'lxml')
texts = soup.find_all("rect")

final = []
for each in texts:
    names = each.get('onmouseover')
    if names is None:
        # Skip rects that have no tooltip handler (e.g. background shapes)
        continue
    q = re.findall(r"'(.*?)'", names)
    if q:
        # The first single-quoted string holds the tooltip text
        final.append(q[0])

# The details have been appended to the final list.
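If you would rather not depend on bs4 and lxml, the same idea can be sketched with the standard library's html.parser. This is a minimal sketch, assuming (as the regex above does) that the tooltip text is the first single-quoted string inside each rect's onmouseover attribute. The SVG_SAMPLE markup and the showTip handler name are hypothetical stand-ins, not the real page content.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: "showTip" is an invented handler name for illustration.
SVG_SAMPLE = """
<svg>
  <rect x="0" y="0" width="500" height="300"/>
  <rect x="10" y="40" onmouseover="showTip(evt, '12.2009 100,0')"/>
  <rect x="20" y="38" onmouseover="showTip(evt, '01.2010 101,8')"/>
</svg>
"""


class RectTooltipParser(HTMLParser):
    """Collects the first single-quoted string from each <rect onmouseover=...>."""

    def __init__(self):
        super().__init__()
        self.tooltips = []

    def handle_starttag(self, tag, attrs):
        # Self-closing <rect/> tags also reach here via handle_startendtag
        if tag != "rect":
            return
        handler = dict(attrs).get("onmouseover")
        if handler is None:
            return  # rects without a tooltip handler are skipped
        match = re.search(r"'(.*?)'", handler)
        if match:
            self.tooltips.append(match.group(1))


def extract_tooltips(markup):
    parser = RectTooltipParser()
    parser.feed(markup)
    parser.close()
    return parser.tooltips


print(extract_tooltips(SVG_SAMPLE))
# → ['12.2009 100,0', '01.2010 101,8']
```

To use it on the real page, pass response.text instead of SVG_SAMPLE.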

User

Answer 2 · edited 2024-09-30 05:17:08

You would load the HTML as follows:

import requests

url = "https://index.minfin.com.ua/ua/economy/index/svg.php?indType=1&fromYear=2010&acc=1"
resp = requests.get(url)
html = resp.text

Then you would create a BeautifulSoup object from this HTML:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, features="html.parser")

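After that, how you parse out the content you want is largely a matter of preference, and candidate solutions can vary a lot. Here is how I did it: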

Using BeautifulSoup, I parsed all the "rect" elements and checked whether each one has an "onmouseover" attribute:

rects = soup.svg.find_all("rect")
yx_points = []
for rect in rects:
    # Skip rects that have no tooltip handler
    if rect.has_attr("onmouseover"):
        text = rect["onmouseover"]
        # The tooltip text sits between the first pair of single quotes
        x_start_index = text.index("'") + 1
        y_finish_index = text[x_start_index:].index("'") + x_start_index
        # e.g. "02.2015 155,1" -> ['02.2015', '155,1']
        yx = text[x_start_index:y_finish_index].split()
        print(text[x_start_index:y_finish_index])
        yx_points.append(yx)
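The index-based slicing above raises ValueError if an onmouseover value has no single quote. Here is a small sketch of the same slicing using str.partition, which returns empty strings instead of raising. The function name first_quoted is hypothetical, and the input strings below (including the showTip handler name) are made up for illustration.

```python
def first_quoted(text):
    """Return the substring between the first pair of single quotes, or None."""
    _, sep, rest = text.partition("'")
    if not sep:
        return None  # no opening quote at all
    inner, sep, _ = rest.partition("'")
    if not sep:
        return None  # opening quote but no closing quote
    return inner


# Same result as text[x_start_index:y_finish_index] in the loop above
print(first_quoted("showTip(evt, '02.2015 155,1')").split())
# → ['02.2015', '155,1']
print(first_quoted("noTooltip()"))
# → None
```

Inside the loop, you would append first_quoted(text).split() only when the result is not None.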

As the accompanying image showed, I scraped the onmouseover= part and got pieces like "02.2015 155,1".

Here is what yx_points looks like now:

[['12.2009', '100,0'], ['01.2010', '101,8'], ['02.2010', '103,7'], ...]
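The values are still strings, and the numbers use a comma as the decimal separator. Below is a minimal sketch for converting them to typed values. It assumes the first item is always a month in MM.YYYY form and the second is a decimal number with a comma separator, as in the output above; to_series is a hypothetical helper name.

```python
from datetime import date


def to_series(yx_points):
    """Convert [['MM.YYYY', 'ddd,d'], ...] into [(datetime.date, float), ...]."""
    series = []
    for month_year, value in yx_points:
        month, year = month_year.split(".")
        # Use the first day of the month to represent the whole month
        point_date = date(int(year), int(month), 1)
        # Swap the decimal comma for a dot so float() accepts it
        series.append((point_date, float(value.replace(",", "."))))
    return series


points = [['12.2009', '100,0'], ['01.2010', '101,8'], ['02.2010', '103,7']]
for point_date, value in to_series(points):
    print(point_date.isoformat(), value)
# 2009-12-01 100.0
# 2010-01-01 101.8
# 2010-02-01 103.7
```

Using the first day of each month keeps the dates sortable and easy to plot.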
