Python BeautifulSoup: how do I extract a var value from a JavaScript script element?

Posted 2024-05-19 11:04:39


I'm new to Python, and I have been trying to use BeautifulSoup to extract one specific value from a variable defined inside a script element.

Code:

import requests
from bs4 import BeautifulSoup
import esprima

#----------------some comment'

URL = 'https://downdetector.com/status/facebook/'

browser = {'user-agent': 'my agent'}


#--------------some comment:
page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


#---------------some comment:

chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()
print(chart)

Output:

var data = {
status: 'success',  
baseline: 29,       
communicate: null,  
company: 'Facebook',
max: 66,
series: [

                      { x: '2020-05-30T13:22:28.168484-04:00', y: 25  },

                      { x: '2020-05-30T13:37:28.168484-04:00', y: 27  },

                      .....

                      { x: '2020-05-31T13:07:28.168484-04:00', y: 30  },

                  ]
                }

                $(function () {
                  chartThis(data, 'holder', 'line')
                });

                if (data.communicate && $('#dd-communicate').length) {
                  $('#dd-communicate').html('<div class="border text-left d-inline-block p-2"><i class="fa" aria-hidden="true" style="color: red; width:16px; height:12px; background:url(https://cdn2.downdetector.com/d328eb8cbe4e164/images/v2/message.svg) no-repeat"></i>'
                    +'<span class="d-inline-block px-1">'+ data.company+' &bull;  ' + moment.utc(data.communicate.created_at).fromNow()
                    + '</span><p class="font-weight-bold my-0">'+ data.communicate.message + '</p></div>')
                }

Is there a simple way to extract the "max" value from the var output above?

I tried using esprima, but it still doesn't work; I get this error:

Traceback (most recent call last):
  File "c:/test.py", line 31, in <module>
    if token["type"] == "Identifier" and token["value"] == "max":
TypeError: 'BufferEntry' object is not subscriptable

My code using esprima is shown below:

import requests
from bs4 import BeautifulSoup
import esprima

#----------------some comment'

URL = 'https://downdetector.com/status/facebook/'

browser = {'user-agent': 'my agent'}


#--------------some comment:
page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


#---------------some comment:

chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()

tokens = esprima.tokenize(chart)

token_iterator = iter(tokens)

for token in token_iterator:
    if token["type"] == "Identifier" and token["value"] == "max":
        value_token = next(next(token_iterator))
        result = value_token["value"]

Any help would be greatly appreciated.


1 answer
User
#1 · Posted 2024-05-19 11:04:39

A quick way to extract the max value is to call split on chart:

import requests
from bs4 import BeautifulSoup

URL = 'https://downdetector.com/status/facebook/'
browser = {'user-agent': 'my agent'}

page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()
max_val = chart.split("max: ")[1].split(",")[0]

print(max_val)

OUT: 64
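
Note that the split approach returns a string and depends on the exact text "max: " (one space after the colon). As a slightly more tolerant alternative, here is a minimal sketch using a regular expression from the standard library. The names sample_chart and extract_number are hypothetical, introduced only for illustration, and sample_chart is a shortened stand-in for the script text that the code above stores in chart; the live page may be formatted differently.

```python
import re

# Hypothetical, shortened stand-in for the script text held in `chart`.
sample_chart = """
var data = {
status: 'success',
baseline: 29,
communicate: null,
company: 'Facebook',
max: 66,
series: [
  { x: '2020-05-30T13:22:28.168484-04:00', y: 25  },
]
}
"""


def extract_number(script_text, key):
    """Return the integer assigned to `key` in the script text, or None."""
    # \b stops 'max' from matching inside a longer identifier;
    # \s* tolerates any amount of whitespace around the colon.
    pattern = rf"\b{re.escape(key)}\s*:\s*(-?\d+)"
    match = re.search(pattern, script_text)
    return int(match.group(1)) if match else None


max_val = extract_number(sample_chart, "max")
print(max_val)
```

With the real page, you would pass chart instead of sample_chart. The function returns an int rather than a string, and returns None instead of raising IndexError when the key is not found. It only handles plain integer values; it is not a JavaScript parser.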
