Scraping a yt-formatted-string with Beautiful Soup

import requests
from bs4 import BeautifulSoup

r = requests.get('https://www.youtube.com/channel/UCPyMcv4yIDfETZXoJms1XFA')
soup = BeautifulSoup(r.text, "html.parser")

def onoroff():
    # attrs must be a dict ({'id': ...}), not a set ({'id', ...})
    return soup.find('yt-formatted-string', {'id': 'subscriber-count'}).text

print("Subscribers: " + onoroff().strip())

1 Answer

User

#1 · Posted 2024-09-28 05:21:06

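Most YouTube content is generated by JavaScript, which BeautifulSoup cannot execute. You may have better luck scraping the JSON object embedded in the page source instead of scraping the HTML elements directly, i.e.: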

import requests, json, re

h = {
    'Host': 'www.youtube.com',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:75.0) Gecko/20100101 Firefox/75.0',
    'Accept': '*/*',
    'Accept-Language': 'en-US,pt;q=0.7,en;q=0.3',
    'Referer': 'https://www.youtube.com/channel/UCPyMcv4yIDfETZXoJms1XFA',
}
u = "https://www.youtube.com/channel/UCPyMcv4yIDfETZXoJms1XFA"
html = requests.get(u, headers=h).text

# Get the JSON object that holds the info we need from the page source and convert it into a Python dict for later use
matches = re.findall(r'window\["ytInitialData"\] = (.*\}\]\}\}\});', html, re.IGNORECASE | re.DOTALL)
if matches:
    j = json.loads(matches[0])
    # Browse the JSON object to find the info you need: https://jsoneditoronline.org/#left=cloud.123ad9bb8bbe498c95f291c32962aad2
    # We are now ready to get the number of subscribers (among other info):

    subscribers = j['header']['c4TabbedHeaderRenderer']['subscriberCountText']['runs'][0]["text"]
    print(subscribers)
    # 110 subscribers
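
The long chain of keys above raises an exception as soon as one level is missing, and the regex depends on the exact closing brackets of the object. Below is a minimal offline sketch of a more forgiving variant. It assumes the page still assigns the object to `window["ytInitialData"]`, which YouTube may change at any time. `SAMPLE_HTML`, `extract_initial_data`, and `dig` are hypothetical names invented for this illustration, and the sample markup is a heavily trimmed stand-in for the real page.

```python
import json
import re

# Hypothetical, trimmed-down page source; the real markup is far larger and may differ.
SAMPLE_HTML = '''
<script>
window["ytInitialData"] = {"header": {"c4TabbedHeaderRenderer": {"subscriberCountText": {"runs": [{"text": "110 subscribers"}]}}}};
</script>
'''

def extract_initial_data(html):
    """Return the embedded ytInitialData dict, or None if it cannot be found or parsed."""
    match = re.search(r'window\["ytInitialData"\]\s*=\s*', html)
    if match is None:
        return None
    try:
        # raw_decode parses a single JSON value starting at the given index
        # and ignores whatever follows it (the trailing ';', the closing tag, ...)
        data, _ = json.JSONDecoder().raw_decode(html, match.end())
    except json.JSONDecodeError:
        return None
    return data

def dig(obj, *path, default=None):
    """Follow keys and indexes through nested dicts and lists without raising."""
    for key in path:
        try:
            obj = obj[key]
        except (KeyError, IndexError, TypeError):
            return default
    return obj

data = extract_initial_data(SAMPLE_HTML)
subscribers = dig(data, "header", "c4TabbedHeaderRenderer",
                  "subscriberCountText", "runs", 0, "text")
print(subscribers)
```

To use this against the live page, pass the `html` string fetched with `requests.get(u, headers=h).text` to `extract_initial_data` in place of `SAMPLE_HTML`. A `None` result means the layout has probably changed and the key path needs to be looked up again.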

