How to get a specific value from a web page

Posted 2024-10-01 00:35:11


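So, we have a website, such as PewDiePie's YouTube channel page, https://www.youtube.com/channel/UC-lHJZR3Gqxm24_Vd_AJ5Yw. I want to write a script that tells me his subscriber count. Do I need to use Beautiful Soup?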

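I know it is stored in

<yt-formatted-string id="subscriber-count" class="style-scope ytd-c4-tabbed-header-renderer">84,831,541 subscribers</yt-formatted-string>

I have no background in web development, so this is gibberish to me. But there must be a way to get this value without Beautiful Soup, right?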

import urllib.request
import json
import webbrowser

data = urllib.request.urlopen('https://www.youtube.com/channel/UC-lHJZR3Gqxm24_Vd_AJ5Yw')
print(data)

That is all I have so far.


2 Answers

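It looks like you are trying to get the subscriber count of a given channel. For that I would use the Google YouTube API, since it is faster and more reliable than web scraping. Sample code is below.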

1) Get an API key and enable this library

https://console.developers.google.com/apis/library/youtube.googleapis.com

2) Get the channel ID of the YouTube channel; for example, PewDiePie's is UC-lHJZR3Gqxm24_Vd_AJ5Yw

https://www.youtube.com/channel/<channel_id>

3) Make a GET request to the URL below with the specified parameters

https://www.googleapis.com/youtube/v3/channels?part=statistics&id={CHANNEL_ID}&key={YOUR_API_KEY}

3b) This returns a JSON response that you need to parse

{
 "kind": "youtube#channelListResponse",
 "etag": "\"XpPGQXPnxQJhLgs6enD_n8JR4Qk/MlIT59Jru-h7AvGc09RB7HQI6qA\"",
 "pageInfo": {
  "totalResults": 1,
  "resultsPerPage": 1
 },
 "items": [
  {
   "kind": "youtube#channel",
   "etag": "\"XpPGQXPnxQJhLgs6enD_n8JR4Qk/a5p-d8soZS1kVL3A3QlzHsJFa44\"",
   "id": "UC-lHJZR3Gqxm24_Vd_AJ5Yw",
   "statistics": {
    "viewCount": "20374094982",
    "commentCount": "0",
    "subscriberCount": "84859110",
    "hiddenSubscriberCount": false,
    "videoCount": "3744"
   }
  }
 ]
}
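
Parsing the response in step 3b can be sketched as follows. This is a minimal example that works on a trimmed copy of the sample response rather than a live request; the helper name `extract_subscriber_count` is hypothetical, and returning `None` for an empty `items` list or a hidden count is an assumption about how one might want to handle those cases.

```python
import json

# Trimmed copy of the sample response shown above (not a live API call).
SAMPLE_RESPONSE = '''
{
 "items": [
  {
   "id": "UC-lHJZR3Gqxm24_Vd_AJ5Yw",
   "statistics": {
    "subscriberCount": "84859110",
    "hiddenSubscriberCount": false
   }
  }
 ]
}
'''


def extract_subscriber_count(payload):
    """Return the subscriber count as an int, or None if it is unavailable."""
    items = payload.get("items") or []
    if not items:
        # No channel matched the requested id.
        return None
    stats = items[0].get("statistics", {})
    if stats.get("hiddenSubscriberCount"):
        # The channel owner has hidden the count.
        return None
    count = stats.get("subscriberCount")
    # The API returns the count as a string, so convert it explicitly.
    return int(count) if count is not None else None


payload = json.loads(SAMPLE_RESPONSE)
print(extract_subscriber_count(payload))  # prints 84859110
```

Note that the counts arrive as strings, so the conversion to `int` is needed before doing any arithmetic with them.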

Sample code to get PewDiePie's subscriber count

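import requests

# Replace <channel_id> and <your_api_key> with your own values
url = 'https://www.googleapis.com/youtube/v3/channels?part=statistics&id=<channel_id>&key=<your_api_key>'

# Call the API and decode the JSON body
resp = requests.get(url=url)
data = resp.json()

# The count is a string inside the first item's statistics
sub_count = data['items'][0]['statistics']['subscriberCount']

print(sub_count)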
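
Instead of pasting the channel ID and key into the URL by hand, the query string can be assembled from a dictionary. The sketch below uses only the standard library; `build_channels_url` and `API_ENDPOINT` are hypothetical names, and `"YOUR_API_KEY"` is a placeholder rather than a working key.

```python
from urllib.parse import urlencode

# Endpoint taken from step 3 above.
API_ENDPOINT = "https://www.googleapis.com/youtube/v3/channels"


def build_channels_url(channel_id, api_key):
    """Build the channels request URL with properly encoded query parameters."""
    query = urlencode({"part": "statistics", "id": channel_id, "key": api_key})
    return f"{API_ENDPOINT}?{query}"


# "YOUR_API_KEY" is a placeholder; substitute a real key before sending a request.
print(build_channels_url("UC-lHJZR3Gqxm24_Vd_AJ5Yw", "YOUR_API_KEY"))
```

With `requests`, passing the same dictionary as `params=` to `requests.get` has a similar effect and avoids building the string yourself.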

What you are doing is web scraping. A quick Google search shows how to approach this. The code you are looking for:

import requests
from lxml import html

# Retrieve the web page
data = requests.get('https://www.youtube.com/channel/UC-lHJZR3Gqxm24_Vd_AJ5Yw')

# Parse the HTML
tree = html.fromstring(data.content)

# Find the subscriber count in the HTML tree
subscriber_count = tree.xpath('//*[contains(@class,"yt-subscription-button-subscriber-count-branded-horizontal")]/text()')[0]

# Convert to integer
subscriber_count = int(subscriber_count.replace(",",""))

print(subscriber_count)

Result at the time of writing: 84851474
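
The conversion step can be made a little more tolerant of surrounding text, such as the "84,831,541 subscribers" string quoted in the question. This is a minimal sketch using the standard `re` module; `parse_count` is a hypothetical helper, and it assumes the page shows the full number with comma separators rather than an abbreviated form like "111M".

```python
import re


def parse_count(text):
    """Extract the first comma-separated number from text and return it as an int."""
    match = re.search(r"\d[\d,]*", text)
    if match is None:
        raise ValueError(f"no number found in {text!r}")
    # Drop the thousands separators before converting.
    return int(match.group().replace(",", ""))


print(parse_count("84,831,541 subscribers"))  # prints 84831541
```

Raising `ValueError` when nothing matches makes it obvious when the page layout has changed and the scraped text no longer contains a count.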

If you want to learn more, you can dig deeper into web scraping in Python and XPath.
