Parsing data with BeautifulSoup4

Posted 2024-06-01 07:26:05


import requests
from bs4 import BeautifulSoup

request = requests.get("http://www.lolesports.com/en_US/worlds/world_championship_2016/standings/default")
content = request.content
soup = BeautifulSoup(content, "html.parser")
team_name = soup.findAll('text', {'class': 'team-name'})

print(team_name)

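I am trying to parse data from the URL "http://www.lolesports.com/en_US/worlds/world_championship_2016/standings/default". The team names sit inside elements such as <text class="team-name">SK Telecom T1</text>. What I want is to extract that text (SK Telecom T1) and print it to the screen, but I get an empty list instead. What am I doing wrong?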


2 answers

You don't need Selenium. All of the dynamic content can be retrieved as JSON with a simple GET request to http://api.lolesports.com/api/v1/leagues

import requests

data = requests.get("http://api.lolesports.com/api/v1/leagues?slug=worlds").json()

This returns a large amount of data, and everything you want seems to be under data["teams"]. A snippet of it:

[{'id': 2, 'slug': 'bangkok-titans', 'name': 'Bangkok Titans', 'teamPhotoUrl': 'http://na.lolesports.com/sites/default/files/BKT_GPL.TMPROFILE_0.png', 'logoUrl': 'http://assets.lolesports.com/team/bangkok-titans-597g0x1v.png', 'acronym': 'BKT', 'homeLeague': 'urn:rg:lolesports:global:league:league:12', 'altLogoUrl': None, 'createdAt': '2014-07-17T18:34:47.000Z', 'updatedAt': '2015-09-29T16:09:36.000Z', 'bios': {'en_US': 'The Bangkok Titans are the undisputed champions of Thailand’s League of Legends esports scene. They achieved six consecutive 1st place finishes in the Thailand Pro League from 2014 to 2015. However, they aren’t content with just domestic domination.

Each team is an entry in that list, so looping over it prints every name:

In [1]: import requests

In [2]: data = requests.get("http://api.lolesports.com/api/v1/leagues?slug=worlds").json()

In [3]: for d in data["teams"]:
   ...:     print(d["name"])
   ...:
Bangkok Titans
ahq e-Sports Club
SK Telecom T1
TSM
Fnatic
Cloud9 
Counter Logic Gaming
H2K
Edward Gaming
INTZ e-Sports
paiN Gaming
Origen
LGD Gaming
Invictus Gaming
Royal Never Give Up
Flash Wolves
Splyce
Samsung Galaxy
KT Rolster
ROX Tigers
G2 Esports
I May
Albus NoX Luna
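
The loop above can also be wrapped in small helpers. This is a minimal sketch, assuming the endpoint still responds with a top-level "teams" list whose entries carry a "name" key, as the snippet in this answer shows. The function names `team_names` and `fetch_worlds_payload` and the 10-second timeout are illustrative choices, not part of the original answer.

```python
def team_names(payload):
    """Return the non-empty team names from a decoded leagues payload.

    Assumes the shape shown in the answer: {"teams": [{"name": ...}, ...]}.
    """
    names = []
    for team in payload.get("teams", []):
        # Entries without a usable name are skipped instead of raising.
        name = (team.get("name") or "").strip()
        if name:
            names.append(name)
    return names


def fetch_worlds_payload(url="http://api.lolesports.com/api/v1/leagues",
                         slug="worlds", timeout=10):
    """Fetch and decode the JSON payload (hypothetical helper, needs network access)."""
    import requests  # third-party; imported here so team_names() works without it

    response = requests.get(url, params={"slug": slug}, timeout=timeout)
    response.raise_for_status()  # fail loudly on HTTP errors instead of parsing an error page
    return response.json()
```

Calling `team_names(fetch_worlds_payload())` would then give the same names as the session above, provided the API still behaves as it did when this answer was written.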

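The site relies on JavaScript to load its content. Requests does not execute JavaScript, so the HTML it returns does not contain the data you are trying to parse.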
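
To see why the list comes back empty, compare the HTML a plain GET returns with the HTML after a browser has run the scripts. This is a small illustration with made-up markup (`STATIC_HTML` and `RENDERED_HTML` are invented here, not copied from the site). It uses the standard library's `html.parser`, the same module that backs BeautifulSoup's "html.parser" mode, so it runs without extra packages.

```python
from html.parser import HTMLParser

# What a plain GET might return: the team label only exists as JavaScript source.
STATIC_HTML = """
<html><body>
  <svg id="standings"></svg>
  <script>
    var label = document.createElementNS("http://www.w3.org/2000/svg", "text");
    label.setAttribute("class", "team-name");
    label.textContent = "SK Telecom T1";
    document.getElementById("standings").appendChild(label);
  </script>
</body></html>
"""

# What the DOM might look like after a browser has executed that script.
RENDERED_HTML = """
<html><body>
  <svg id="standings"><text class="team-name">SK Telecom T1</text></svg>
</body></html>
"""


class TagTextCollector(HTMLParser):
    """Collect the text of every <tag class="css_class"> element (no nesting support)."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.found = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self._inside = True
            self.found.append("")

    def handle_endtag(self, tag):
        if tag == self.tag:
            self._inside = False

    def handle_data(self, data):
        # Script bodies arrive here as plain data, never as tags.
        if self._inside:
            self.found[-1] += data


def find_text(html, tag="text", css_class="team-name"):
    collector = TagTextCollector(tag, css_class)
    collector.feed(html)
    collector.close()
    return [text.strip() for text in collector.found]


print(find_text(STATIC_HTML))    # no <text> element exists until the script runs
print(find_text(RENDERED_HTML))  # the element is present once the page is rendered
```

The parser only sees the script as text, which is the same situation `soup.findAll('text', {'class': 'team-name'})` is in when given the raw response.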

For a site like this you are better off with Selenium. It drives Firefox (or another browser driver) to render the whole page, JavaScript included.
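
A rough sketch of the Selenium route follows. It assumes Selenium 4 with Firefox and geckodriver installed, and it assumes the rendered page still exposes the names as `<text class="team-name">` elements, as described in the question. The helper name `rendered_team_names`, the selector string, and the timeout values are illustrative. The helper polls by hand instead of using Selenium's own wait classes so that it only depends on the driver object passed in.

```python
import time

# Locator strategy string; selenium's By.CSS_SELECTOR is this same value.
CSS_SELECTOR = "css selector"


def rendered_team_names(driver, url, selector="text.team-name",
                        timeout=10.0, poll=0.5):
    """Load url in the given driver and return the team names once they appear.

    Returns an empty list if nothing matches before the timeout expires.
    """
    driver.get(url)
    deadline = time.monotonic() + timeout
    while True:
        elements = driver.find_elements(CSS_SELECTOR, selector)
        names = [el.text.strip() for el in elements if el.text.strip()]
        if names or time.monotonic() >= deadline:
            return names
        time.sleep(poll)  # give the page's JavaScript time to build the elements


def demo():
    """Not called here: needs the third-party selenium package and a local Firefox."""
    from selenium import webdriver

    driver = webdriver.Firefox()
    try:
        url = ("http://www.lolesports.com/en_US/worlds/"
               "world_championship_2016/standings/default")
        for name in rendered_team_names(driver, url):
            print(name)
    finally:
        driver.quit()  # always close the browser, even if the lookup fails
```

Whether `el.text` returns the label for SVG text elements can depend on the browser and driver, so treat the extraction line as a starting point to adjust.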
