Getting a table from an HTML file with Python

game_link = "http://espn.go.com/nba/playbyplay?gameId=400579510&period=0"
game_source = urlopen(game_link)
game_html = game_source.read()
game_source.close()
row = BeautifulSoup(game_html, "html.parser")
pieces = list(row.children)

1 answer

User

#1 · Posted on 2024-09-28 17:24:50

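You can try BeautifulSoup.findAll, passing the tag name along with any other attributes you know about the tag you are looking for. After looking at the page, it appears that you want every <tr> tag with the class even, so you can use soup.findAll("tr", attrs = {"class": "even"}). For example: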

import urllib.request
from bs4 import BeautifulSoup

game_link = "http://espn.go.com/nba/playbyplay?gameId=400579510&period=0"
# download the page and keep the raw HTML
game_source = urllib.request.urlopen(game_link)
game_html = game_source.read()
game_source.close()
soup = BeautifulSoup(game_html, "html.parser")
# find all instances of a row with class "even"
rows = soup.findAll("tr", attrs = {"class": "even"})
for row in rows:
    # do work
    print(row)
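To try the class filter without depending on the live page, here is a minimal sketch that runs the same lookup on an inline HTML string. The markup in sample_html is made up for illustration and is not the real ESPN page; find_all with class_ is the current bs4 spelling of the findAll call used above.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page.
sample_html = """
<table>
  <tr class="even"><td>12:00</td><td>Jump ball</td><td>0-0</td><td></td></tr>
  <tr class="odd"><td>11:45</td><td></td><td>0-2</td><td>Layup</td></tr>
  <tr class="even"><td>11:30</td><td>Three pointer</td><td>3-2</td><td></td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Keep only the <tr> elements whose class attribute contains "even".
even_rows = soup.find_all("tr", class_="even")

# Text of the first cell in each matching row.
first_cells = [row.find("td").get_text() for row in even_rows]
print(first_cells)  # → ['12:00', '11:30']
```

Note that rows with class odd are skipped by this filter, so on a page that alternates even and odd rows you would likely need to match both classes.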

You still need to parse the HTML of each row. Here is a very "rough" example:

def parse_row(row):
    cols = row.findAll("td") # get each column in the row
    # ignore timeouts, this is just an example
    if len(cols) < 4:
        return None
    else:
        return {
                "time": cols[0].get_text(),
                "team1": cols[1].get_text(),
                "score": cols[2].get_text(),
                "team2": cols[3].get_text()
               }

parsed_rows = []
for row in rows:
    parsed = parse_row(row)
    if parsed:
        parsed_rows.append(parsed)
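The row parsing above can be checked end to end on a small inline table. This is only a sketch under assumptions: sample_html, FIELDS and row_to_dict are hypothetical names invented for this example, and the four-column layout (time, team1, score, team2) is taken from the rough example rather than verified against the real page.

```python
from bs4 import BeautifulSoup

# Assumed column order, taken from the rough example above.
FIELDS = ("time", "team1", "score", "team2")

def row_to_dict(row):
    """Map the <td> cells of one row onto FIELDS, or return None for short rows."""
    cells = [td.get_text(strip=True) for td in row.find_all("td")]
    if len(cells) < len(FIELDS):
        return None  # e.g. a timeout row that spans several columns
    return dict(zip(FIELDS, cells))

# Hypothetical sample markup; the second row imitates a timeout entry.
sample_html = """
<table>
  <tr class="even"><td>12:00</td><td>Jump ball</td><td>0-0</td><td></td></tr>
  <tr class="even"><td>10:00</td><td colspan="3">Official timeout</td></tr>
  <tr class="even"><td>9:30</td><td></td><td>0-2</td><td>Layup</td></tr>
</table>
"""

soup = BeautifulSoup(sample_html, "html.parser")
candidates = (row_to_dict(row) for row in soup.find_all("tr", class_="even"))
parsed_rows = [parsed for parsed in candidates if parsed]

for parsed in parsed_rows:
    print(parsed)
```

The timeout row has only two cells, so it is dropped, and parsed_rows ends up with two dictionaries.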
