BeautifulSoup: cannot access the information inside a td

def main():
    r = requests.get('https://modules.ussquash.com/ssm/pages/leagues/League_Information.asp?leagueid=1859')
    data = r.text
    soup = BeautifulSoup(data)
    table = soup.find_all('table')[1]
    rows = table.find_all('tr')[1:]
    for row in rows:
        cols = row.find_all('td')
        print(cols)

1 answer

User

#1 · Posted 2024-09-29 01:19:40

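The first two tr elements are inside the thead and contain no td tags, so you want to skip the first two rows rather than just the first: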

rows = table.find_all('tr')[2:]
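
Here is a small sketch of why the original loop printed empty lists for the header rows. The HTML is a made-up stand-in for the league table (the team names and points are assumptions, not data from the real page), and "html.parser" is used only so the snippet needs no extra parser installed:

```python
from bs4 import BeautifulSoup

# Hypothetical table: two header rows in thead, two data rows in tbody.
HTML = """
<table>
  <thead>
    <tr><th colspan="2">League</th></tr>
    <tr><th>Team</th><th>Points</th></tr>
  </thead>
  <tbody>
    <tr><td>Alpha</td><td>12</td></tr>
    <tr><td>Beta</td><td>9</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
rows = soup.find("table").find_all("tr")

# Count the td cells in each row: header rows only hold th cells.
td_counts = [len(row.find_all("td")) for row in rows]
print(td_counts)

# Dropping the first two rows leaves only the rows that contain td cells.
data_rows = rows[2:]
cells = [[td.get_text() for td in row.find_all("td")] for row in data_rows]
print(cells)
```

With a slice of [1:] the second header row would still be in the loop, and find_all('td') on it returns an empty list.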

To get what you are after, we can simplify things by using CSS selectors:

table = soup.find_all('table', limit=2)[1]

# skip first two tr's
rows = table.select("tr + tr + tr")
for row in rows:
    # anchor we want is inside the first td
    a = row.select_one("td a") # or  a = row.find("td").a
    print(a.text,a["href"])
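
To make the selector less mysterious, here is a minimal sketch on a made-up table (the rows, names and hrefs are assumptions). "tr + tr + tr" uses the adjacent-sibling combinator, so it matches any tr whose two immediately preceding siblings are also tr elements, which means the first two rows never match:

```python
from bs4 import BeautifulSoup

# Hypothetical flat table: all four rows are siblings under the same parent.
HTML = """
<table>
  <tr><th>League</th></tr>
  <tr><th>Team</th></tr>
  <tr><td><a href="team.asp?id=1">Alpha</a></td></tr>
  <tr><td><a href="team.asp?id=2">Beta</a></td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find("table")

# Matches every tr that has two tr siblings directly before it.
rows = table.select("tr + tr + tr")

links = []
for row in rows:
    # select_one returns None when nothing matches, so guard against it.
    a = row.select_one("td a")
    if a is not None:
        links.append((a.text, a["href"]))
print(links)
```

One thing to keep in mind: "+" only relates elements that share a parent, so if the header rows sit in a thead and the data rows in a separate tbody, the sibling count starts over inside the tbody. It is worth checking the parsed tree of the real page before relying on this selector.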

The href is also a relative path, so you need to join it to the base URL:

import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin  # Python 3; on Python 2 use: from urlparse import urljoin

def main():
    base = "https://modules.ussquash.com/ssm/pages/leagues/"
    r = requests.get('https://modules.ussquash.com/ssm/pages/leagues/League_Information.asp?leagueid=1859')
    data = r.text
    soup = BeautifulSoup(data)

    table = soup.find_all('table', limit=2)[1]
    # skip first two tr's
    rows = table.select("tr + tr + tr")

    for row in rows:
        a = row.select_one("td a")
        print(a.text, urljoin(base, a["href"]))
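
For reference, a quick sketch of what urljoin does with a few kinds of href. The relative paths below are invented for illustration and are not taken from the actual page:

```python
from urllib.parse import urljoin

base = "https://modules.ussquash.com/ssm/pages/leagues/"
page = base + "League_Information.asp?leagueid=1859"

# Hypothetical relative href, resolved against the directory URL.
joined = urljoin(base, "Team_Information.asp?id=1")
print(joined)

# Resolving against the full page URL gives the same result, because
# urljoin replaces the last path segment and the query string of the base.
joined_from_page = urljoin(page, "Team_Information.asp?id=1")
print(joined_from_page)

# "../" climbs one directory; an absolute href is returned unchanged.
parent = urljoin(base, "../Other.asp")
absolute = urljoin(base, "https://example.com/x")
print(parent)
print(absolute)
```

Since joining against the page URL works just as well, passing r.url as the base would be an alternative to hard-coding the directory.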
