Is there any way to scrape the csk attribute?

from urllib.request import urlopen
from bs4 import BeautifulSoup

content = urlopen("https://www.baseball-reference.com/leagues/MLB/2018-standard-pitching.shtml")
soup = BeautifulSoup(content.read(),"lxml")
tags = soup.findAll('div')
for t in tags:
    print(t)

2 answers

User

#1 · Edited 2024-10-04 05:25:54

Using lxml directly is faster:

from urllib.request import urlopen
#from bs4 import BeautifulSoup, Comment
from lxml import html

response = urlopen("https://www.baseball-reference.com/leagues/MLB/2018-standard-pitching.shtml")
content = response.read()

tree = html.fromstring( content )

#Now we need to find our target table (comment text)
comment_html = tree.xpath('//comment()[contains(., "players_standard_pitching")]')[0]

#removing HTML comment markup (the "<!--" and "-->" delimiters)
comment_html = str(comment_html).replace("-->", "")
comment_html = comment_html.replace("<!--", "")

#parsing our target HTML again
tree = html.fromstring( comment_html )

for pitcher_row in tree.xpath('//table[@id="players_standard_pitching"]/tbody/tr[contains(@class, "full_table")]'):

    csk = pitcher_row.xpath('./td[@data-stat="player"]/@csk')[0]
    print(csk)

User

#2 · Edited 2024-10-04 05:25:54

Try the script below to get them. The data you want is wrapped in HTML comments, which is why the usual approach does not let you collect it:

from urllib.request import urlopen
from bs4 import BeautifulSoup, Comment

content = urlopen("https://www.baseball-reference.com/leagues/MLB/2018-standard-pitching.shtml")
soup = BeautifulSoup(content.read(),"lxml")
for comment in soup.find_all(string=lambda text:isinstance(text,Comment)):
    sauce = BeautifulSoup(comment,"lxml")
    for tags in sauce.find_all('tr'):
        name = [item.get("csk") for item in tags.find_all("td")[:1]]
        print(name)
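
Both answers rely on the same idea: the table sits inside an HTML comment, so the comment text has to be pulled out and parsed a second time. As a rough sketch of that two-pass approach using only the standard library, the snippet below runs on a small made-up sample instead of the live page. The names `CommentCollector`, `CskCollector`, `extract_csk` and the `SAMPLE` markup (including the player names) are hypothetical and only assume that the real page uses the same `data-stat="player"` and `csk` attributes shown in the answers above.

```python
from html.parser import HTMLParser


class CommentCollector(HTMLParser):
    """First pass: keep the text of every HTML comment in the page."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def handle_comment(self, data):
        # data is the text between "<!--" and "-->"
        self.comments.append(data)


class CskCollector(HTMLParser):
    """Second pass: collect csk from each <td data-stat="player"> cell."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "td" and attrs.get("data-stat") == "player" and "csk" in attrs:
            self.values.append(attrs["csk"])


def extract_csk(page_html, table_id="players_standard_pitching"):
    """Return csk values from the commented-out table with the given id."""
    outer = CommentCollector()
    outer.feed(page_html)
    outer.close()

    inner = CskCollector()
    for comment in outer.comments:
        # Only re-parse the comment that contains the target table
        if table_id in comment:
            inner.feed(comment)
    inner.close()
    return inner.values


# Made-up sample that mimics the commented-out table layout (not real data)
SAMPLE = """
<div id="all_players_standard_pitching">
<!--
<table id="players_standard_pitching"><tbody>
<tr class="full_table"><td data-stat="player" csk="Doe,John">John Doe</td></tr>
<tr class="full_table"><td data-stat="player" csk="Roe,Jane">Jane Roe</td></tr>
</tbody></table>
-->
</div>
"""

print(extract_csk(SAMPLE))
```

To try it against the real page, the decoded response body from `urlopen(...).read()` could be passed to `extract_csk` in place of `SAMPLE`, provided the page still has the structure assumed here.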
