BeautifulSoup find_all(): matching class names that start with a given word - Q&A - Python中文网


Published 2024-07-07 07:19:12



I am scraping a website with Beautiful Soup, and its links have class names like this:

<a class="Component-headline-0-2-109" data-key="card-headline" href="/article/politics-senate-elections-legislation-coronavirus-pandemic-bills-f100b3a3b4498a75d6ce522dc09056b0">

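The main problem is that the class name always starts with Component-headline- but ends with what look like random numbers. When I call Beautiful Soup's soup.find_all('class','Component-headline'), it finds nothing because of the unique numbers. Is it possible to use find_all but match every class that starts with "Component-headline"?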

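I also considered matching on data-key="card-headline" with soup.find_all('data-key','card-headline'), but for some reason that did not work either, so I assume I cannot search by data-key, though I am not sure. Any suggestions?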

Tags: key data website article all find card class

2 answers

User

Answer 1 · edited 2024-07-07 07:19:12

BeautifulSoup supports regular expressions, so you can use re.compile to match part of the class attribute:

import re
# ^ anchors the pattern so only classes that start with the prefix match
soup.find_all('a', class_=re.compile(r'^Component-headline'))

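You can also use a lambda. The function receives None for tags that have no class attribute, so check for that first: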

soup.find_all('a', class_=lambda c: c is not None and c.startswith('Component-headline'))
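
For anyone who wants to try this end to end, here is a minimal sketch of the find_all approaches. The sample markup, class suffixes, and hrefs are made up for illustration. It also covers the data-key part of the question: find_all takes the tag name as its first positional argument, so soup.find_all('data-key', 'card-headline') looks for <data-key> tags, and attributes with a hyphen in the name have to go through the attrs dictionary instead.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical sample markup; the class suffixes and hrefs are made up.
html = """
<a class="Component-headline-0-2-109" data-key="card-headline" href="/article/a">First</a>
<a class="Component-headline-0-2-77" data-key="card-headline" href="/article/b">Second</a>
<a class="nav-link" href="/about">About</a>
<a href="/plain">No class attribute</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Regex anchored with ^ so only classes that start with the prefix match.
by_regex = soup.find_all("a", class_=re.compile(r"^Component-headline"))

# Function filter; the None check covers tags that have no class attribute.
by_func = soup.find_all(
    "a", class_=lambda c: c is not None and c.startswith("Component-headline")
)

# data-* names are not valid Python keyword arguments, so pass them via attrs.
by_data_key = soup.find_all("a", attrs={"data-key": "card-headline"})

print([a["href"] for a in by_regex])  # ['/article/a', '/article/b']
```

All three calls return the same two links for this sample.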

User

Answer 2 · edited 2024-07-07 07:19:12

Try using the [attribute^="value"] CSS selector.

To use a CSS selector, call the select() method instead of find_all().

The following selects every element whose class attribute starts with Component-headline:

from bs4 import BeautifulSoup

# html is assumed to hold the page source as a string
soup = BeautifulSoup(html, "html.parser")

print(soup.select('[class^="Component-headline"]'))
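
One caveat worth checking on your own page: the ^= operator tests the whole class attribute string, so an element whose class list has another class first will be skipped. Below is a rough sketch with made-up markup (the "card" class and the hrefs are assumptions) that compares the prefix selector, the substring selector *=, and a selector on data-key.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the second link has an extra class listed first.
html = """
<a class="Component-headline-0-2-109" data-key="card-headline" href="/article/a">First</a>
<a class="card Component-headline-0-2-5" data-key="card-headline" href="/article/b">Second</a>
<a class="nav-link" href="/about">About</a>
"""
soup = BeautifulSoup(html, "html.parser")

# Prefix match on the full class attribute value.
prefix_links = soup.select('a[class^="Component-headline"]')

# Substring match, which also catches the class when it is not listed first.
contains_links = soup.select('a[class*="Component-headline"]')

# Exact match on the data-key attribute.
key_links = soup.select('a[data-key="card-headline"]')

print([a["href"] for a in prefix_links])    # ['/article/a']
print([a["href"] for a in contains_links])  # ['/article/a', '/article/b']
print([a["href"] for a in key_links])       # ['/article/a', '/article/b']
```

If the data-key value is stable across the site, selecting on it is probably the more robust choice, since it does not depend on the generated class suffix at all.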
