How does BeautifulSoup's findAll work?

>>> from bs4 import BeautifulSoup
>>> htmls="<html><body><p class=\"pagination-container\">slytherin</p><p class=\"pagination-container and something\">gryffindor</p></body></html>"
>>> soup=BeautifulSoup(htmls, "html.parser")
>>> for i in soup.findAll("p",{"class":"pagination-container"}): print(i.text)
...
slytherin
gryffindor
>>> for i in soup.findAll("p", {"class":"pag"}): print(i.text)
...
>>> for i in soup.findAll("p",{"class":"pagination-container"}): print(i.text)
...
slytherin
gryffindor
>>> for i in soup.findAll("p",{"class":"pagination"}): print(i.text)
...
>>> len(soup.findAll("p",{"class":"pagination-container"}))
2
>>> len(soup.findAll("p",{"class":"pagination-containe"}))
0
>>> len(soup.findAll("p",{"class":"pagination-contai"}))
0
>>> len(soup.findAll("p",{"class":"pagination-container and something"}))
1
>>> len(soup.findAll("p",{"class":"pagination-conta"}))
0

1 Answer

User

#1 · Posted 2024-10-05 11:41:15

First of all, class is a special multi-valued, space-delimited attribute, and it gets special handling.

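When you write soup.findAll("p", {"class":"pag"}), BeautifulSoup searches for elements that have the class pag. It splits each element's class value on whitespace and checks whether pag is one of the resulting items. An element with class="test pag" or class="pag" would be a match.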

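Note that in the case of soup.findAll("p", {"class": "pagination-container and something"}), BeautifulSoup matches an element whose class attribute has exactly that value. No splitting is involved in this case: it simply finds an element whose complete class value equals the requested string.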
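The two rules described above can be sketched in plain Python. This is only a simplified model for reasoning about the results in the question, not BeautifulSoup's actual implementation; the function name class_filter_matches is made up for this illustration.

```python
def class_filter_matches(class_attr: str, wanted: str) -> bool:
    """Simplified model of matching a plain-string class filter.

    Hypothetical helper for illustration only; BeautifulSoup's real
    matching code is more general than this.
    """
    tokens = class_attr.split()        # "a b c" -> ["a", "b", "c"]
    if wanted in tokens:               # rule 1: equals one individual class
        return True
    return wanted == " ".join(tokens)  # rule 2: equals the complete value


# The class value of the second <p> from the question.
second_p = "pagination-container and something"

results = {
    wanted: class_filter_matches(second_p, wanted)
    for wanted in (
        "pagination-container",                # one of the classes
        "pag",                                 # only a prefix of a class
        "pagination-container and something",  # the complete value
        "something and pagination-container",  # same classes, other order
    )
}
```

Under this model a prefix such as pag never matches, and the complete-value rule depends on the order in which the classes are written.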

To match a class partially, you can provide a regular expression or a function as the class filter value:

import re

soup.find_all("p", {"class": re.compile(r"pag")})  # contains pag
soup.find_all("p", {"class": re.compile(r"^pag")})  # starts with pag

soup.find_all("p", {"class": lambda class_: class_ and "pag" in class_})  # contains pag
soup.find_all("p", {"class": lambda class_: class_ and class_.startswith("pag")})  # starts with pag
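To see how these filters behave on the markup from the question, here is a small self-contained sketch. The two extra paragraphs (footer and the one without a class) and the texts helper are assumptions added for illustration; they show why the lambda guards against a missing class value.

```python
import re

from bs4 import BeautifulSoup

# Markup from the question, plus two extra <p> elements added for illustration.
html = (
    "<html><body>"
    '<p class="pagination-container">slytherin</p>'
    '<p class="pagination-container and something">gryffindor</p>'
    '<p class="footer">hufflepuff</p>'
    "<p>ravenclaw</p>"
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")


def texts(tags):
    """Hypothetical helper: collect the text of each matched tag."""
    return [tag.get_text() for tag in tags]


# A plain string must equal a whole class, so a prefix finds nothing.
plain = texts(soup.find_all("p", {"class": "pag"}))

# A compiled regular expression is searched against the class values.
by_regex = texts(soup.find_all("p", {"class": re.compile(r"^pag")}))

# A function receives the class value; the "c and" guard covers the
# <p> that has no class attribute at all.
by_func = texts(
    soup.find_all("p", {"class": lambda c: c and c.startswith("pag")})
)
```

Here plain stays empty, while by_regex and by_func both pick up the two pagination paragraphs and skip the other two.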

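There is a lot more to say, but you should also know that BeautifulSoup supports CSS selectors (the support is limited, but it covers most common use cases). You can write something like: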

[code example not preserved in the source]
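The answer's own snippet for this step is not available here, so the following is only a rough sketch of what the CSS selector approach could look like. The particular selectors are illustrative choices, not necessarily the ones the answer used.

```python
from bs4 import BeautifulSoup

# Markup from the question.
html = (
    "<html><body>"
    '<p class="pagination-container">slytherin</p>'
    '<p class="pagination-container and something">gryffindor</p>'
    "</body></html>"
)
soup = BeautifulSoup(html, "html.parser")

# .name matches any element that has this class among its classes.
has_class = [p.get_text() for p in soup.select("p.pagination-container")]

# [class="..."] compares against the complete attribute value.
exact_value = [
    p.get_text() for p in soup.select('p[class="pagination-container"]')
]

# [class^="..."] tests whether the complete attribute value starts with the text.
value_prefix = [p.get_text() for p in soup.select('p[class^="pag"]')]

# [class*="..."] tests whether the complete attribute value contains the text.
value_contains = [p.get_text() for p in soup.select('p[class*="something"]')]
```

The dot form is usually the right choice for "has this class"; the attribute forms operate on the raw attribute string, which is what makes prefix and substring tests possible.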
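Handling class attribute values in BeautifulSoup is a common source of confusion and questions. See the following related threads for a better understanding: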
BeautifulSoup returns empty list when searching by compound class names
Finding multiple attributes within the span tag in Python
