h4 tag wrapped in a div with a class

2 Answers

User

#1 · Edited 2024-10-01 13:35:12

I'm assuming you are using BeautifulSoup 4.7+. Beautiful Soup handles certain attributes specially, and in 4.7 the end result differs slightly from what you get in <= 4.6.

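Attributes that are normally treated as whitespace-separated lists are handled a little differently from all other attributes, and class happens to be one of them. BeautifulSoup does not store these attributes as they appear in the HTML document. It stores them as a list of classes with the whitespace removed: "class1 class2 " -> ['class1', 'class2']. When it needs to evaluate the class attribute as a string, it rejoins the values with a single space between them, but by then things like trailing spaces are gone: "class1 class2".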
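As a rough illustration of the list storage described above, here is a minimal sketch. It assumes a recent bs4 (4.7+) and the built-in html.parser; the class names are the placeholder ones from the paragraph, not from the question.

```python
from bs4 import BeautifulSoup

# Placeholder markup with a doubled inner space and a trailing space
soup = BeautifulSoup('<div class="class1  class2 "></div>', 'html.parser')
div = soup.div

# class is a multi-valued attribute, so it comes back as a list of names
classes = div['class']
print(classes)

# Serializing the tag joins the names with single spaces,
# so the extra whitespace from the source is not reproduced
rendered = str(div)
print(rendered)
```

On 4.7+ the list should contain only the two names, and the rendered tag should show class="class1 class2" without the trailing space.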

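Now, I'm not saying this is an intuitive approach, only that this is what BeautifulSoup does. Personally, I would prefer that BeautifulSoup store them as the original string and split them into a list when needed, but that is not what it does.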

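In BeautifulSoup <= 4.6, I think the trailing space was preserved, though there were some other quirks. In 4.7+, which is what you have, just assume that leading and trailing whitespace is ignored and that whitespace between classes is collapsed to a single space. So in your case, simply drop the trailing space: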

soup.find('div',{"class": "carResultRow_OfferInfo_Supplier-wrap"})
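If the class string is copied straight from the page source, one way to avoid this pitfall is to normalize it before searching. The sketch below is only an illustration: normalize_class is a hypothetical helper, not part of bs4, and the markup is a trimmed version of the question's HTML.

```python
from bs4 import BeautifulSoup

def normalize_class(value):
    """Hypothetical helper: strip leading/trailing whitespace and
    collapse inner runs of whitespace to single spaces."""
    return " ".join(value.split())

# Trimmed-down markup; note the trailing space inside the class attribute
html = '<div class="carResultRow_OfferInfo_Supplier-wrap "><h4>Ocarrol</h4></div>'
soup = BeautifulSoup(html, 'html.parser')

# Class string as it appears in the source, trailing space included
raw = "carResultRow_OfferInfo_Supplier-wrap "

div = soup.find('div', {"class": normalize_class(raw)})
heading = div.find('h4').get_text()
print(heading)
```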

You can read more about this behavior here: https://bugs.launchpad.net/beautifulsoup/+bug/1824502

Example

from bs4 import BeautifulSoup

html = """
<div class="carResultRow_OfferInfo_Supplier-wrap ">
<h3 class="carResultRow_OfferInfo_SupplierLabel">Servicio proporcionado por:</h3>
<img src="https://cdn2.rcstatic.com/images/suppliers/flat/ocarrol_logo.gif" title="Ocarrol" alt="Ocarrol">
<h4 style="" xpath="1">Ocarrol</h4>
<a href="InfoPo=0&amp;driversAge=30&amp;os=1" onclick="GAQPush('cboxElement">Términos y condiciones</a>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')

print(soup.find('div',{"class": "carResultRow_OfferInfo_Supplier-wrap"}).find('h4'))

Output

<h4 style="" xpath="1">Ocarrol</h4>

User

#2 · Edited 2024-10-01 13:35:12

Maybe you could use a CSS selector instead of find?

from bs4 import BeautifulSoup

html = '''<div class="carResultRow_OfferInfo_Supplier-wrap ">
<h3 class="carResultRow_OfferInfo_SupplierLabel">Servicio proporcionado por:</h3>
<img src="https://cdn2.rcstatic.com/images/suppliers/flat/ocarrol_logo.gif" title="Ocarrol" alt="Ocarrol">
<h4 style="" xpath="1">Ocarrol</h4>
<a href="InfoPo=0&amp;driversAge=30&amp;os=1" onclick="GAQPush('cboxElement">Términos y condiciones</a>
</div>'''
soup = BeautifulSoup(html, 'lxml')

print(soup.select('div[class="carResultRow_OfferInfo_Supplier-wrap"]'))

Prints:

[<div class="carResultRow_OfferInfo_Supplier-wrap">
<h3 class="carResultRow_OfferInfo_SupplierLabel">Servicio proporcionado por:</h3>
<img alt="Ocarrol" src="https://cdn2.rcstatic.com/images/suppliers/flat/ocarrol_logo.gif" title="Ocarrol"/>
<h4 style="" xpath="1">Ocarrol</h4>
<a href="InfoPo=0&amp;driversAge=30&amp;os=1" onclick="GAQPush('cboxElement">Términos y condiciones</a>
</div>]
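Since the goal is the h4 itself, a class selector combined with a child combinator may be more direct than matching the whole attribute value. This is a sketch under the assumption that the h4 is a direct child of the div, as in the sample markup; it uses html.parser rather than lxml only to avoid the extra dependency.

```python
from bs4 import BeautifulSoup

html = '''<div class="carResultRow_OfferInfo_Supplier-wrap ">
<h3 class="carResultRow_OfferInfo_SupplierLabel">Servicio proporcionado por:</h3>
<h4 style="" xpath="1">Ocarrol</h4>
</div>'''
soup = BeautifulSoup(html, 'html.parser')

# .classname matches on the individual class name, so whitespace in the
# original attribute value does not matter here
h4 = soup.select_one('div.carResultRow_OfferInfo_Supplier-wrap > h4')
supplier = h4.get_text()
print(supplier)
```

select_one returns the first match or None, so real code would probably want to check for None before calling get_text().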
