Parsing HTML with BeautifulSoup where class == and title contains

<div class="product-details"> <h4 class="title" ><a href="https://productwebpage.com/blue-standard" title="Blue - Standard">Blue - Standard</a></h4> <a class="learn-more" data-test-selector="linkViewMoreDetails" href="https://productwebpage.com">Learn More</a> <div class="tocart" <a class="" href="/store/addtocartplp?productId=3593" id="AddToCartSimple-3593">Add To Cart</a></div> </div> , <div class="product-details"> <h4 class="title" ><a href="https://productwebpage.com/blue-wide" title="Blue - Wide">Blue - Wide</a></h4> <a class="learn-more" data-test-selector="linkViewMoreDetails" href="https://productwebpage.com">Learn More</a> <div class="tocart" <a class="disAddtoCardBtn" href="javascript:void(0)" id="AddToCartSimple-3576" >SOLD</a></div> </div>

1 Answer

User

#1 · Posted on 2024-10-01 00:29:56

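I think your HTML is broken. You can do all of the filtering with a single CSS selector by combining :has, :not and :contains (:-soup-contains in recent versions of Soup Sieve) with attribute = value selectors. ^= is the starts-with operator: the attribute value must start with the string after the =. ~ is the general sibling combinator and > is the child combinator. The selector starts from an element with class title that contains an element whose title attribute starts with "Blue -", looks for a following sibling with class (.) tocart, and then for a child of that sibling whose id starts with AddToCartSimple- and whose displayed text does not contain SOLD. This is less specific than != "SOLD", because it excludes on a partial string match. Which of the two is appropriate depends on the variation observed in the actual data.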

from bs4 import BeautifulSoup as bs

html = '''
  <div class="product-details">
   <h4 class="title"><a href="https://productwebpage.com/blue-standard" title="Blue - Standard">Blue - Standard</a></h4> <a class="learn-more" data-test-selector="linkViewMoreDetails" href="https://productwebpage.com">Learn More</a>
   <div class="tocart"> <a class="" href="/store/addtocartplp?productId=3593" id="AddToCartSimple-3593">Add To Cart</a>
   </div>
  </div>
  <div class="product-details">
   <h4 class="title"><a href="https://productwebpage.com/blue-wide" title="Blue - Wide">Blue - Wide</a></h4> <a class="learn-more" data-test-selector="linkViewMoreDetails" href="https://productwebpage.com">Learn More</a>
   <div class="tocart"> <a class="disAddtoCardBtn" href="javascript:void(0)" id="AddToCartSimple-3576">SOLD</a>
   </div>
  </div>
'''
soup = bs(html, 'html.parser')
# :-soup-contains needs Soup Sieve 2.1+; on older versions use :contains instead.
print(soup.select_one('.title:has([title^="Blue -"]) ~ .tocart > [id^="AddToCartSimple-"]:not(:-soup-contains("SOLD"))')['id'])

Of course, you should check that there is a match before accessing ['id']. You can also collect all matches as follows:

# select() returns a list, so this is simply empty when nothing matches.
[i['id'] for i in soup.select('.title:has([title^="Blue -"]) ~ .tocart > [id^="AddToCartSimple-"]:not(:-soup-contains("SOLD"))')]
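
The check for a missing match mentioned above could look something like the sketch below. select_one returns None when nothing matches, so indexing the result directly would raise a TypeError. The helper name first_available_id and the two trimmed sample strings are made up for illustration, and the :-soup-contains spelling assumes Soup Sieve 2.1 or later.

```python
from bs4 import BeautifulSoup

# Same selector as in the answer, written with :-soup-contains
# (assumption: Soup Sieve 2.1+ is installed).
SELECTOR = (
    '.title:has([title^="Blue -"]) ~ .tocart > '
    '[id^="AddToCartSimple-"]:not(:-soup-contains("SOLD"))'
)


def first_available_id(markup):
    """Hypothetical helper: id of the first add-to-cart link that is not SOLD, or None."""
    soup = BeautifulSoup(markup, 'html.parser')
    link = soup.select_one(SELECTOR)
    # select_one returns None when there is no match, so guard before indexing.
    return link['id'] if link is not None else None


# Trimmed-down sample markup modelled on the question.
in_stock = '''
<div class="product-details">
  <h4 class="title"><a href="https://productwebpage.com/blue-standard" title="Blue - Standard">Blue - Standard</a></h4>
  <div class="tocart"><a class="" href="/store/addtocartplp?productId=3593" id="AddToCartSimple-3593">Add To Cart</a></div>
</div>
'''
sold_out = '''
<div class="product-details">
  <h4 class="title"><a href="https://productwebpage.com/blue-wide" title="Blue - Wide">Blue - Wide</a></h4>
  <div class="tocart"><a class="disAddtoCardBtn" href="javascript:void(0)" id="AddToCartSimple-3576">SOLD</a></div>
</div>
'''

print(first_available_id(in_stock))   # AddToCartSimple-3593
print(first_available_id(sold_out))   # None
```

Returning None keeps the decision about what to do with a missing product in the calling code, rather than burying it in the selector logic.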
