Child div with the same class name as its parent div

<div class="my_class">
    <div>important text</div>
    <div class="my_class">
        <div>not important</div>
    </div>
</div>
<div class="my_class">
    <div>important text</div>
    <div class="my_class">
        <div>not important</div>
    </div>
</div>
...

my_div_list = soup.find_all('div', attrs={'class': 'my_class'})
for my_div in my_div_list:
    text_item = my_div.find('div')  # to get to the div that contains the important text
    print(text_item.getText())

3 answers

User

#1 · edited 2024-09-30 16:20:44

With bs4 4.7.1, you can use the :has and :first-child selectors:

from bs4 import BeautifulSoup as bs

html = '''<div class="my_class">
       <div>important text</div>
       <div class="my_class">
            <div>not important</div>
       </div>
   </div>
   <div class="my_class">
       <div>important text</div>
       <div class="my_class">
            <div>not important</div>
       </div>
   </div>'''

soup = bs(html, 'lxml')
print([i.text for i in soup.select('.my_class:has(>.my_class) > div:first-child')])
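To see why that selector works, it can help to run its two halves separately. The following is a minimal sketch, assuming the same markup as above and the built-in 'html.parser' instead of 'lxml'; the variable names outer and first_children are illustrative only.

```python
from bs4 import BeautifulSoup

html = '''<div class="my_class">
    <div>important text</div>
    <div class="my_class">
        <div>not important</div>
    </div>
</div>
<div class="my_class">
    <div>important text</div>
    <div class="my_class">
        <div>not important</div>
    </div>
</div>'''

soup = BeautifulSoup(html, 'html.parser')

# Step 1: .my_class elements that have a direct .my_class child.
# Only the outer divs qualify; the nested ones contain a plain <div>.
outer = soup.select('.my_class:has(> .my_class)')

# Step 2: from those outer divs, take the direct child <div>
# that is also the first child of its parent.
first_children = soup.select('.my_class:has(> .my_class) > div:first-child')

print(len(outer))
print([d.get_text() for d in first_children])
```

The nested .my_class div is the second child of its parent, so :first-child excludes it and only the divs holding the important text remain.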

User

#2 · edited 2024-09-30 16:20:44

You can iterate over soup.contents:

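from bs4 import BeautifulSoup as soup, Tag
# keep only element nodes; the whitespace between top-level divs is parsed as text nodes
r = [i.div.text for i in soup(html, 'html.parser').contents if isinstance(i, Tag)]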

Output:

['important text', 'important text']

User

#3 · edited 2024-09-30 16:20:44

From the find_all() documentation:

recursive is a boolean argument (defaulting to True) which tells Beautiful Soup whether to go all the way down the parse tree, or whether to only look at the immediate children of the Tag or the parser object.

So, assuming the first-level divs sit directly under the <html> and <body> tags, you can set

soup.html.body.find_all('div', attrs={'class': 'my_class'}, recursive=False)

Output:

 ['important text', 'important text']
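The call above returns the outer div tags themselves, so one more step is needed to reach the text. Here is a minimal end-to-end sketch, assuming the markup from the question and the built-in 'html.parser'. That parser does not add <html> or <body> wrappers, so the search starts from the soup object; with 'lxml' you would start from soup.html.body as shown above. The names outer_divs and texts are illustrative.

```python
from bs4 import BeautifulSoup

html = '''<div class="my_class">
    <div>important text</div>
    <div class="my_class">
        <div>not important</div>
    </div>
</div>
<div class="my_class">
    <div>important text</div>
    <div class="my_class">
        <div>not important</div>
    </div>
</div>'''

# html.parser leaves the outer divs as direct children of the soup object
soup = BeautifulSoup(html, 'html.parser')

# recursive=False looks only at direct children, so nested .my_class divs are skipped
outer_divs = soup.find_all('div', attrs={'class': 'my_class'}, recursive=False)

# the first <div> inside each outer div holds the important text
texts = [outer.find('div').get_text() for outer in outer_divs]
print(texts)
```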
