How to get the content of the first '<li>' from a '<ul>' using BeautifulSoup

Posted 2024-09-29 23:27:12

The HTML looks like this:

<div class="carousel"> 
  <div class="carousel_Wrapper"> 
    <div class="carousel_Container swiper-container"> 
      <ul class="swiper-wrapper">
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0001.jpg"/></figure>
        </li>
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0002.jpg"/></figure>
        </li>
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0003.jpg"/></figure>
        </li>
      </ul>
    </div>
    <div class="carousel_NextBtn"></div> 
    <div class="carousel_PrevBtn"></div> 
  </div> 
</div>

<div class="carousel"> 
  <div class="carousel_Wrapper"> 
    <div class="carousel_Container swiper-container"> 
      <ul class="swiper-wrapper">
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0004.jpg"/></figure>
        </li>
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0005.jpg"/></figure>
        </li>
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0006.jpg"/></figure>
        </li>
      </ul>
    </div>
    <div class="carousel_NextBtn"></div> 
    <div class="carousel_PrevBtn"></div> 
  </div> 
</div>

I want to use BeautifulSoup to transform it into the following HTML.

^{pr2}$

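I tried to strip out the unnecessary elements with the approach below.
Because the page may contain other elements as well, I select by class and then call decompose() and unwrap().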

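from bs4 import BeautifulSoup

html = """..."""  # placeholder: the HTML shown at the top of the question

content = BeautifulSoup(html, 'html.parser')

# find() returns only the first matching tag
content.find('div', class_='carousel_NextBtn').decompose()  # delete the tag and its contents
content.find('div', class_='carousel').unwrap()  # drop the tag, keep its children
content.find('div', class_='carousel_Wrapper').unwrap()
content.find('div', class_='carousel_Container swiper-container').unwrap()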

After applying the processing above, I expect the HTML to look like this:

<ul class="swiper-wrapper">
  <li class="swiper-slide"> 
    <figure><img alt="" src="https://s3.amazonaws.com/0001.jpg"/></figure>
  </li>
  <li class="swiper-slide"> 
    <figure><img alt="" src="https://s3.amazonaws.com/0002.jpg"/></figure>
  </li>
  <li class="swiper-slide"> 
    <figure><img alt="" src="https://s3.amazonaws.com/0003.jpg"/></figure>
  </li>
</ul>
<div class="carousel_PrevBtn"></div> 

<ul class="swiper-wrapper">
  <li class="swiper-slide"> 
    <figure><img alt="" src="https://s3.amazonaws.com/0004.jpg"/></figure>
  </li>
  <li class="swiper-slide"> 
    <figure><img alt="" src="https://s3.amazonaws.com/0005.jpg"/></figure>
  </li>
  <li class="swiper-slide"> 
    <figure><img alt="" src="https://s3.amazonaws.com/0006.jpg"/></figure>
  </li>
</ul>
<div class="carousel_PrevBtn"></div> 
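
A caveat about the decompose()/unwrap() calls: `find()` returns only the first match, so on its own it would change only the first carousel. The following is a minimal sketch of the same clean-up applied to every carousel with `find_all()`. The shortened `html` string and the slide texts "A" and "B" are stand-ins made up for illustration, not the real page.

```python
from bs4 import BeautifulSoup

# Shortened, hypothetical stand-in for the question's markup:
# two carousels with one slide each.
html = """
<div class="carousel"><div class="carousel_Wrapper">
<div class="carousel_Container swiper-container">
<ul class="swiper-wrapper"><li class="swiper-slide">A</li></ul>
</div>
<div class="carousel_NextBtn"></div><div class="carousel_PrevBtn"></div>
</div></div>
<div class="carousel"><div class="carousel_Wrapper">
<div class="carousel_Container swiper-container">
<ul class="swiper-wrapper"><li class="swiper-slide">B</li></ul>
</div>
<div class="carousel_NextBtn"></div><div class="carousel_PrevBtn"></div>
</div></div>
"""

content = BeautifulSoup(html, "html.parser")

# find_all() returns every match, so each carousel is processed.
for button in content.find_all("div", class_="carousel_NextBtn"):
    button.decompose()  # delete the tag together with its contents

# class_ matches a single CSS class token, so "carousel" does not
# match "carousel_Wrapper" or "carousel_Container".
for wrapper_class in ("carousel", "carousel_Wrapper", "carousel_Container"):
    for wrapper in content.find_all("div", class_=wrapper_class):
        wrapper.unwrap()  # drop the tag but keep its children in place

print(content)
```

Both `<ul>` elements and both `carousel_PrevBtn` divs should remain, with all three wrapper divs gone.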

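I think the following steps are still needed:

  • 1. Retrieve the content of the first <li> element in each <ul>
  • 2. Insert <p><a href="https://xxxx.jp">other photos</a></p>

For step 2, I think a simple replacement will work.
But I don't know how to implement step 1.

Please tell me how to solve this.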

1 answer
User
#1 · Posted 2024-09-29 23:27:12
html = """<div class="carousel"> 
  <div class="carousel_Wrapper"> 
    <div class="carousel_Container swiper-container"> 
      <ul class="swiper-wrapper">
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0001.jpg"/></figure>
        </li>
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0002.jpg"/></figure>
        </li>
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0003.jpg"/></figure>
        </li>
      </ul>
    </div>
    <div class="carousel_NextBtn"></div> 
    <div class="carousel_PrevBtn"></div> 
  </div> 
</div>

<div class="carousel"> 
  <div class="carousel_Wrapper"> 
    <div class="carousel_Container swiper-container"> 
      <ul class="swiper-wrapper">
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0004.jpg"/></figure>
        </li>
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0005.jpg"/></figure>
        </li>
        <li class="swiper-slide"> 
          <figure><img alt="" src="https://s3.amazonaws.com/0006.jpg"/></figure>
        </li>
      </ul>
    </div>
    <div class="carousel_NextBtn"></div> 
    <div class="carousel_PrevBtn"></div> 
  </div> 
</div>"""

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, 'html.parser')
all_uls = soup.find_all('ul', {'class': 'swiper-wrapper'})  # find every <ul> tag with the given class
for index, tag in enumerate(all_uls):
    print('           iteration : ' + str(index) + '           ')
    print(tag.find('li', {'class': 'swiper-slide'}))  # works only if the <li> carries this class
    print(tag.contents[1])  # also works without a class; contents[0] is the whitespace before the first <li>

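The code above shows one way to do your step 1, "retrieve the content of the first <li> element of each <ul>". You said you had no trouble with step 2, so I have not posted code for it. Let me know if you need help with that part.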
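
For readers who want both steps in one place, here is a minimal sketch that keeps only the first `<li>` of each `<ul>` and then adds the link paragraph. It makes some assumptions: the target HTML is missing from the question, so placing the `<p>` directly after each `<ul>` is a guess, and the shortened `html` string stands in for the real page.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the question's markup (two slides per list).
html = """
<ul class="swiper-wrapper">
  <li class="swiper-slide"><figure><img alt="" src="https://s3.amazonaws.com/0001.jpg"/></figure></li>
  <li class="swiper-slide"><figure><img alt="" src="https://s3.amazonaws.com/0002.jpg"/></figure></li>
</ul>
<ul class="swiper-wrapper">
  <li class="swiper-slide"><figure><img alt="" src="https://s3.amazonaws.com/0004.jpg"/></figure></li>
  <li class="swiper-slide"><figure><img alt="" src="https://s3.amazonaws.com/0005.jpg"/></figure></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

for ul in soup.find_all("ul", class_="swiper-wrapper"):
    # Step 1: recursive=False looks only at direct children of the <ul>.
    items = ul.find_all("li", recursive=False)
    for extra in items[1:]:
        extra.decompose()  # remove every <li> after the first

    # Step 2 (assumed placement): build the link paragraph and put it
    # right after the <ul>.
    paragraph = soup.new_tag("p")
    link = soup.new_tag("a", href="https://xxxx.jp")
    link.string = "other photos"
    paragraph.append(link)
    ul.insert_after(paragraph)

# Image URL of the first (now only) <li> in each list.
first_sources = [ul.find("li").img["src"] for ul in soup.find_all("ul")]
print(first_sources)
print(soup)
```

Selecting direct children with `recursive=False` avoids depending on where whitespace text nodes fall, which is what `tag.contents[1]` relies on.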
