BeautifulSoup: a tag is missing under a parent tag

Posted 2024-09-27 21:30:23

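I want to get the text from the "h1" tag. I am using BeautifulSoup, and it works fine until an "article" tag contains no "h1" tag; then I get a "'NoneType' object has no attribute 'contents'" error. The code is below: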

from bs4 import BeautifulSoup

# Sample markup: the first article has no <h1>.
page = """
    <article>
    <a href="http://something">
    </a>   <!-- missing "h1" -->
    <a href="http://something">
    </a>
    </article>
    <article>
    <a href="http://something">
    </a>
    <a href="http://something">
       <h1>something</h1>
    </a>
    </article>
    <article>
    <a href="http://something">
    </a>
    <a href="http://something">
       <h1>something</h1>
    </a>
    </article>
"""

soup = BeautifulSoup(page, "lxml")

h1s = []

articles = soup.find_all("article")

# Raises AttributeError on the first article, where article.h1 is None.
for article in articles:
    h1s.append(article.h1.contents)

These are the messages I get when I check the articles with and without an h1 tag (the output itself is missing from this post).


1 answer

User

#1 · Posted 2024-09-27 21:30:23

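Just iterate over articles directly, since it is a list, then use find_all() to get every h1 nested inside the a tags of each article and append its text to h1s. An article with no h1 simply yields an empty result, so nothing is appended and no error is raised.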

h1s = []
articles = soup.find_all("article")
for article in articles:
    # find_all() returns an empty list when the article has no h1
    for h1 in article.find_all("h1"):
        h1s.append(h1.text)
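
If you need exactly one entry per article, so that the results stay aligned with the articles list, a possible variation is to call find() and check for None yourself. This is only a minimal sketch: the sample markup is a shortened, hypothetical version of the page in the question, and it assumes the built-in "html.parser" instead of "lxml" so that nothing extra has to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical, shortened markup: the first article has no <h1>.
page = """
<article><a href="http://something"></a></article>
<article><a href="http://something"><h1>something</h1></a></article>
"""

# Assumption: "html.parser" (bundled with Python) instead of "lxml".
soup = BeautifulSoup(page, "html.parser")

h1s = []
for article in soup.find_all("article"):
    h1 = article.find("h1")  # None when the article has no <h1>
    # Keep a None placeholder so h1s lines up with the articles.
    h1s.append(h1.get_text(strip=True) if h1 is not None else None)

print(h1s)  # [None, 'something']
```

Whether a None placeholder or skipping the article is the better choice depends on what you do with h1s afterwards.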
