A question about parsing HTML with BeautifulSoup

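from bs4 import BeautifulSoup

# driver is assumed to be a Selenium WebDriver that has already loaded the page
soup = BeautifulSoup(driver.page_source, 'html.parser')
containers = soup.find_all("div", {"class": "listA"})
datas = []
for data in containers:
    textspan = data.find("span")
    datas.append(textspan.text)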
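The snippet in the question depends on a live `driver`, so it cannot be run on its own. Below is a minimal sketch of the same extraction against a static string. The sample markup and the helper name `extract_span_texts` are made up for illustration and are not from the original post. It also skips containers that have no `span`, since `find` returns `None` in that case and `.text` would raise an `AttributeError`.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for driver.page_source
page_source = """
<div class="listA"><span>first</span></div>
<div class="listA"><p>no span here</p></div>
<div class="listA"><span>second</span><span>ignored</span></div>
<div class="other"><span>skipped</span></div>
"""


def extract_span_texts(markup):
    """Return the text of the first <span> in each div with class listA."""
    soup = BeautifulSoup(markup, "html.parser")
    texts = []
    for container in soup.find_all("div", class_="listA"):
        span = container.find("span")
        # find() returns None when the container has no <span>
        if span is not None:
            texts.append(span.get_text(strip=True))
    return texts


datas = extract_span_texts(page_source)
print(datas)  # ['first', 'second']
```

Only the first `span` of each container is kept, which matches the `data.find("span")` call in the question.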

2 answers

User

#1 · Edited 2024-09-30 01:29:04

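Another option is SimplifiedDoc from the simplified_scrapy package. It has no third-party dependencies of its own, and it is lighter and faster, which makes it a good fit for beginners. More examples are available here.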

from simplified_scrapy.simplified_doc import SimplifiedDoc

html = '''
<span><span>Text 1</span><b>Text 2</b><b>Text 3</b></span>
'''
doc = SimplifiedDoc(html)
span = doc.span  # Get the outermost span
first = span.span  # Get the first span inside the outer span
print(first.text)
second = span.b  # Get the first b inside the outer span
print(second.text)
third = second.next  # Get the element that follows the first b
print(third.text)

Output:

Text 1
Text 2
Text 3
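For comparison, the same three lookups can be sketched with BeautifulSoup. This is not part of the original answer, just one possible equivalent: attribute access such as `soup.span` returns the first matching tag, and `find_next_sibling` moves to the following `b`.

```python
from bs4 import BeautifulSoup

html = "<span><span>Text 1</span><b>Text 2</b><b>Text 3</b></span>"
soup = BeautifulSoup(html, "html.parser")

outer = soup.span                       # first <span> in the document, i.e. the outermost one
first = outer.span                      # first <span> nested inside the outer span
second = outer.b                        # first <b> inside the outer span
third = second.find_next_sibling("b")   # the <b> that follows it

texts = [first.get_text(), second.get_text(), third.get_text()]
print(texts)  # ['Text 1', 'Text 2', 'Text 3']
```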

User

#2 · Edited 2024-09-30 01:29:04

If you only want Text 1, use this code:

import bs4

content = "<span><span>Text 1</span><b>Text 2</b><b>Text 3</b></span>"
soup = bs4.BeautifulSoup(content, 'html.parser')


# soup('span') is shorthand for soup.find_all('span') and returns
# [<span><span>Text 1</span><b>Text 2</b><b>Text 3</b></span>, <span>Text 1</span>]

span_text = soup('span')

for e in span_text:
    # Keep only spans that contain no nested span
    if not e('span'):
        print(e.text)

Output:

Text 1
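The loop above keeps a span only when it contains no nested span. The same filter can be passed straight to `find_all` as a function, which is a rough sketch of an alternative and not the answerer's code. The helper name `innermost_span_texts` is made up for illustration.

```python
from bs4 import BeautifulSoup


def innermost_span_texts(markup):
    """Return the text of every <span> that has no <span> descendant."""
    soup = BeautifulSoup(markup, "html.parser")
    # find_all accepts a function that takes a tag and returns True to keep it
    leaves = soup.find_all(
        lambda tag: tag.name == "span" and tag.find("span") is None
    )
    return [tag.get_text() for tag in leaves]


content = "<span><span>Text 1</span><b>Text 2</b><b>Text 3</b></span>"
print(innermost_span_texts(content))  # ['Text 1']
```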
