Getting only the h1 text in Python, without the span text

Published 2024-07-03 06:32:51


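I have code like this, and I am trying to get the data in the h1. Here that is 'The Wire'. But I am getting all of the text inside the h1, including the spans.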

<h1 id="aiv-content-title" class="js-hide-on-play"> 
The Wire
    <span class="num-of-seasons">5 Seasons</span>
    <span class="release-year">2002</span>
</h1>

The output I get is Wire5 Seasons2002

^{pr2}$

When I tried this code, I got this result:

The Wire5 Seasons2002 5 Seasons 2002

I was expecting something like this:

The Wire 5 Seasons 2002


2 answers

You can do the following:

# `elm` is assumed to be the already-parsed parent element
h1_element = elm.find('h1', {'id': 'aiv-content-title'})
num_seasons = h1_element.find('span', {'class': 'num-of-seasons'}).get_text().strip()
release_year = h1_element.find('span', {'class': 'release-year'}).get_text().strip()

# Remove the span elements from the h1 element
while h1_element.find('span'):
    h1_element.find('span').extract()

print(h1_element.get_text().strip())
print(num_seasons)
print(release_year)
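
As a variant of the approach above, here is a minimal, self-contained sketch that reads only the text nodes sitting directly under the h1, so the spans do not have to be removed from the tree. It assumes BeautifulSoup 4 (`bs4`) with the built-in `html.parser`, and it reuses the HTML from the question; the variable names `html`, `direct_text` and `title` are illustrative only.

```python
from bs4 import BeautifulSoup

html = """
<h1 id="aiv-content-title" class="js-hide-on-play">
The Wire
    <span class="num-of-seasons">5 Seasons</span>
    <span class="release-year">2002</span>
</h1>
"""

soup = BeautifulSoup(html, "html.parser")
h1_element = soup.find("h1", id="aiv-content-title")

# Keep only the text nodes that are direct children of the h1.
# Text inside the nested spans is skipped, and the tree is left unchanged.
direct_text = h1_element.find_all(string=True, recursive=False)
title = "".join(direct_text).strip()
print(title)

# Joining every stripped string with a space gives the spaced-out form
# the question was expecting.
spaced = h1_element.get_text(" ", strip=True)
print(spaced)
```

Because nothing is extracted, `h1_element` can still be searched for the spans afterwards.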

I found a solution, but it is a bit hacky.

Here is the code; I hope it helps people who are new to this area.

elm = soup.find('div', id="dv-dp-main-content")
heading = elm.find('h1', id='aiv-content-title')
heading = heading.text
seasons = elm.find('span', {'class': 'num-of-seasons'})
# find() returns None, not the string 'None', when nothing matches
if seasons is None:
    no_seasons = '1 Season'
else:
    no_seasons = seasons.text

release_year = elm.find('span', {'class': 'release-year'})
releaseyr = release_year.text

# Strip the span texts out of the full heading text
rmstr = heading.replace(releaseyr, " ")
name = rmstr.replace(no_seasons, " ").strip()
print(name)
print(no_seasons)
print(releaseyr)
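
The missing-span case is easy to get wrong, so here is a small sketch of it in isolation. It assumes BeautifulSoup 4 with `html.parser`. `parse_heading` is a hypothetical helper, not part of any library. The `'1 Season'` default comes from the code above, while the empty-string default for a missing year is an assumption made for this example.

```python
from bs4 import BeautifulSoup


def parse_heading(h1):
    """Return (name, seasons, year) for an h1 tag. Hypothetical helper."""
    seasons_tag = h1.find("span", class_="num-of-seasons")
    year_tag = h1.find("span", class_="release-year")
    # find() returns None when nothing matches, so test with `is None`.
    seasons = "1 Season" if seasons_tag is None else seasons_tag.get_text(strip=True)
    # Assumption for this sketch: a missing year becomes an empty string.
    year = "" if year_tag is None else year_tag.get_text(strip=True)
    # The name is whatever text sits directly under the h1, outside the spans.
    name = "".join(h1.find_all(string=True, recursive=False)).strip()
    return name, seasons, year


# Sample heading without a num-of-seasons span.
html = '<h1 id="aiv-content-title">The Wire <span class="release-year">2002</span></h1>'
h1 = BeautifulSoup(html, "html.parser").find("h1", id="aiv-content-title")
name, seasons, year = parse_heading(h1)
print(name, seasons, year, sep=" | ")
```

Reading the direct text nodes avoids the string replacement step, which could also remove a year or season count that happens to appear in the title itself.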
