How to extract only specific text from similar elements using BeautifulSoup and Python

driver.get('http://www.animenewsnetwork.com/encyclopedia/anime.php?id=160')
elem = driver.find_element_by_xpath("//*")
source_codeANN = elem.get_attribute("outerHTML")
soup2 = BeautifulSoup(source_codeANN, 'html.parser')
Genre = soup2.find_all('div',{'id':'infotype-30'})
print(Genre)

3 Answers

User

#1 · edited 2024-09-30 05:33:44

Suppose you have the following HTML:

<div id="infotype-30" class="encyc-info-type br same-width-as-main" style="width: auto;">
    <strong>Genres:</strong> 
    <span><a href="/encyclopedia/search/genreresults?w=series&amp;a=AA&amp;a=OC&amp;a=TA&amp;a=MA&amp;g=adventure/A&amp;o=rating" class="discreet">adventure</a></span>,
    <span><a href="/encyclopedia/search/genreresults?w=series&amp;a=AA&amp;a=OC&amp;a=TA&amp;a=MA&amp;g=comedy&amp;o=rating" class="discreet">comedy</a></span>,
    <span><a href="/encyclopedia/search/genreresults?w=series&amp;a=AA&amp;a=OC&amp;a=TA&amp;a=MA&amp;g=science%20fiction&amp;o=rating" class="discreet">science fiction</a></span>
</div>

You can get the text of each genre link like this:

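from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By

# PhantomJS support was removed in Selenium 4; on current Selenium use
# another driver such as webdriver.Firefox() or webdriver.Chrome().
driver = webdriver.PhantomJS()
driver.get('http://www.animenewsnetwork.com/encyclopedia/anime.php?id=160')
# Grab the rendered HTML of the root element.
elem = driver.find_element(By.XPATH, "//*")
source_codeANN = elem.get_attribute("outerHTML")
soup2 = BeautifulSoup(source_codeANN, 'html.parser')
# find() returns the single <div>; find_all('a') then yields only the genre links.
genre_div = soup2.find('div', id='infotype-30')
genres = [a.text for a in genre_div.find_all('a')]
print(genres)
# [u'adventure', u'comedy', u'science fiction']  (Python 2 repr; Python 3 omits the u prefixes)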
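
To try the parsing step without a browser, here is a minimal sketch that runs the same BeautifulSoup calls against a trimmed copy of the HTML above. The `HTML` constant and the `href="#"` values are placeholders rather than the site's real markup, and the `select()` variant assumes a bs4 install with CSS selector support.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the snippet above (hrefs replaced by "#").
HTML = """
<div id="infotype-30" class="encyc-info-type br same-width-as-main">
    <strong>Genres:</strong>
    <span><a href="#" class="discreet">adventure</a></span>,
    <span><a href="#" class="discreet">comedy</a></span>,
    <span><a href="#" class="discreet">science fiction</a></span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Same approach as the answer: find the div by id, then collect its links.
genre_div = soup.find("div", id="infotype-30")
genres = [a.get_text(strip=True) for a in genre_div.find_all("a")] if genre_div else []

# Alternative: one CSS selector instead of find() + find_all().
via_css = [a.get_text(strip=True) for a in soup.select("#infotype-30 a")]

print(genres)
print(via_css)
```

The `if genre_div else []` guard is there because `find()` returns `None` when the id is missing, which would otherwise raise an `AttributeError`.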

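User

#2 · edited 2024-09-30 05:33:44

I suggest locating the strong element whose text is "Genres:", taking all of its following siblings, and joining their text:

from selenium.webdriver.common.by import By

", ".join(elm.text for elm in driver.find_elements(By.XPATH, "//strong[. = 'Genres:']/following-sibling::*"))

Demo: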

>>> from selenium import webdriver
>>> from selenium.webdriver.common.by import By
>>> driver = webdriver.PhantomJS()
>>> driver.get("http://www.animenewsnetwork.com/encyclopedia/anime.php?id=160")
>>> ", ".join(elm.text for elm in driver.find_elements(By.XPATH, "//strong[. = 'Genres:']/following-sibling::*"))
u'adventure, comedy, science fiction'
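
Since the question asks about BeautifulSoup, the same "label, then following siblings" idea can be sketched there too. This is only an illustration on a trimmed, hypothetical copy of the HTML from the first answer (hrefs replaced by "#"); it uses `find_next_siblings("span")` in place of the XPath `following-sibling::*` axis, which is a slightly narrower match.

```python
from bs4 import BeautifulSoup

# Trimmed, hypothetical copy of the genre block (hrefs replaced by "#").
HTML = """
<div id="infotype-30">
    <strong>Genres:</strong>
    <span><a href="#">adventure</a></span>,
    <span><a href="#">comedy</a></span>,
    <span><a href="#">science fiction</a></span>
</div>
"""

soup = BeautifulSoup(HTML, "html.parser")

# Locate the label element by its exact text.
label = soup.find("strong", string="Genres:")

# Collect the <span> elements that follow the label inside the same parent.
siblings = label.find_next_siblings("span") if label else []

joined = ", ".join(span.get_text(strip=True) for span in siblings)
print(joined)
```

Anchoring on the label text rather than the id can help when several blocks share similar markup and only the label tells them apart.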

User

#3 · edited 2024-09-30 05:33:44

Try this:

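from selenium.webdriver.common.by import By

driver.get("http://www.animenewsnetwork.com/encyclopedia/anime.php?id=160")
# The element's visible text includes the "Genres:" label as well as the values.
elem = driver.find_element(By.ID, "infotype-30")
print(elem.text)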
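
Because the element's text carries the label along with the values, a small post-processing step may be needed to get just the genres. The sketch below assumes the text looks like "Genres: adventure, comedy, science fiction" (inferred from the HTML in the first answer, not captured from the live page); `split_labeled_list` is a hypothetical helper name.

```python
def split_labeled_list(text, label="Genres:"):
    """Drop a leading label and split the remainder on commas."""
    text = text.strip()
    if text.startswith(label):
        text = text[len(label):]
    # Strip whitespace around each item and skip empty pieces.
    return [item.strip() for item in text.split(",") if item.strip()]


# Assumed shape of elem.text for the genre block.
sample = "Genres: adventure, comedy, science fiction"
print(split_labeled_list(sample))
# ['adventure', 'comedy', 'science fiction']
```

This would break if a genre name itself contained a comma, in which case collecting the link elements individually, as in the first answer, is the safer route.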
