How do I use Python and Beautiful Soup to get the content between a tag and its closing tag in HTML?


Category: Python Q&A


1 answer

Anonymous, 1 day ago

Specialties: python, mysql, java

If the element contains only text, use the <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/#string" rel="nofollow"><code>.string</code> attribute</a>:

<pre><code>headline = soup.find(class_='cd__headline-text')
print(headline.string)
</code></pre>

If the element contains other tags, you can either get all the text contained in the current element and its descendants, or get only specific strings from the current element.

The <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/#get-text" rel="nofollow"><code>.get_text()</code> function</a> recurses and gathers all strings in the element and its child elements, joins them with a separator of your choice (the empty string by default), and optionally strips whitespace.

To get only specific strings, you can either iterate over the <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/#strings-and-stripped-strings" rel="nofollow"><code>.strings</code> or <code>.stripped_strings</code> generators</a>, or use the <a href="http://www.crummy.com/software/BeautifulSoup/bs4/doc/#contents-and-children" rel="nofollow">element contents</a> to access all contained elements and then pick out the instances of the <code>NavigableString</code> type.

Demo, with an extra element added to the markup:

<pre><code>>>> from bs4 import BeautifulSoup
>>> # The span and em tag names are assumed; the tags were stripped from this page's copy of the markup.
>>> markup = '<span class="cd__headline-text">Is this model <em>too thin</em> for Yves Saint Laurent? </span>'
>>> soup = BeautifulSoup(markup, 'html.parser')
>>> headline = soup.find(class_='cd__headline-text')
>>> headline.string is None
True
>>> print(list(headline.strings))
['Is this model ', 'too thin', ' for Yves Saint Laurent? ']
>>> print(list(headline.stripped_strings))
['Is this model', 'too thin', 'for Yves Saint Laurent?']
>>> print(headline.get_text())
Is this model too thin for Yves Saint Laurent? 
>>> print(headline.get_text(' - ', strip=True))
Is this model - too thin - for Yves Saint Laurent?
>>> headline.contents
['Is this model ', <em>too thin</em>, ' for Yves Saint Laurent? ']
>>> from bs4 import NavigableString
>>> [el for el in headline.children if isinstance(el, NavigableString)]
['Is this model ', ' for Yves Saint Laurent? ']
</code></pre>
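For use outside an interactive session, the two approaches from the answer (all nested text versus only the element's own strings) could be wrapped in small helpers. This is only a sketch: the helper names `all_text` and `direct_text` are hypothetical, and the `<span>` and `<em>` tags in the sample markup are assumptions chosen for illustration.

```python
from bs4 import BeautifulSoup, NavigableString


def all_text(tag, separator=" "):
    """Return every string inside the tag, nested tags included."""
    return tag.get_text(separator, strip=True)


def direct_text(tag):
    """Return only the strings that are direct children of the tag."""
    parts = [
        str(child).strip()
        for child in tag.children
        if isinstance(child, NavigableString)
    ]
    # Skip pieces that were whitespace only.
    return " ".join(part for part in parts if part)


# Hypothetical markup: the span and em tag names are assumptions.
markup = (
    '<span class="cd__headline-text">Is this model '
    "<em>too thin</em> for Yves Saint Laurent? </span>"
)
soup = BeautifulSoup(markup, "html.parser")
headline = soup.find(class_="cd__headline-text")

print(all_text(headline))     # Is this model too thin for Yves Saint Laurent?
print(direct_text(headline))  # Is this model for Yves Saint Laurent?
```

Note that HTML comments are also `NavigableString` subclasses in Beautiful Soup, so `direct_text` would include them if the element contained any.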
