How can I parse only the quotes with BeautifulSoup?

<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"My mind had no choice but to drift into an elaborate fantasy realm."</p>
  <blockquote>
    <p class="attribution">— Pamela, Paul</p>
  </blockquote>
  <a href="/metaphors/25249" class="load_details">preview</a> |
  <a href="/metaphors/25249" title="Let Children Get Bored Again [from The New York Times]">full record</a>
  <div class="details_container"></div>
</div>
<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"You let your mind wander and follow it where it goes."</p>
  <blockquote>
    <p class="attribution">— Pamela, Paul</p>
  </blockquote>
  <a href="/metaphors/25250" class="load_details">preview</a> |
  <a href="/metaphors/25250" title="Let Children Get Bored Again [from The New York Times]">full record</a>
  <div class="details_container"></div>
</div>

import bs4 as bs
import urllib.request

sauce = urllib.request.urlopen('URLHERE').read()
soup = bs.BeautifulSoup(sauce, 'lxml')
body = soup.body
for paragraph in body.find_all('p'):
    print(paragraph.text)

2 Answers

User

Answer 1 · edited 2024-09-26 17:43:30

If I understand your question correctly, you want to print only the quotes, which appear in every third p element, starting from the second one.

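# `soup` is the BeautifulSoup object built in the question's code
quotes = soup.find_all('p')

# Each result holds three <p> tags (date, quote, attribution),
# so the quotes sit at indexes 1, 4, 7, ...
for i in range(1, len(quotes), 3):
    print(quotes[i].text)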

There may be a cleaner way to do this, but this should work.
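One possibly cleaner variant is sketched below: instead of counting every third p element across the whole page, it selects the second p that is a direct child of each div.result with a CSS selector. This is only a sketch under a few assumptions: SAMPLE_HTML is a trimmed copy of the markup from the question, extract_quotes is a hypothetical helper name, and the built-in "html.parser" is used here in place of 'lxml' so that nothing beyond bs4 needs to be installed.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: the real page
# follows the same structure for every result block).
SAMPLE_HTML = """
<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"My mind had no choice but to drift into an elaborate fantasy realm."</p>
  <blockquote><p class="attribution">— Pamela, Paul</p></blockquote>
</div>
<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"You let your mind wander and follow it where it goes."</p>
  <blockquote><p class="attribution">— Pamela, Paul</p></blockquote>
</div>
"""


def extract_quotes(html):
    """Hypothetical helper: text of the second direct <p> child of each div.result."""
    soup = BeautifulSoup(html, "html.parser")
    # "div.result > p" keeps only direct children, so the attribution <p>
    # nested inside <blockquote> is never considered.
    selected = soup.select("div.result > p:nth-of-type(2)")
    return [p.get_text(strip=True) for p in selected]


quotes = extract_quotes(SAMPLE_HTML)
for quote in quotes:
    print(quote)
```

Because the selector is anchored to each result block, it should keep working if a block elsewhere on the page happens to contain a different number of p elements, which would throw off the fixed step of 3.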

User

Answer 2 · edited 2024-09-26 17:43:30

You can query the page with XPath, for example:

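import requests
from lxml import html

# Download the page and parse it into an element tree
page = requests.get('enter_your_url')
tree = html.fromstring(page.content)

# p[2] matches a <p> that is the second <p> child of its parent,
# anywhere below a div with class "result"
data = tree.xpath('//div[@class="result"]//p[2]/text()')

print(data)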
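To try the XPath idea without making a network request, here is a minimal sketch that runs against an inline string. SAMPLE_HTML is assumed to be a trimmed copy of the markup from the question, and the expression uses a direct-child step (/p[2]) rather than //p[2], which is a slightly stricter variation on the query above, not a correction of it.

```python
from lxml import html

# Trimmed copy of the markup from the question (assumption: the live page
# uses the same structure).
SAMPLE_HTML = """
<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"My mind had no choice but to drift into an elaborate fantasy realm."</p>
  <blockquote><p class="attribution">— Pamela, Paul</p></blockquote>
</div>
<div class="result">
  <p><strong>Date:</strong> February 2, 2019</p>
  <p>"You let your mind wander and follow it where it goes."</p>
  <blockquote><p class="attribution">— Pamela, Paul</p></blockquote>
</div>
"""

tree = html.fromstring(SAMPLE_HTML)

# Second <p> that is a direct child of each div.result; text() returns
# the text nodes of those paragraphs as a list of strings.
quotes = tree.xpath('//div[@class="result"]/p[2]/text()')

for quote in quotes:
    print(quote)
```

The attribution paragraph is skipped in both forms of the query, since it is the first (and only) p inside its blockquote parent.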
