Unable to scrape specific content from a website with BeautifulSoup4

for rest in dining_page_soup.select("div.copy_left p strong"):
    if rest.next_sibling is not None:
        if rest.next_sibling.next_sibling is not None:
            title = rest.text
            desc = rest.next_sibling.next_sibling
            print("Title: " + title)
            print(desc)

2 Answers

User

Answer 1 · edited 2024-07-05 11:41:01

If you don't mind using XPath, this should work:

import requests
from lxml import html

url = "http://www.radisson.com/lansing-hotel-mi-48933/lansing/hotel/dining"
page = requests.get(url).text
tree = html.fromstring(page)

xp_t = "//*[@class='copy_left']/descendant-or-self::node()/strong[not(following-sibling::a)]/text()"
xp_d = "//*[@class='copy_left']/descendant-or-self::node()/strong[not(following-sibling::a)]/../text()[not(following-sibling::strong)]"

titles = tree.xpath(xp_t)
descriptions = tree.xpath(xp_d)  # still contains garbage like '\r\n'
descriptions = [d.strip() for d in descriptions if d.strip()]

for t, d in zip(titles, descriptions):
    print("{title}: {description}".format(title=t, description=d))

Here `descriptions` contains 3 elements: "This downtown…", "For a glass…", and "If you prefer…".

If you also need "When you are in the mood…", replace `xp_d` with:

xp_d = "//*[@class='copy_left']/descendant-or-self::node()/strong[not(following-sibling::a)]/../text()"
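To see how the two description expressions differ without hitting the live site, here is a minimal sketch run against a small inline document. The markup in `SAMPLE_HTML` is hypothetical; it only imitates the structure these XPath expressions imply (an intro text before the first `strong`, and a `strong` followed by a link that should be skipped), so the real page may differ.

```python
from lxml import html

# Hypothetical markup: an assumption about the page layout, not the real page.
SAMPLE_HTML = """
<div class="copy_left">
  <p>Intro text.<br/><strong>First Title</strong><br/>First description.</p>
  <p><strong>Second Title</strong><br/>Second description.</p>
  <p><strong>Menus</strong><br/><a href="#">Breakfast Menu</a></p>
</div>
"""

tree = html.fromstring(SAMPLE_HTML)

# Same base expression as in the answer: <strong> elements not followed by a link.
base = ("//*[@class='copy_left']/descendant-or-self::node()"
        "/strong[not(following-sibling::a)]")

titles = tree.xpath(base + "/text()")
# Only the text nodes that come after the <strong> in the same parent.
after_only = tree.xpath(base + "/../text()[not(following-sibling::strong)]")
# Every text node of the parent, including text that precedes the <strong>.
all_text = tree.xpath(base + "/../text()")

print(titles)
print(after_only)
print(all_text)
```

With the second expression the intro text is included, so `titles` and the description list no longer have the same length, and a plain `zip` would pair them incorrectly.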

User

Answer 2 · edited 2024-07-05 11:41:01

Here is a very simple solution:

from bs4 import BeautifulSoup
import requests

r = requests.get("http://www.radisson.com/lansing-hotel-mi-48933/lansing/hotel/dining")
data = r.text
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(data, "html.parser")
for found_text in soup.select('div.copy_left'):
    print(found_text.text)

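Update

Following a refinement of the question, here is a solution using a regular expression. The first paragraph ("When you…") needs its own handling, because it does not follow the structure of the other paragraphs.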

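import re

# Match every tag whose name starts with "strong"
for tag in soup.find_all(re.compile("^strong")):
    title = tag.text
    # next_sibling is the node right after the title; the description is the one after that
    desc = tag.next_sibling.next_sibling
    print("Title: " + title)
    print(desc)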

Output

Title: Capitol City Grille
This downtown Lansing restaurant offers delicious, contemporary American cuisine in an upscale yet relaxed environment. You can enjoy dishes that range from fluffy pancakes to juicy filet mignon steaks. Breakfast and lunch buffets are available, as well as an à la carte menu.
Title: Capitol City Grille Lounge
For a glass of wine or a hand-crafted cocktail and great conversation, spend an afternoon or evening at Capitol City Grille Lounge with friends or colleagues.
Title: Room Service
If you prefer to dine in the comfort of your own room, order from the room service menu.
Title: Menus
Breakfast Menu
Title: Capitol City Grille Hours
Breakfast, 6:30-11 a.m.
Title: Capitol City Grille Lounge Hours
Mon-Thu, 11 a.m.-11 p.m.
Title: Room Service Hours
Daily, 6:30 a.m.-2 p.m. and 5-10 p.m.
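The loops above rely on `next_sibling.next_sibling` always being present, and the first paragraph still needs separate treatment. A more defensive variant might look like the sketch below. The helper names `extract_pairs` and `extract_intro` and the markup in `SAMPLE_HTML` are hypothetical, chosen only for illustration; the sketch walks forward from each `strong` to the first non-empty text node, and collects any text that precedes the first `strong` as the intro.

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical markup: an assumption about the page layout, not the real page.
SAMPLE_HTML = """
<div class="copy_left">
  <p>Intro text before the first heading.<br/><strong>First Title</strong><br/>First description.</p>
  <p><strong>Second Title</strong><br/>Second description.</p>
  <p><strong>Orphan Title</strong></p>
</div>
"""


def extract_pairs(markup):
    """Return (title, description) tuples, skipping titles with no text after them."""
    soup = BeautifulSoup(markup, "html.parser")
    pairs = []
    for strong in soup.select("div.copy_left strong"):
        desc = None
        # Walk past tags such as <br/> until a non-empty text node is found.
        for sibling in strong.next_siblings:
            if isinstance(sibling, NavigableString) and sibling.strip():
                desc = sibling.strip()
                break
        if desc is not None:
            pairs.append((strong.get_text(strip=True), desc))
    return pairs


def extract_intro(markup):
    """Return the text that precedes the first <strong> in its parent, or None."""
    soup = BeautifulSoup(markup, "html.parser")
    first_strong = soup.select_one("div.copy_left strong")
    if first_strong is None:
        return None
    # previous_siblings iterates backwards, so reverse to restore reading order.
    parts = [s.strip() for s in first_strong.previous_siblings
             if isinstance(s, NavigableString) and s.strip()]
    return " ".join(reversed(parts)) or None


for title, desc in extract_pairs(SAMPLE_HTML):
    print("Title: " + title)
    print(desc)
print("Intro: " + str(extract_intro(SAMPLE_HTML)))
```

Because only text nodes count as descriptions here, a title followed by a link (such as the "Menus" entry in the output above) would be skipped; whether that is desirable depends on what you want to collect.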
