Python BeautifulSoup: crawling with JSON

<div class="activities-list horizontal">
  <article data-href="http://www.getyourguide.de/london-l57/windsor-bath-und-stonehenge-tagesausflug-ab-london-t977/" id="t977" class="activity-card activity-card-horizontal ">
    <div class="activity-card-content">
      <a class="activity-card-link" href="http://www.getyourguide.de/london-l57/windsor-bath-und-stonehenge-tagesausflug-ab-london-t977/">
        <div class="activity-card-image-container">
          <img src="http://img.getyourguide.com/img/tour_img-206771-70.jpg" data-role="cover" alt="" />
        </div>
        <div class="activity-card-details">
          <header class="activity-card-header">
            <h3 class="activity-card-title"> Stonehenge, Windsor und Bath - Tagesausflug ab London </h3>
            <div class="activity-rating">
              <span class="rating" title="Bewertung: 3,9 von 5">
                <span class="rating-stars s30"></span>
                <span class="rating-total">13 Bewertungen</span>
              </span>
            </div>
          </header>
          <p class="activity-small-description">Verlassen Sie London und entdecken Sie Reize der englischen Landschaft auf einer Ganztagestour, die Sie zu berühmten, historischen Orten führt.…</p>
          <div class="activity-info activity-duration">
            <span class="activity-info-label activity-duration-label">

2 answers

User

Answer 1 · edited 2024-07-04 09:10:57

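from urllib.request import urlopen  # Python 3 replacement for urllib2.urlopen
from bs4 import BeautifulSoup

response = urlopen(link)  # `link` is the URL of the page to crawl
html = response.read()  # raw bytes, decoded by BeautifulSoup below
soup = BeautifulSoup(html, 'html.parser', from_encoding='utf-8')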

For the deep links:

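A minimal sketch of pulling the deep links out of the markup shown in the question, assuming each link is the `href` of an `<a class="activity-card-link">` inside the activity card. `SAMPLE_HTML` and `extract_deeplinks` are hypothetical names; the string is a trimmed stand-in for the downloaded page.

```python
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down copy of the markup shown in the question.
SAMPLE_HTML = """
<div class="activities-list horizontal">
  <article data-href="http://www.getyourguide.de/london-l57/windsor-bath-und-stonehenge-tagesausflug-ab-london-t977/"
           id="t977" class="activity-card activity-card-horizontal">
    <a class="activity-card-link"
       href="http://www.getyourguide.de/london-l57/windsor-bath-und-stonehenge-tagesausflug-ab-london-t977/">
      <h3 class="activity-card-title"> Stonehenge, Windsor und Bath - Tagesausflug ab London </h3>
    </a>
  </article>
</div>
"""


def extract_deeplinks(soup):
    """Return the href of every activity-card link, in document order."""
    return [a["href"] for a in soup.find_all("a", class_="activity-card-link")]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
print(extract_deeplinks(soup))
```

In this markup, reading `data-href` from each `<article>` would give the same URL; the link's `href` is used here only because it is the more conventional place to look.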

For the titles:

# In the markup above, the title is an <h3>, not a <div>
titles = soup.find_all('h3', {'class': 'activity-card-title'})

If the block contains only one title, just use `find`:

title = soup.find('h3', {'class': 'activity-card-title'})
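A short sketch contrasting `find_all` and `find` on the title elements, assuming (as in the question's markup) that each title is an `<h3 class="activity-card-title">`. The two-card `SAMPLE_HTML` is hypothetical; `get_text(strip=True)` trims the whitespace around each title.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with two activity cards; only the title elements matter here.
SAMPLE_HTML = """
<div class="activities-list horizontal">
  <article class="activity-card"><h3 class="activity-card-title"> Stonehenge, Windsor und Bath - Tagesausflug ab London </h3></article>
  <article class="activity-card"><h3 class="activity-card-title"> Second activity </h3></article>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a list with every matching tag.
titles = [h3.get_text(strip=True) for h3 in soup.find_all("h3", class_="activity-card-title")]

# find returns only the first matching tag, or None if nothing matches.
first = soup.find("h3", class_="activity-card-title")
first_title = first.get_text(strip=True) if first is not None else None

print(titles)
print(first_title)
```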

User

Answer 2 · edited 2024-07-04 09:10:57

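You get the AttributeError because you are trying to load the whole soup with json, which can't be done. It looks like you need the contents of the <p> tag, which you can then load as JSON. You can get them like this, and then read the activities value like a normal dictionary: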

import json
activities = json.loads(soup.find('p').text)['activities']  # the <p> text is a JSON document
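A minimal sketch of this step, assuming the response is a page whose `<p>` holds a JSON object with an `activities` key containing an HTML fragment. `RESPONSE` is a hypothetical sample; the fragment is entity-escaped so the parser keeps it as plain text. It also shows why the soup itself cannot be decoded: `json.loads` only accepts `str`, `bytes` or `bytearray`, so in Python 3 passing a `BeautifulSoup` object raises a `TypeError`.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical response: a JSON payload inside a <p>, with the HTML fragment
# entity-escaped (&lt; / &gt;) so that it survives parsing as text.
RESPONSE = '''<p>{"activities": "&lt;h3 class='activity-card-title'&gt;Tour&lt;/h3&gt;"}</p>'''

soup = BeautifulSoup(RESPONSE, "html.parser")

# json.loads(soup) cannot work: the soup is a parse tree, not a string.
try:
    json.loads(soup)
except TypeError as exc:
    print("cannot decode the soup itself:", exc)

# Decode the text of the <p> tag instead, then index it like a normal dict.
activities = json.loads(soup.find("p").text)["activities"]
print(activities)
```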

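But then it gets a bit odd, because we are no longer dealing with a soup; we just have one big string that looks like HTML. So we can build a new soup from that string and get the deep links and titles from it.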

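A sketch of the re-parsing step, assuming `activities` is the HTML string obtained from the JSON (here a hypothetical literal) and that it uses the same `a.activity-card-link` and `h3.activity-card-title` markup as the question. The `results` list of dicts is an illustrative choice, not part of the original answer.

```python
from bs4 import BeautifulSoup

# Hypothetical value of `activities` after json.loads(...)["activities"].
activities = (
    '<article class="activity-card activity-card-horizontal">'
    '<a class="activity-card-link" href="http://www.getyourguide.de/london-l57/windsor-bath-und-stonehenge-tagesausflug-ab-london-t977/">'
    '<h3 class="activity-card-title"> Stonehenge, Windsor und Bath - Tagesausflug ab London </h3>'
    "</a></article>"
)

# The decoded value is just a string, so build a second soup from it.
inner = BeautifulSoup(activities, "html.parser")

results = []
# class_ matches any one of a tag's classes, so "activity-card" also
# matches class="activity-card activity-card-horizontal".
for card in inner.find_all("article", class_="activity-card"):
    link = card.find("a", class_="activity-card-link")
    title = card.find("h3", class_="activity-card-title")
    results.append({
        "deeplink": link["href"] if link is not None else None,
        "title": title.get_text(strip=True) if title is not None else None,
    })

print(results)
```

Looking up the link and title per card keeps each deep link paired with its own title, rather than zipping two separate `find_all` lists that could drift out of step if a card lacks one of them.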
