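I'm working on a web scraping project to build my knowledge (I'm a beginner). The code is messy, but I've reached the point where I can print the rating for each review. How do I extract the ratings (i.e. 4.0, 5.0) from the bs4 objects in the list and then average them?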
Output:
[<meta content="4.0" itemprop="ratingValue"/>, <meta content="5.0" itemprop="ratingValue"/>, ... ]
import mechanize
from bs4 import BeautifulSoup

def searchYelp():
    br = mechanize.Browser()
    br.set_handle_robots(False)
    br.addheaders = [('User-agent', 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.1) Gecko/2008071615 Fedora/3.0.1-1.fc9 Firefox/3.0.1')]
    response = br.open('https://www.yelp.com')
    br.select_form(nr=0)
    br.form['find_desc'] = 'Del Taco'
    br.form['find_loc'] = 'New York City'
    br.submit()
    link_list = []
    for link in br.links():
        if link.url.startswith('/biz/'):
            link_list.append(link.url)
            break
    big_list_of_ratings = []
    yelpPage = br.open(link_list[0])
    soup = BeautifulSoup(yelpPage.read(), 'html.parser')
    for review in soup.find_all('meta'):
        if review.get('itemprop') == 'ratingValue':
            big_list_of_ratings.append(review)
    print(big_list_of_ratings)

searchYelp()
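Instead of appending the tag itself (review), append the value of its content attribute, like this:
review['content']
Alternatively, I would suggest using a CSS selector.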
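To make both suggestions concrete, here is a minimal sketch of how the CSS selector approach and the averaging step might look. The `html` string is hypothetical sample markup standing in for the page you fetch with mechanize, and the variable names are only illustrative. Note that `content` comes back as a string, so it has to be converted with `float()` before averaging.

```python
from statistics import mean

from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the fetched Yelp page.
html = """
<div><meta content="4.0" itemprop="ratingValue"/></div>
<div><meta content="5.0" itemprop="ratingValue"/></div>
<div><meta content="3.0" itemprop="ratingValue"/></div>
"""

soup = BeautifulSoup(html, "html.parser")

# CSS attribute selector: every <meta> tag whose itemprop is "ratingValue".
# tag["content"] is a string such as "4.0", so convert it to a float.
ratings = [float(tag["content"]) for tag in soup.select('meta[itemprop="ratingValue"]')]

# Guard against an empty list, since mean() raises on empty input.
average = mean(ratings) if ratings else None

print(ratings)
print(average)
```

In your own function you would build `soup` from `yelpPage.read()` as you already do, and the same list comprehension would replace the `find_all` loop.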