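I wrote some Python code to scrape data (the ratings from the reviews) from TripAdvisor. The problem is that every time I run the code it gives me a different number of rows, and it never scrapes all of the pages.
The index error I get is: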
Traceback (most recent call last):
  File "C:/Users/thimios/PycharmProjects/TripadvisorScrapping/proxiro.py", line 26, in <module>
    rating = soup.findAll("div", {'class': 'rating reviewItemInline'})[i]
IndexError: list index out of range
The code is as follows:
from bs4 import BeautifulSoup
import os
import urllib.request
file2 = open(os.path.expanduser(r"~/Desktop/TripAdviser Reviews2.csv"), "wb")
file2.write(b"Organization,Rating" + b"\n")
WebSites = [
    "https://www.tripadvisor.com/Hotel_Review-g189400-d198932-Reviews-Hilton_Athens-Athens_Attica.html#REVIEWS"]
Checker = "REVIEWS"
# looping through each site until it hits a break
for theurl in WebSites:
    thepage = urllib.request.urlopen(theurl)
    soup = BeautifulSoup(thepage, "html.parser")
    #print(soup)
    while True:
        # Extract ratings from the text reviews
        altarray = ""
        for i in range(0,10):
            rating = soup.findAll("div", {'class': 'rating reviewItemInline'})[i]
            rating1 = rating.find_all("span")[0]
            rating2 = rating1['class'][1][-2:]
            print(rating2)
            if len(altarray) == 0:
                altarray = [rating2]
            else:
                altarray.append(rating2)
        #print(altarray)
        #print(len(altarray))
        #print(type(altarray))
        # Extract Organization,
        Organization1 = soup.find(attrs={'class': 'heading_name'})
        Organization = Organization1.text.replace('"', ' ').replace('Review of',' ').strip()
        #print(Organization)
        # Loop through each review on the page
        for x in range(0, 10):
            Rating = altarray[x]
            Rating = str(Rating)
            #print(Rating)
            #print(type(Rating))
            Record2 = Organization + "," + Rating
            if Checker == "REVIEWS":
                file2.write(bytes(Record2, encoding="ascii", errors='ignore') + b"\n")
        link = soup.find_all(attrs={"class": "nav next rndBtn ui_button primary taLnk"})
        #print(link)
        #print(link[0])
        if len(link) == 0:
            break
        else:
            soup = BeautifulSoup(urllib.request.urlopen("http://www.tripadvisor.com" + link[0].get('href')),"html.parser")
            #print(soup)
            #print(Organization)
            print(link[0].get('href'))
            Checker = link[0].get('href')[-7:]
            #print(Checker)
file2.close()
My guess is that the TripAdvisor pages are not being fully accessed. Any ideas?
You get this error because you are trying to access a list element by an index that does not exist.
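To make the failure mode concrete, here is a minimal sketch that uses no scraping at all. The list `found` and both helper functions are hypothetical stand-ins: it assumes a page where only 7 rating elements were found, while the loop still expects 10, as the `range(0,10)` loops in the question do.

```python
# Hypothetical data: a page that yielded only 7 rating elements instead of 10.
found = ["50", "40", "50", "30", "50", "40", "50"]


def collect_by_index(items, expected=10):
    """Mimics the question's loop: assumes `expected` items always exist."""
    return [items[i] for i in range(expected)]


def collect_by_iteration(items):
    """Takes whatever is present, so a short page cannot raise IndexError."""
    return [item for item in items]


try:
    collect_by_index(found)
    failed = False
except IndexError:
    # Raised as soon as i reaches len(found), i.e. at index 7 here.
    failed = True

collected = collect_by_iteration(found)
print(failed, len(collected))
```

The same reasoning applies to the second loop in the question (`altarray[x]` for `x` in `range(0, 10)`): any fixed count can overrun a page that holds fewer reviews.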
I ran your code and printed:
That said, the way you loop is not the most Pythonic, and it is prone to index errors.
What you can do is replace this:
with:
This will also fix the error.
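As a rough sketch of what such a replacement might look like (an assumption, since the exact snippets are not shown here), the loop can iterate directly over whatever `find_all` returns instead of indexing into it ten times. `SAMPLE_HTML` and `extract_ratings` are hypothetical names, and the sample markup only imitates the structure the question's code expects; the real TripAdvisor page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure is an assumption.
SAMPLE_HTML = """
<div class="rating reviewItemInline"><span class="ui_bubble_rating bubble_50"></span></div>
<div class="rating reviewItemInline"><span class="ui_bubble_rating bubble_40"></span></div>
<div class="rating reviewItemInline"><span class="ui_bubble_rating bubble_30"></span></div>
"""


def extract_ratings(soup):
    """Collect the two-character rating suffix from every rating block found."""
    ratings = []
    # Iterate over the matches themselves: no fixed count, so no IndexError.
    for rating in soup.find_all("div", {"class": "rating reviewItemInline"}):
        span = rating.find("span")
        if span is None:
            continue  # skip blocks without the expected <span>
        classes = span.get("class", [])
        if len(classes) > 1:
            ratings.append(classes[1][-2:])
    return ratings


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
ratings = extract_ratings(soup)
print(ratings)
```

With this shape, the later write loop can also iterate over `ratings` directly rather than over `range(0, 10)`, so a page with fewer than ten reviews is handled without special cases.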