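IndexError while iterating
Problem: the program runs fine until everything is done and there are no more "sub-pages" to visit. Then it crashes, and because of that the results cannot be saved to a .txt file.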
newUrl = nextpage[counter]['href']
IndexError: list index out of range
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import json


class Olx():
    def __init__(self, url):
        self.url = url

    def getPrice(self):
        """Get prices from olx"""
        html = urlopen(self.url)
        bs = BeautifulSoup(html, 'html.parser')
        price = bs.findAll('p', class_='price')
        return price

    def nextPage(self):
        """Go to the next page"""
        html = urlopen(self.url)
        bs = BeautifulSoup(html, 'html.parser')
        pageButton = bs.findAll('a', {'class': 'block br3 brc8 large tdnone lheight24'})
        try:
            return pageButton
        except AttributeError:
            None
        else:
            return pageButton


olxprices = Olx('https://www.olx.pl/nieruchomosci/mieszkania/wynajem/olsztyn/').getPrice()
nextpage = Olx('https://www.olx.pl/nieruchomosci/mieszkania/wynajem/olsztyn/').nextPage()
counter = 0
output = []
while len(nextpage) > 0:
    for price in olxprices:
        output.append(price.get_text().strip())
        print(price.get_text().strip())
    newUrl = nextpage[counter]['href']
    olxprices = Olx(newUrl).getPrice()
    counter += 1
print(output)
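You could try catching the exception (or do whatever else you like in the except block).
If this doesn't answer your question, it may be because the length of the page list stays the same, so you may also want to iterate over it.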
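A minimal sketch of the exception approach, runnable without network access. The `nextpage` list here is made-up stand-in data shaped like the question's pagination links, and `next_url` is a hypothetical helper, not part of the original code:

```python
# Stand-in for the list of pagination links returned by nextPage()
# (hypothetical data; the real items are BeautifulSoup tags with an 'href').
nextpage = [{'href': 'page-2'}, {'href': 'page-3'}]


def next_url(links, counter):
    """Return the href at position `counter`, or None when the list is exhausted."""
    try:
        return links[counter]['href']
    except IndexError:
        # No more pages: signal the caller instead of crashing.
        return None


visited = []
counter = 0
while True:
    url = next_url(nextpage, counter)
    if url is None:
        break  # leave the loop cleanly so the code after it still runs
    visited.append(url)
    counter += 1

print(visited)
```

Because the loop now exits normally, anything placed after it (such as writing the output to a file) would still execute.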
len(nextpage) never changes, so the while loop never terminates on its own, and eventually the index counter runs past the end of nextpage. Instead, let the list itself bound the loop.
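One way this could look, as a sketch rather than the answerer's exact code: loop over the links directly so that iteration stops after the last one. `collect_prices`, `fake_site`, and the sample prices are hypothetical names and data introduced only so the example runs offline.

```python
def collect_prices(first_page_prices, links, get_prices):
    """Gather prices from the first page and from every linked page.

    `get_prices` is any callable that maps a URL to a list of price strings.
    """
    output = list(first_page_prices)
    for link in links:  # ends by itself after the last link, no counter needed
        output.extend(get_prices(link['href']))
    return output


# Hypothetical stand-in data replacing the real HTTP requests.
fake_site = {
    'page-2': ['1 500 zł'],
    'page-3': ['2 000 zł', '1 800 zł'],
}
links = [{'href': 'page-2'}, {'href': 'page-3'}]

output = collect_prices(['1 200 zł'], links, lambda url: fake_site[url])
print(output)
```

In the question's script, `get_prices` would presumably wrap the existing class, for example `lambda url: [p.get_text().strip() for p in Olx(url).getPrice()]`, with `links` being the result of `nextPage()`.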