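I have the code below. I want to visit a web page, pull all the relevant comics from the site, and store them on my computer. The first image downloads fine, but there seems to be a problem when looping to the previous page on the site. If anyone could look at the code and help, it would be much appreciated. The error I get is: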
Traceback (most recent call last):
  File "C:\Users\528000\Desktop\kids print\Comic-gather.py", line 41, in <module>
    prevLink = soup.select('a[class="prevLink"]')[0]
IndexError: list index out of range

import requests, os, bs4
url = 'http://darklegacycomics.com'
os.makedirs('darklegacy', exist_ok=True)
while not url.endswith('#'):
    # Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status()
    soup = bs4.BeautifulSoup(res.text)
    comicElem = soup.select('.comic img')
    if comicElem == []:
        print('Could not find comic image.')
    else:
        try:
            comicUrl = 'http://darklegacycomics.com' + comicElem[0].get('src')
            # Download the image.
            print('Downloading image %s...' % (comicUrl))
            res = requests.get(comicUrl)
            res.raise_for_status()
        except requests.exceptions.MissingSchema:
            # skip this comic
            prevLink = soup.select('.prevlink')[0]
            url = 'http://darklegacycomics.com' + prevLink.get('href')
            continue
        # Save the image to ./darklegacy.
        imageFile = open(os.path.join('darklegacy', os.path.basename(comicUrl)), 'wb')
        for chunk in res.iter_content(100000):
            imageFile.write(chunk)
        imageFile.close()
    # Get the Prev button's url.
    prevLink = soup.select('a[class="prevLink"]')[0]
    url = 'http://darklegacycomics.com' + prevLink.get('href')
This will get all the images:
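The `IndexError` in the traceback is raised when `soup.select(...)` returns an empty list, so indexing it with `[0]` fails. One way to guard against that is sketched below. The helper name `first_href_url` is hypothetical and not part of the question's code. It accepts whatever list `soup.select` returns; in the demo at the bottom, plain dicts stand in for BeautifulSoup tags, since both expose `.get`. Note that the question uses two different selectors (`.prevlink` and `a[class="prevLink"]`); which one, if either, matches the site's actual markup is not confirmed here.

```python
from urllib.parse import urljoin

# Base URL taken from the question's code.
BASE_URL = 'http://darklegacycomics.com'


def first_href_url(links, base_url=BASE_URL):
    """Return the absolute URL of the first link's href, or None.

    `links` is the list returned by soup.select(...). An empty list,
    or a first element without an href, yields None instead of
    raising IndexError.
    """
    if not links:
        return None
    href = links[0].get('href')
    if not href:
        return None
    # urljoin handles both relative ('/12') and absolute hrefs.
    return urljoin(base_url + '/', href)


# Demo with dicts standing in for bs4 tags (an assumption for illustration).
print(first_href_url([]))
print(first_href_url([{'href': '/12'}]))
```

In the question's loop, the last two lines could then become `url = first_href_url(soup.select('a[class="prevLink"]'))`, with the loop condition changed to `while url is not None and not url.endswith('#'):` so the script stops cleanly when no previous-page link is found.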