Web scraping images: can't find the "rel" selector

```python
#! python3
#swordscraper.py - Downloads all the swords comics.
import requests, os, bs4

os.chdir(r'C:\Users\bromp\OneDrive\Desktop\Python')
os.makedirs('swords', exist_ok=True) #store comics in /swords
url = 'https://swordscomic.com/' #starting url
while not url.endswith('#'):
    #Download the page.
    print('Downloading page %s...' % url)
    res = requests.get(url)
    res.raise_for_status() #the () is required; without it the status is never checked
    soup = bs4.BeautifulSoup(res.text, 'html.parser')

    #Find the URL of the comic image.
    comicElem = soup.select('#comic-image')
    if comicElem == []:
        print('Could not find comic image.')
    else:
        comicUrl = comicElem[0].get('src')
        comicUrl = "http://" + comicUrl
        if 'swords' not in comicUrl:
            comicUrl = comicUrl[:7] + 'swordscomic.com/' + comicUrl[7:]
        #Download the image.
        print('Downloading image %s...' % (comicUrl))
        res = requests.get(comicUrl)
        res.raise_for_status()

        #Save the image to ./swords (the with block closes the file for us)
        with open(os.path.join('swords', os.path.basename(comicUrl)), 'wb') as imageFile:
            for chunk in res.iter_content(100000):
                imageFile.write(chunk)

    #Get the Prev button's url.
    prevLink = soup.select('a[id=navigation-previous]')[0]
    url = 'https://swordscomic.com/' + prevLink.get('href')

print('Done')
```

```
Downloading page https://swordscomic.com/...
Downloading image http://swordscomic.com//media/Swords363bt.png...
Downloading page https://swordscomic.com//comic/CCCLXII/...
Could not find comic image.
Traceback (most recent call last):
  File "C:\...\", line 39, in <module>
    prevLink = soup.select('a[id=navigation-previous]')[0]
IndexError: list index out of range
```
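The crash comes from indexing `[0]` into an empty result list on the second page, and the printed URLs show the string concatenation producing double slashes and downgrading the image URL to `http://`. Below is a minimal sketch of a more defensive version of those two steps, assuming the `href`/`src` values are site-relative paths such as `/comic/CCCLXII/` and `/media/Swords363bt.png`, as the output suggests. It swaps BeautifulSoup for the standard library's `html.parser` only so it runs without third-party packages (with BeautifulSoup, `soup.select_one('#navigation-previous')` likewise returns `None` instead of raising). The names `find_link_attrs` and `absolute_url` are hypothetical. This does not make the missing image appear; it only turns the crash into an explicit stop.

```python
from html.parser import HTMLParser
from typing import Dict, Optional
from urllib.parse import urljoin


class _LinkFinder(HTMLParser):
    """Remember the attributes of the first <a> tag whose id matches."""

    def __init__(self, target_id: str) -> None:
        super().__init__()
        self.target_id = target_id
        self.found: Optional[Dict[str, Optional[str]]] = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for this start tag.
        if tag == "a" and self.found is None:
            attr_map = dict(attrs)
            if attr_map.get("id") == self.target_id:
                self.found = attr_map


def find_link_attrs(html: str, target_id: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the matching <a> tag's attributes, or None if it is absent."""
    finder = _LinkFinder(target_id)
    finder.feed(html)
    finder.close()
    return finder.found


def absolute_url(page_url: str, link: str) -> str:
    """Resolve a site-relative href/src against the page it came from."""
    # urljoin avoids the '//' that 'https://swordscomic.com/' + '/comic/...'
    # produces, and keeps the page's https scheme.
    return urljoin(page_url, link)


page_url = "https://swordscomic.com/"
# Markup of the previous-comic button, taken from the answer.
page_html = (
    '<a href="/comic/CCCLXII/" id="navigation-previous" '
    'class="navigation-button navigation-previous" '
    'onclick="COMICS.previousButtonPressed(); return false;"></a>'
)

prev = find_link_attrs(page_html, "navigation-previous")
if prev is None or not prev.get("href"):
    print("No previous link in the static HTML; stopping.")
else:
    print(absolute_url(page_url, prev["href"]))
    # https://swordscomic.com/comic/CCCLXII/

print(absolute_url(page_url, "/media/Swords363bt.png"))
# https://swordscomic.com/media/Swords363bt.png

# A page without the button yields None instead of an IndexError.
print(find_link_attrs("<div>no navigation here</div>", "navigation-previous"))
# None
```

In the question's loop, the same check would replace `soup.select(...)[0]` with a lookup that can come back empty, followed by a `break` when it does.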

1 answer

User

#1 · Posted on 2024-05-02 17:15:08

The page is rendered with JavaScript. Look in particular at the link you are extracting:

```html
<a href="/comic/CCCLXII/" id="navigation-previous" class="navigation-button navigation-previous" onclick="COMICS.previousButtonPressed(); return false;"></a>
```

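It has an onclick() handler, which presumably does the actual navigation to the previous comic: it calls `COMICS.previousButtonPressed()` and returns false, so a browser never follows the `href` itself. The page also loads content via XHR. Your only real option is therefore a tool that executes JavaScript, so try Selenium or requests-html (https://github.com/psf/requests-html).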
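A minimal sketch of the Selenium route, assuming Selenium 4 and a local Chrome install. It loads each page in a real browser so the page's JavaScript runs, waits up to 10 seconds for `#comic-image` to appear in the DOM, and then follows the `navigation-previous` link's `href` instead of clicking. The names `collect_image_urls`, `is_last_page` and `max_pages` are hypothetical, the stop rule copies the question's `url.endswith('#')` check, and whether the live site still uses these element ids is an assumption.

```python
from typing import List, Optional


def is_last_page(href: Optional[str]) -> bool:
    """Same stop rule as the question's loop: no link, or an href ending in '#'."""
    return not href or href.endswith("#")


def collect_image_urls(start_url: str, max_pages: int = 5) -> List[str]:
    """Load each page in a real browser so its JavaScript runs, then read the DOM."""
    # Imported inside the function so this module loads even without Selenium.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()  # assumes Chrome is available locally
    image_urls: List[str] = []
    url: Optional[str] = start_url
    try:
        for _ in range(max_pages):  # max_pages: hypothetical safety cap
            driver.get(url)
            # Block until the page's script has inserted the image (max 10 s).
            img = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.ID, "comic-image"))
            )
            image_urls.append(img.get_attribute("src"))

            # find_elements returns [] when nothing matches, so no IndexError.
            prev = driver.find_elements(By.ID, "navigation-previous")
            url = prev[0].get_attribute("href") if prev else None
            if is_last_page(url):
                break
    finally:
        driver.quit()  # always close the browser, even after a timeout
    return image_urls
```

Calling `collect_image_urls("https://swordscomic.com/")` would return the image `src` of up to five consecutive pages; the files themselves can then be downloaded with the `requests` code from the question. Following the `href` with `driver.get` rather than clicking means every iteration waits on a fresh page load, which sidesteps having to detect when an XHR update has swapped the image.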
