Download all image files found on a website with a regex to a specified directory on my computer, in Python

import urllib, re, os

_in = raw_input('< Press enter to download images from first page >')

if not os.path.exists('FailImages'):  # Directory that I want to save the image to
    os.mkdir('FailImages')            # If no directory create it

source = urllib.urlopen('http://www.samplewebpage.com/index.html').read()
imgs = re.findall(r'\w+\.jpg', source)  # regex finds files with .jpg extension

1 Answer

User
#1 · Posted 2024-10-01 13:26:12

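This should get you going. It doesn't handle the case where an image is an external link, but it will grab the local images.
Optional
Install the Requests dependency from http://requests.readthedocs.org/en/latest/
From the command line, run: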
$ sudo easy_install requests
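If you use Requests, uncomment the import requests line and the f.write(requests.get(remote).content) line, and comment out the f.write(urllib2.urlopen(remote).read()) line: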
import urllib2, re, os
#import requests

folder = "FailImages"
if not os.path.exists(folder):  # Directory that I want to save the images to
    os.mkdir(folder)            # If the directory doesn't exist, create it

url = "http://www.google.ca"
source = urllib2.urlopen(url).read()

# regex finds files with a .jpg, .png or .gif extension
imgs = re.findall(r'(https?:/)?(/?[\w_\-&%?./]*?)\.(jpg|png|gif)', source, re.M)

for img in imgs:
    remote = url + img[1] + "." + img[2]
    filename = folder + "/" + img[1].split('/')[-1] + "." + img[2]
    print "Copying from " + remote + " to " + filename
    if not os.path.exists(filename):
        f = open(filename, 'wb')
        f.write(urllib2.urlopen(remote).read())
        #f.write(requests.get(remote).content)
        f.close()
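The script builds each remote address by gluing the page URL onto the matched path, which is why external links are not handled. One possible way around that, offered as a sketch rather than as part of the original answer, is to let urljoin from the standard library resolve each image source against the page URL. The helper name resolve_image_urls and the IMG_PATTERN regex are hypothetical, and the code is written for Python 3, unlike the Python 2 script above.

```python
import re
from urllib.parse import urljoin

# Hypothetical pattern: captures the src value of <img> tags that end in
# .jpg, .png or .gif. It is a rough sketch, not a full HTML parser.
IMG_PATTERN = re.compile(
    r'''<img[^>]+src=["']([^"']+\.(?:jpg|png|gif))["']''',
    re.IGNORECASE,
)

def resolve_image_urls(page_url, html):
    """Return an absolute URL for every matching <img> source in html.

    urljoin leaves absolute (external) URLs untouched and resolves
    root-relative and page-relative paths against page_url.
    """
    return [urljoin(page_url, src) for src in IMG_PATTERN.findall(html)]
```

Each returned URL could then be passed to the download step in place of the concatenated remote string, with the file name taken from the last path segment as before.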
Note: Requests works better and makes sure the correct headers are sent; urllib may not work in most cases.
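If installing Requests is not an option, explicit headers can also be attached with the standard library. The following is a minimal Python 3 sketch under my own assumptions, not something the answer proposes: build_request, download, and the User-Agent string are all placeholders, and whether a given server accepts the request still depends on that server.

```python
import os
import urllib.request

# Placeholder header value (an assumption); substitute whatever identifies your client.
DEFAULT_HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; image-fetcher-sketch)"}

def build_request(remote, headers=None):
    """Create a Request object carrying explicit headers, without sending it."""
    return urllib.request.Request(remote, headers=headers or DEFAULT_HEADERS)

def download(remote, filename, headers=None):
    """Fetch remote and write its bytes to filename.

    Returns False without touching the network if filename already exists,
    mirroring the os.path.exists check in the answer's script.
    """
    if os.path.exists(filename):
        return False
    with urllib.request.urlopen(build_request(remote, headers)) as response:
        data = response.read()
    with open(filename, "wb") as f:
        f.write(data)
    return True
```

A call such as download(remote, filename) would stand in for the open/write/close lines inside the loop.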
