I have the following problem: I need to save images to mongodb while scraping web pages. I have a link to each image. I tried:
images_binaries = [] # this will store all images data before saving it to mongodb
# save as file on hard disc
urllib.urlretrieve(url, self.album_path + '/' + photo_file_name)
images_binaries.append(open(self.album_path + '/' + photo_file, 'r').read())
....
# after that I append this array of images raw data to Item
post = WaralbumPost()
post['images_binary'] = images_binaries
....
Saving the WaralbumPost item to mongo then fails with: bson.errors.InvalidStringData: strings in documents must be valid UTF-8: '\xff\.....
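What is a better way to do this? Would converting the raw image data solve the problem? Or maybe Scrapy has a good way to save images? Thanks for your answers.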
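The error message suggests that the raw image bytes are being treated as a text string, which BSON requires to be valid UTF-8. A minimal sketch of why that fails, using the first bytes of a typical JPEG file as stand-in data (the helper name `is_valid_utf8` is made up for this illustration):

```python
# Stand-in data: the first bytes of a typical JPEG file.
jpeg_header = b'\xff\xd8\xff\xe0'


def is_valid_utf8(data):
    """Return True if the byte string can be decoded as UTF-8 text."""
    try:
        data.decode('utf-8')
        return True
    except UnicodeDecodeError:
        return False


print(is_valid_utf8(jpeg_header))             # image bytes are not text
print(is_valid_utf8('text'.encode('utf-8')))  # ordinary text is fine
```

Binary data therefore has to be stored as binary rather than as a string, for example wrapped in `bson.binary.Binary` for small payloads, or put into GridFS for files of any size.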
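Solution: I removed these lines:
images_binaries.append(open(self.album_path + '/' + photo_file, 'r').read())
post['images_binary'] = images_binaries
In my WaralbumPost I also save the image URL. Then, in pipeline.py, I take this URL and save the image to mongo. Code of pipeline.py: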
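# Written for the legacy Scrapy and PyMongo APIs used in this post
# (scrapy.conf, scrapy.log, pymongo.Connection, Collection.insert).
import mimetypes

import gridfs
import pymongo
import requests
from scrapy import log
from scrapy.conf import settings


class WarAlbum(object):
    def __init__(self):
        connection = pymongo.Connection(settings['MONGODB_SERVER'], settings['MONGODB_PORT'])
        db = connection[settings['MONGODB_DB']]
        self.collection = db[settings['MONGODB_COLLECTION']]
        # GridFS keeps the image bytes in chunks, outside the item document
        self.grid_fs = gridfs.GridFS(db)

    def process_item(self, item, spider):
        links = item['img_links']
        ids = []
        for i, link in enumerate(links):
            mime_type = mimetypes.guess_type(link)[0]
            # stream=True exposes the response body as a file-like object
            request = requests.get(link, stream=True)
            _id = self.grid_fs.put(request.raw, contentType=mime_type, filename=item['local_images'][i])
            ids.append(_id)
        # Keep only the GridFS file ids on the item, not the image bytes
        item['data_chunk_id'] = ids
        self.collection.insert(dict(item))
        log.msg("Item wrote to MongoDB database %s/%s" %
                (settings['MONGODB_DB'], settings['MONGODB_COLLECTION']),
                level=log.DEBUG, spider=spider)
        return item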
Hope this helps someone.
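On current library versions, `pymongo.Connection`, `Collection.insert` and `scrapy.log` are no longer available, so the pipeline would need some adapting. The following is only a sketch of one way it might look, assuming the same `MONGODB_*` settings and item fields as above. The class name `GridFsImagePipeline` and the `fetch` helper are hypothetical and not part of the original code.

```python
import logging
import mimetypes

logger = logging.getLogger(__name__)


class GridFsImagePipeline(object):
    """Hypothetical pipeline: downloads each image and stores it in GridFS."""

    def __init__(self, server, port, db_name, collection_name):
        self.server = server
        self.port = port
        self.db_name = db_name
        self.collection_name = collection_name

    @classmethod
    def from_crawler(cls, crawler):
        # Read the same settings the original pipeline used
        s = crawler.settings
        return cls(s.get('MONGODB_SERVER'), s.getint('MONGODB_PORT'),
                   s.get('MONGODB_DB'), s.get('MONGODB_COLLECTION'))

    def open_spider(self, spider):
        # Third-party imports are kept local so the module loads without them
        import gridfs
        import pymongo
        self.client = pymongo.MongoClient(self.server, self.port)
        db = self.client[self.db_name]
        self.collection = db[self.collection_name]
        self.grid_fs = gridfs.GridFS(db)

    def close_spider(self, spider):
        self.client.close()

    def fetch(self, url):
        # Download the image and return its bytes; raises on HTTP errors
        import requests
        response = requests.get(url, timeout=30)
        response.raise_for_status()
        return response.content

    def process_item(self, item, spider):
        ids = []
        for link, name in zip(item['img_links'], item['local_images']):
            mime_type = mimetypes.guess_type(link)[0]
            data = self.fetch(link)
            ids.append(self.grid_fs.put(data, contentType=mime_type, filename=name))
        # Store only the GridFS file ids on the item, not the image bytes
        item['data_chunk_id'] = ids
        self.collection.insert_one(dict(item))
        logger.debug("Item written to MongoDB database %s/%s",
                     self.db_name, self.collection_name)
        return item
```

Such a pipeline would be enabled through `ITEM_PIPELINES` in the project settings. Downloading with `requests` inside `process_item` blocks the crawler while each image is fetched, which may be acceptable for small crawls but is worth keeping in mind for larger ones.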
Use GridFS. Example:
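A minimal sketch of storing an image in GridFS and reading it back could look like the following. It is an illustration, not the answerer's original code. The helper names (`save_image`, `load_image`, `demo`), the database name `waralbum` and the file name `photo.jpg` are assumptions made for the example.

```python
import mimetypes


def save_image(fs, data, filename):
    """Store raw image bytes in a GridFS instance and return the new file id."""
    content_type = mimetypes.guess_type(filename)[0]
    return fs.put(data, filename=filename, contentType=content_type)


def load_image(fs, file_id):
    """Read the stored bytes back from GridFS by file id."""
    return fs.get(file_id).read()


def demo(host='localhost', port=27017, db_name='waralbum', path='photo.jpg'):
    """Round trip against a running MongoDB server (all arguments are assumed)."""
    import gridfs
    from pymongo import MongoClient

    fs = gridfs.GridFS(MongoClient(host, port)[db_name])
    # Open in binary mode so the bytes are read unchanged
    with open(path, 'rb') as f:
        data = f.read()
    file_id = save_image(fs, data, path)
    return load_image(fs, file_id) == data
```

Only the returned file id needs to be kept in the scraped item's document. The image bytes themselves live in the GridFS collections, so they never pass through BSON string validation.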