Scraping the src attribute from Google using only Beautiful Soup

User

Reply #1 · edited 2024-09-27 23:18:43

This is a Base64-encoded image. You can save it to an image file like this:

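import base64

# Placeholder: paste only the Base64 payload here, i.e. the part after "base64,"
src = "BASE64 DATA"
# str.decode('base64') only exists in Python 2; base64.b64decode works in Python 3
with open("MyImage.gif", "wb") as img:
    img.write(base64.b64decode(src))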

User

Reply #2 · edited 2024-09-27 23:18:43

This is a data URL; see https://developer.mozilla.org/en-US/docs/Web/HTTP/Basics_of_HTTP/Data_URIs

You can decode the Base64 string and then save the result to an image file.
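As a minimal sketch of that idea, the snippet below splits a data URL into its MIME type and payload, decodes the payload, and writes it to disk. The helper name `decode_data_url`, the output filename, and the sample value (a tiny 1x1 GIF) are illustrative assumptions, not something taken from the question.

```python
import base64


def decode_data_url(data_url):
    """Split a Base64 data URL into (mime_type, raw_bytes)."""
    # Everything before the first comma is the header, the rest is the payload.
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a Base64-encoded data URL")
    # Strip the "data:" prefix and the ";base64" suffix to get the MIME type.
    mime_type = header[len("data:"):-len(";base64")] or "text/plain"
    return mime_type, base64.b64decode(payload)


# Sample input (assumption): a 1x1 GIF encoded as a data URL.
sample = (
    "data:image/gif;base64,"
    "R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7"
)

mime_type, raw = decode_data_url(sample)

# Write the decoded bytes in binary mode.
with open("decoded_image.gif", "wb") as f:
    f.write(raw)
```

The same function can be applied to whatever value the `src` attribute holds, as long as it starts with `data:` and declares `;base64`.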

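User

Reply #3 · edited 2024-09-27 23:18:43

Google Images results are (thankfully) inserted into the DOM from inline JavaScript. Open the page source of the search results for any query, copy an image's src attribute, and then search for it in the page source.

To extract the data using only bs4, you can imitate a browser when requesting the page and use a regular expression to pull the data out of the inline JavaScript.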
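A rough sketch of the regex approach follows. It runs against a hard-coded, hypothetical HTML snippet rather than a live results page, so the script layout, the element ids, and the `\x3d` escaping of `=` are assumptions about what such a page might contain; the real markup can differ and changes over time. In practice the HTML would come from a request sent with a browser-like User-Agent header.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical stand-in for a fetched results page (assumption, not real Google markup).
html = r"""
<html><body>
<img id="dimg_1" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7">
<script>(function(){var s='data:image/jpeg;base64,/9j/4AAQSkZJRgABAQ\x3d\x3d';var ii=['dimg_2'];})();</script>
<script>console.log('no images here');</script>
</body></html>
"""

# Matches a Base64 image data URI; the backslash is allowed so that
# escaped padding such as \x3d stays inside the match.
DATA_URI = re.compile(r"data:image/[\w.+-]+;base64,[A-Za-z0-9+/=\\]+")

soup = BeautifulSoup(html, "html.parser")

# Data URIs that are already present as <img src="..."> attributes.
from_attrs = [
    img["src"]
    for img in soup.find_all("img", src=True)
    if img["src"].startswith("data:image/")
]

# Data URIs that only appear inside inline <script> text.
from_scripts = []
for script in soup.find_all("script"):
    text = script.string or ""
    for match in DATA_URI.findall(text):
        # Undo the assumed JavaScript escaping of "=".
        from_scripts.append(match.replace("\\x3d", "="))

print(len(from_attrs), "from attributes,", len(from_scripts), "from scripts")
```

Each extracted string is a data URI, so it can be decoded with `base64.b64decode` after splitting off the part before the comma.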

Alternatively, you can use SerpApi to extract the URIs of the full-size images. It is a paid SaaS with a free trial.

Example using curl:

curl -s 'https://serpapi.com/search?q=coffee&tbm=isch'

Example using the google-search-results Python package on Repl.it:

from serpapi import GoogleSearch
import os

params = {
    "engine": "google",
    "q": "coffee",
    "tbm": "isch",
    "api_key": os.getenv("API_KEY")
}

client = GoogleSearch(params)
data = client.get_dict()

print("Images results")

for result in data['images_results']:
    print(f"""
Position: {result['position']}
Original image: {result['original']}
""")

Sample output:

Images results

Position: 1
Original image: https://upload.wikimedia.org/wikipedia/commons/4/45/A_small_cup_of_coffee.JPG


Position: 2
Original image: https://media3.s-nbcnews.com/j/newscms/2019_33/2203981/171026-better-coffee-boost-se-329p_67dfb6820f7d3898b5486975903c2e51.fit-1240w.jpg

See the Google Images API documentation on the SerpApi website.

Disclaimer: I work at SerpApi.
