How to scrape img tags from the 1mg website

from bs4 import BeautifulSoup
import requests

headers = {"User-Agent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.116 Safari/537.36'}
url = 'https://www.1mg.com/categories/ayurveda/top-brands-265?filter=true&brand=Dabur'
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.content, 'html.parser')
img = soup.find_all('img', {'class': 'style__image___Ny-Sa style__loaded___22epL'})
for i in img:
    print(i['src'])

<img alt="Dabur Shilajit Gold Capsule" src="https://res.cloudinary.com/du8msdgbj/images/w_150,h_150,c_fit,q_auto,f_auto/v1603435745/feaoalhp4c6bv8icllgp/dabur-shilajit-gold-capsule.jpg" title="Dabur Shilajit Gold Capsule" class="style__image___Ny-Sa style__loaded___22epL">

1 Answer

User

#1 · Posted on 2024-10-03 11:13:47

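The page is loaded dynamically, so requests alone does not see the rendered img tags. However, the data is available on the site in JSON format, so try using the built-in json module to get all the images (40 in total):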

import json
import requests
from bs4 import BeautifulSoup

URL = "https://www.1mg.com/categories/ayurveda/top-brands-265?filter=true&brand=Dabur"

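# Fetch the raw HTML; the product data is embedded in a <script> tag as JSON.
soup = BeautifulSoup(requests.get(URL).content, "html.parser")
# This positional selector depends on the page layout and may need adjusting.
json_data = json.loads(
    soup.select_one("#content-container > div > div > div > script:nth-child(5)").string
)

# Each entry in itemListElement carries one product's image URL.
for data in json_data["itemListElement"]:
    print(data["image"])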

Output:

https://res.cloudinary.com/du8msdgbj/images/w_150,h_150,c_fit,q_auto,f_auto/v1603435745/feaoalhp4c6bv8icllgp/dabur-shilajit-gold-capsule.jpg
https://res.cloudinary.com/du8msdgbj/images/w_150,h_150,c_fit,q_auto,f_auto/v1500611141/wwkoja9giavml3tgyza4/dabur-musli-pak-laghu.jpg
... and so on, all the way until:

https://res.cloudinary.com/du8msdgbj/images/w_150,h_150,c_fit,q_auto,f_auto/v1601446598/sujrsvjyzcuvpfwtekhz/anti-oxidants-combo-of-organic-india-tulsi-ginger-turmeric-25-tea-bag-and-dabur-honey-squeezy-225gm-buy-1-get-1-free.png
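The `script:nth-child(5)` selector above is tied to the exact position of the script tag, so it may stop matching if the page layout shifts. A minimal sketch of a less position-dependent variant follows. It assumes the JSON sits in a `<script type="application/ld+json">` tag, which the `itemListElement` key suggests but the answer does not confirm. The helper name `extract_image_urls`, the `SAMPLE_HTML` snippet, and the example.com URLs are hypothetical and only serve to make the example runnable offline.

```python
import json

from bs4 import BeautifulSoup


def extract_image_urls(html):
    """Collect "image" values from every JSON script that has an itemListElement list.

    Assumption: the data is embedded as JSON-LD (type="application/ld+json").
    """
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for script in soup.find_all("script", type="application/ld+json"):
        try:
            data = json.loads(script.string or "")
        except json.JSONDecodeError:
            continue  # skip empty or malformed script bodies
        if not isinstance(data, dict):
            continue
        for item in data.get("itemListElement", []):
            if isinstance(item, dict) and "image" in item:
                urls.append(item["image"])
    return urls


# Hypothetical stand-in for the downloaded page, not the real 1mg markup.
SAMPLE_HTML = """
<html><body>
<script type="application/ld+json">{"@type": "Organization", "name": "Example"}</script>
<script type="application/ld+json">
{"@type": "ItemList", "itemListElement": [
  {"name": "Product A", "image": "https://example.com/a.jpg"},
  {"name": "Product B", "image": "https://example.com/b.jpg"}
]}
</script>
</body></html>
"""

for url in extract_image_urls(SAMPLE_HTML):
    print(url)
```

For the real page, the same function could be called on `requests.get(URL).content`. Whether it finds anything depends on how the site actually embeds the data.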
