How do I open all the hrefs inside a div?

Posted 2024-09-28 21:16:27


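I'm new to Python and to all of this, and I want to parse all the hrefs inside a div. My goal is to write a program that opens every link in that div so that I can save the photo associated with each href.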

Link: https://www.opi.com/shop-products/nail-polish-powders/nail-lacquer

The part I want to parse is the div with id="all_nail_lacquer".

So far I've managed to get all the hrefs. This is what I have at the moment:

import urllib.request
from bs4 import BeautifulSoup

theurl = "https://www.opi.com/shop-products/nail-polish-powders/nail-lacquer"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")

print(soup.title.text)

nail_lacquer = soup.find('div', {"id": "all_nail_lacquer"})

# Earlier attempt, kept for reference:
# for nail_lacquer in soup.find_all('div'):
#     print(nail_lacquer.find_all('a'))

# Print the href of every <a> inside the div with id="all_nail_lacquer"
for a in soup.find_all('div', {"id": "all_nail_lacquer"}):
    for b in a.find_all('a'):
        print(b.get('href'))
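Since the goal is to open every link in that div, the raw href values usually need to become absolute URLs first (relative paths such as /shop-products/... cannot be opened on their own). The following is a minimal sketch, assuming the hrefs are collected into a list from the loop above; absolute_links is a hypothetical helper, not part of BeautifulSoup, and it relies only on urllib.parse from the standard library.

```python
from urllib.parse import urldefrag, urljoin


def absolute_links(hrefs, base_url):
    """Turn raw href values into unique absolute URLs, keeping page order.

    absolute_links is a hypothetical helper, not a BeautifulSoup API.
    """
    links = []
    seen = set()
    for href in hrefs:
        if not href or href.startswith(("#", "javascript:", "mailto:")):
            continue  # skip missing hrefs, in-page anchors and non-HTTP links
        # Resolve relative paths against the page URL, then drop any #fragment
        url, _fragment = urldefrag(urljoin(base_url, href))
        if url not in seen:
            seen.add(url)
            links.append(url)
    return links
```

With the question's loop, the list could be built as hrefs = [b.get('href') for b in a.find_all('a')], passed through absolute_links(hrefs, theurl), and each resulting URL opened with urllib.request.urlopen. Deduplicating matters because product grids often link the same page from both the image and the title.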

1 answer

Answer #1 · posted 2024-09-28 21:16:27

To print the image links (including the high-resolution versions) and the titles, you can use the following script:

import urllib.request
from bs4 import BeautifulSoup

theurl = "https://www.opi.com/shop-products/nail-polish-powders/nail-lacquer"
thepage = urllib.request.urlopen(theurl)
soup = BeautifulSoup(thepage, "html.parser")

# Every element inside #all_nail_lacquer that has typeof="foaf:Image" and a data-src attribute
for img in soup.select('#all_nail_lacquer [typeof="foaf:Image"][data-src]'):
    print(img['data-src'])                                   # thumbnail URL
    print(img['data-src'].replace('shelf_image', 'photos'))  # <- URL of the hi-res image
    print(img.get('title'))
    print('-' * 80)

Prints:

https://www.opi.com/sites/default/files/styles/product_shelf_image/public/baby-take-a-vow-nlsh1-nail-lacquer-22850011001_0_0.jpg?itok=3b2ftHzc
https://www.opi.com/sites/default/files/styles/product_photos/public/baby-take-a-vow-nlsh1-nail-lacquer-22850011001_0_0.jpg?itok=3b2ftHzc
Baby, Take a Vow
--------------------------------------------------------------------------------
https://www.opi.com/sites/default/files/styles/product_shelf_image/public/suzi-without-a-paddle-nlf88-nail-lacquer-22006698188_21_0.jpg?itok=mgi1-rz3
https://www.opi.com/sites/default/files/styles/product_photos/public/suzi-without-a-paddle-nlf88-nail-lacquer-22006698188_21_0.jpg?itok=mgi1-rz3
Suzi Without a Paddle
--------------------------------------------------------------------------------
https://www.opi.com/sites/default/files/styles/product_shelf_image/public/coconuts-over-opi-nlf89-nail-lacquer-22006698189_24_1_0.jpg?itok=yasOZA4l
https://www.opi.com/sites/default/files/styles/product_photos/public/coconuts-over-opi-nlf89-nail-lacquer-22006698189_24_1_0.jpg?itok=yasOZA4l
Coconuts Over OPI
--------------------------------------------------------------------------------
https://www.opi.com/sites/default/files/styles/product_shelf_image/public/no-tan-lines-nlf90-nail-lacquer-22006698190_20_1_0.jpg?itok=ot_cu8c5
https://www.opi.com/sites/default/files/styles/product_photos/public/no-tan-lines-nlf90-nail-lacquer-22006698190_20_1_0.jpg?itok=ot_cu8c5
No Tan Lines
--------------------------------------------------------------------------------


...and so on.
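The script above builds the hi-res URL with str.replace('shelf_image', 'photos'), which would also rewrite that substring if it ever appeared in a file name. A slightly more targeted variant, sketched below, swaps only the path segment that equals the image-style name. It assumes the site keeps the /styles/<style>/public/... layout seen in the printed URLs; to_hires and its default style names are hypothetical, taken from that output rather than from any documented OPI API.

```python
from urllib.parse import urlsplit, urlunsplit


def to_hires(src, small_style="product_shelf_image", large_style="product_photos"):
    """Swap the image-style path segment of a thumbnail URL.

    Hypothetical helper: the style names come from the URLs printed above and
    assume the site keeps its /styles/<style>/ path layout.
    """
    parts = urlsplit(src)
    segments = parts.path.split("/")
    # Replace only a whole path segment equal to small_style, never part of a file name
    segments = [large_style if seg == small_style else seg for seg in segments]
    # Query string (e.g. ?itok=...) and everything else are kept unchanged
    return urlunsplit(parts._replace(path="/".join(segments)))
```

In the loop above, print(to_hires(img['data-src'])) would stand in for the replace call; the output is the same for the URLs shown, but a file named something like shelf_image.jpg would be left alone.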

Edit: to save the images to disk, you can use the following script:

import requests
from bs4 import BeautifulSoup

theurl = "https://www.opi.com/shop-products/nail-polish-powders/nail-lacquer"
thepage = requests.get(theurl)
soup = BeautifulSoup(thepage.content, "html.parser")

for i, img in enumerate(soup.select('#all_nail_lacquer [typeof="foaf:Image"][data-src]'), start=1):
    u = img['data-src'].replace('shelf_image', 'photos')  # hi-res image URL
    print('Saving {}'.format(u))
    resp = requests.get(u)
    resp.raise_for_status()  # don't write an HTML error page into a .jpg file
    with open('img_{:04d}.jpg'.format(i), 'wb') as f_out:
        f_out.write(resp.content)
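Numbered names like img_0001.jpg lose the link between a file and its product. One alternative, sketched here under the assumption that each image URL ends in a meaningful file name (as the printed URLs do), is to name each file after the last path segment and skip files that already exist, so the script can be re-run. filename_from_url and save_images are hypothetical helpers; only requests.Session, Session.get, and Response.raise_for_status are real requests APIs.

```python
import os
from urllib.parse import unquote, urlsplit


def filename_from_url(url, fallback="image.jpg"):
    """Return the last path segment of url with the query string dropped.

    Hypothetical helper: assumes the image URLs end in a usable file name.
    """
    name = os.path.basename(unquote(urlsplit(url).path))
    return name or fallback


def save_images(urls, out_dir="images", session=None):
    """Download each URL into out_dir, skipping files that already exist.

    Uses the third-party requests library, as the answer's script does.
    """
    import requests  # imported here so filename_from_url works without requests installed

    session = session or requests.Session()  # reuse one connection pool for all downloads
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(out_dir, filename_from_url(url))
        if os.path.exists(path):
            continue  # already downloaded on a previous run
        resp = session.get(url, timeout=30)
        resp.raise_for_status()  # stop on 4xx/5xx instead of saving an error page
        with open(path, "wb") as f_out:
            f_out.write(resp.content)
        saved.append(path)
    return saved
```

For example, save_images(img['data-src'].replace('shelf_image', 'photos') for img in soup.select(...)) would download the same hi-res images as the script above. If two different URLs share a file name, the second one is skipped, so the counter-based naming is safer when names are not unique.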
