Scrape images and titles from https://www.open2study.com/courses using Python and BeautifulSoup

Posted 2024-10-03 13:24:57


from bs4 import BeautifulSoup 
import urllib 

r = urllib.urlopen('https://www.open2study.com/courses').read() 

soup = BeautifulSoup(r) 
links = soup.find('figure').find_all('img', src=True) 

for link in links: 
    txt = open('test.txt' , "w") 
    link = link["src"].split("src=")[-1] 
    download_img = urllib.urlopen('https://www.open2study.com/courses') 
    txt.write(download_img.read()) 
    txt.close()

I need to scrape the images and titles from this website.


2 Answers

Something like this?

from urllib.request import urlopen
from bs4 import BeautifulSoup

titles = []
images = []

# Fetch the page once and parse it (Python 3, parser named explicitly).
r = urlopen('https://www.open2study.com/courses').read()
soup = BeautifulSoup(r, 'html.parser')

# Each course title is the <h2> inside a rollover div.
for i in soup.find_all('div', {'class': "courses_adblock_rollover"}):
    titles.append(i.h2.text)

# Course logos carry this image-style class.
for i in soup.find_all(
    'img', {
        'class': "image-style-course-logo-subjects-block"}):
    images.append(i.get('src'))

# Write each title followed by its image URL, separated by a blank line.
with open('test.txt', "w", encoding='utf-8') as f:
    for title, image in zip(titles, images):
        f.write(title + '\n' + image + '\n\n')

You can get {} directly with BeautifulSoup instead of doing the split.

Use this to get the divs that contain the title and the image:

for link in soup.find_all("div",attrs={"class" : "courses_adblock_start"}):

Then use this to get the title and the image inside each div:

^{pr2}$

You are also opening the page on every loop iteration. If you want to avoid that, open it once and reuse the result in the loop, like this:

import urllib.request
from bs4 import BeautifulSoup

url = "http://www.open2study.com/courses"
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")

for link in soup.find_all("div", attrs={"class": "courses_adblock_start"}):
    try:
        print("Title : " + link.find("h2", attrs={"class": "adblock_course_title"}).get_text())
        print("Image : " + link.find("img", attrs={"class": "image-style-course-logo-subjects-block"}).get("src"))
    except (AttributeError, TypeError):
        # find() returned None (AttributeError) or the img has no src (TypeError).
        print("error")

Here is the new output:

Title : World Music
Image : https://www.open2study.com/sites/default/files/styles/course_logo_subjects_block/public/Course%20Tile_world_music.jpg?itok=CG6pvXHp
Title : Writing for the Web
Image : https://www.open2study.com/sites/default/files/styles/course_logo_subjects_block/public/3_writing_for_web_C_0.jpg?itok=exQApr-1
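
The code in the question also tries to save each image, so here is a minimal sketch of that last step, assuming you already have the `src` URLs printed above. `filename_from_url` and `download_image` are hypothetical helper names introduced only for illustration, and only the standard library is used. The file is opened in binary mode (`"wb"`) because image data is bytes, not text.

```python
import os
import posixpath
from urllib.parse import unquote, urlsplit
from urllib.request import urlopen


def filename_from_url(url):
    # Hypothetical helper: keep only the last path segment, dropping the
    # query string (?itok=...) and decoding escapes such as %20.
    return unquote(posixpath.basename(urlsplit(url).path))


def download_image(url, dest_dir="."):
    # Hypothetical helper: fetch one image URL and save the raw bytes.
    target = os.path.join(dest_dir, filename_from_url(url))
    with urlopen(url) as response, open(target, "wb") as out:
        out.write(response.read())
    return target


if __name__ == "__main__":
    src = ("https://www.open2study.com/sites/default/files/styles/"
           "course_logo_subjects_block/public/3_writing_for_web_C_0.jpg"
           "?itok=exQApr-1")
    print(filename_from_url(src))  # 3_writing_for_web_C_0.jpg
```

Inside the loop you would call `download_image(...)` with the value returned by `.get("src")`. Whether the site still serves these files is not something this sketch can guarantee.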
