Scrape images and titles from https://www.open2study.com/courses using Python and BeautifulSoup

from bs4 import BeautifulSoup
import urllib

r = urllib.urlopen('https://www.open2study.com/courses').read()
soup = BeautifulSoup(r)
links = soup.find('figure').find_all('img', src=True)
for link in links:
    txt = open('test.txt' , "w")
    link = link["src"].split("src=")[-1]
    download_img = urllib.urlopen('https://www.open2study.com/courses')
    txt.write(download_img.read())
    txt.close()

2 answers

User

Answer 1 · edited 2024-10-03 13:24:57

Something like this?

import urllib
from bs4 import BeautifulSoup

titles = []
images = []

r = urllib.urlopen('https://www.open2study.com/courses').read()
soup = BeautifulSoup(r)

for i in soup.find_all('div', {'class': "courses_adblock_rollover"}):
    titles.append(i.h2.text)

for i in soup.find_all('img', {'class': "image-style-course-logo-subjects-block"}):
    images.append(i.get('src'))

with open('test.txt', "w") as f:
    for i in zip(titles, images):
        f.write(i[0].encode('ascii', 'ignore') +
                '\n'+i[1].encode('ascii', 'ignore') +
                '\n\n')
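
The snippet above is Python 2: in Python 3, urllib.urlopen became urllib.request.urlopen, and concatenating the result of .encode() with '\n' raises a TypeError because bytes and str cannot be mixed. A minimal Python 3 sketch of just the writing step, assuming titles and images are lists of str; write_pairs is a hypothetical helper name and the example.com URLs are placeholders:

```python
import io


def write_pairs(titles, images, out):
    """Write each title and its image URL on consecutive lines, then a blank line."""
    for title, src in zip(titles, images):
        out.write(f"{title}\n{src}\n\n")


# Sample data; in practice these would be the lists built by the two loops.
titles = ["World Music", "Writing for the Web"]
images = ["https://example.com/a.jpg", "https://example.com/b.jpg"]  # placeholder URLs

# An in-memory buffer stands in for the file here. For a real file, pass
# open('test.txt', 'w', encoding='utf-8') so non-ASCII titles are kept.
buf = io.StringIO()
write_pairs(titles, images, buf)
print(buf.getvalue())
```

Note that zip pairs items by position, so if any course block lacks an image the titles and URLs can drift out of step.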

User

Answer 2 · edited 2024-10-03 13:24:57

You can get the src directly with BeautifulSoup instead of doing a split.

Use this to get the divs that contain the title and image:

for link in soup.find_all("div",attrs={"class" : "courses_adblock_start"}):

Then use this to get the title and image inside each div:

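link.find("h2",attrs={"class":"adblock_course_title"}).get_text()
link.find("img", attrs={"class":"image-style-course-logo-subjects-block"}).get("src")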

You are also opening the page on every loop iteration, which you will want to avoid. Open it once and reuse it in the loop, like this:

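import urllib.request
from bs4 import BeautifulSoup

url = "http://www.open2study.com/courses"
# Fetch and parse the page once, outside the loop.
page = urllib.request.urlopen(url)
soup = BeautifulSoup(page.read(), "html.parser")

for link in soup.find_all("div", attrs={"class": "courses_adblock_start"}):
    try:
        print("Title : " + link.find("h2", attrs={"class": "adblock_course_title"}).get_text())
        print("Image : " + link.find("img", attrs={"class": "image-style-course-logo-subjects-block"}).get("src"))
    except (AttributeError, TypeError):
        # find() returned None or the img has no src: skip this div.
        print("error")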

Here is the new output:

Title : World Music
Image : https://www.open2study.com/sites/default/files/styles/course_logo_subjects_block/public/Course%20Tile_world_music.jpg?itok=CG6pvXHp
Title : Writing for the Web
Image : https://www.open2study.com/sites/default/files/styles/course_logo_subjects_block/public/3_writing_for_web_C_0.jpg?itok=exQApr-1
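
The question also tries to save the images themselves, and its code writes the response to test.txt in text mode. A minimal sketch of that step, assuming Python 3 and only the standard library; filename_from_url and download_image are hypothetical helper names, not part of BeautifulSoup or urllib:

```python
import os
import posixpath
import urllib.request
from urllib.parse import unquote, urlsplit


def filename_from_url(url):
    """Return the decoded last path segment of url, ignoring the query string."""
    path = urlsplit(url).path
    return posixpath.basename(unquote(path))


def download_image(url, dest_dir="."):
    """Fetch url and save the raw bytes under dest_dir; return the saved path."""
    target = os.path.join(dest_dir, filename_from_url(url))
    with urllib.request.urlopen(url) as response, open(target, "wb") as out:
        out.write(response.read())  # binary mode: image data is not text
    return target


src = ("https://www.open2study.com/sites/default/files/styles/"
       "course_logo_subjects_block/public/Course%20Tile_world_music.jpg?itok=CG6pvXHp")
print(filename_from_url(src))  # Course Tile_world_music.jpg
```

Calling download_image(src) inside the loop, once per image URL, would then write one file per course instead of overwriting test.txt each time.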
