How to parse text on a website that only appears after clicking a button and is not in the initial page source

1 answer

User

#1 · Posted 2024-10-01 09:36:39

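You can use the requests + BeautifulSoup approach. You only need to simulate the underlying requests that go to the server when you click the More blogs button and scroll down the page.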

Here is code that prints the image titles of all blog posts from the http://hypem.com/blogs page:

from bs4 import BeautifulSoup
import requests


def extract_blogs(content):
    # parse the HTML and print the title of every blog image
    soup = BeautifulSoup(content, 'html.parser')
    for link in soup.select('div.directory-blog img'):
        print(link.get('title'))

# extract blogs from the main page
response = requests.get('http://hypem.com/blogs')
extract_blogs(response.content)

# paginate over the remaining results until the server returns an empty response
page = 2
url = 'http://hypem.com/inc/serve_sites.php?featured=true&page={page}'

while True:
    response = requests.get(url.format(page=page))
    if not response.content.strip():
        break
    extract_blogs(response.content)
    page += 1

Prints:

Heart and Soul
Avant-Avant
Different Kitchen
Ladywood 
Orange Peel
Phonographe Corp
...
Stadiums & Shrines
Caipirinha Lounge
Gorilla Vs. Bear
ISO50 Blog
Fluxblog
Music ( for robots)

Hope this at least gives you a basic idea of how to scrape web content in cases like this.
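
The "request pages until one comes back empty" loop can also be separated from the network call, which makes it possible to try out offline. The following is only a minimal sketch under some assumptions: `iter_pages`, its `fetch` parameter, and the `max_pages` safety cap are hypothetical names introduced here for illustration, not part of the original answer or of any library.

```python
def iter_pages(fetch, first_page=2, max_pages=100):
    """Yield response bodies from fetch(page) until one comes back empty.

    fetch is any callable that takes a page number and returns the body
    as str or bytes. max_pages is a safety cap chosen for this sketch
    (an assumption, not a documented limit of the site).
    """
    for page in range(first_page, first_page + max_pages):
        body = fetch(page)
        if not body.strip():
            # a blank body means there are no more results
            return
        yield body


# Offline stand-in for the server: pages 2 and 3 have content, page 4 is blank.
fake_pages = {2: "<img title='A'>", 3: "<img title='B'>", 4: "  \n"}
bodies = list(iter_pages(lambda page: fake_pages.get(page, "")))
print(len(bodies))
```

With requests, `fetch` could be something like `lambda page: requests.get(url.format(page=page), timeout=10).content`, with each yielded body passed to `extract_blogs`. The cap simply guards against looping forever if the server never returns an empty body.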
