I'm trying to create a crawler for my WordPress website using Python 3

import requests
from bs4 import BeautifulSoup


def page(current_page):
    current = "h2"
    while current == current_page:
        url = 'https://vishrantkhanna.com/?s=' + str(current)
        source_code = requests.get(url)
        plain_text = source_code.txt
        soup = BeautifulSoup(plain_text)
        for link in soup.findAll('h2', {'class': 'entry-title'}):
            href = "https://vishrantkhanna.com/" + link.get('href')
            title = link.string
            print(href)
            print(title)


page("h2")

1 Answer

User

#1 · Posted 2024-09-30 16:23:41

You need to extract the <a> tag from inside each heading:

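import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

BASE_URL = 'https://vishrantkhanna.com/'
URL = BASE_URL + '?s=1'

html = requests.get(URL).text
bs = BeautifulSoup(html, 'html.parser')
# Each post title is an <h2 class="entry-title"> that wraps an <a> tag.
for heading in bs.find_all('h2', {'class': 'entry-title'}):
    a = heading.find('a', href=True)
    if a is None:
        continue  # skip headings that contain no link
    # urljoin leaves absolute hrefs untouched and resolves relative ones.
    href = urljoin(BASE_URL, a['href'])
    title = a.get_text(strip=True)
    print(href)
    print(title)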
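
To see why the code in the question fails, here is a small offline sketch that needs no network access. The sample markup and the `extract_entries` helper are hypothetical, made up for illustration and only modelled on what a WordPress search page typically looks like; the real page may differ. It shows that the `<h2>` itself carries no `href` (so `link.get('href')` returns `None`, and concatenating that with a string raises a `TypeError`), while the nested `<a>` does. Note also that the `requests` response attribute is `.text`, not `.txt`.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Hypothetical sample markup, not copied from the real site.
SAMPLE_HTML = """
<h2 class="entry-title"><a href="https://vishrantkhanna.com/hello-world/">Hello World</a></h2>
<h2 class="entry-title"><a href="/about/">About</a></h2>
<h2 class="entry-title">No link here</h2>
"""


def extract_entries(html, base_url):
    """Return (href, title) pairs for every entry-title heading that has a link."""
    soup = BeautifulSoup(html, "html.parser")
    entries = []
    for heading in soup.find_all("h2", class_="entry-title"):
        a = heading.find("a", href=True)
        if a is None:
            continue  # heading without a link, nothing to report
        # Resolve relative hrefs against base_url; absolute ones pass through.
        entries.append((urljoin(base_url, a["href"]), a.get_text(strip=True)))
    return entries


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
first_heading = soup.find("h2", class_="entry-title")
print(first_heading.get("href"))  # the <h2> tag itself has no href attribute

for href, title in extract_entries(SAMPLE_HTML, "https://vishrantkhanna.com/"):
    print(href, title)
```

Keeping the parsing in a function that takes an HTML string makes it easy to check against saved pages before pointing it at the live site with `requests.get(URL).text`.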
