Scraping web pages with Python - Q&A - Python中文网

Posted 2024-09-22 16:23:07

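I'm trying to extract the links and their text from a web page, line by line, and insert each text and link into a dictionary, without using Beautiful Soup or regular expressions.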

I keep getting this error:

 Traceback (most recent call last):
 File "F:/Homework7-2.py", line 13, in <module>
 link2 = link1.split("href=")[1]
 IndexError: list index out of range
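
The traceback points at `link1.split("href=")[1]`. When a line contains no `href=`, `str.split` returns a one-element list, so index 1 does not exist and Python raises `IndexError`. Below is a minimal sketch of a guard that stays within the "no Beautiful Soup, no regex" constraint. The helper name `extract_href` is hypothetical, and the sketch assumes the attribute value is quoted and sits on a single line, which real HTML does not guarantee.

```python
def extract_href(line):
    """Return the quoted value after the first href= in line, or None."""
    # partition never raises: if "href=" is absent, sep is the empty string.
    _, sep, rest = line.partition("href=")
    if not sep or not rest or rest[0] not in "\"'":
        return None
    quote = rest[0]
    end = rest.find(quote, 1)
    # An unterminated quote yields None rather than a truncated URL.
    return rest[1:end] if end != -1 else None


print(extract_href('<a href="https://example.com/page">Example</a>'))
print(extract_href("<p>no link on this line</p>"))
```

Lines without a link now return `None` instead of raising, so the calling loop can skip them.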

Code:

^{pr2}$

1 answer

User

#1 · Posted 2024-09-22 16:23:07

import requests
from bs4 import BeautifulSoup

r = requests.get("http://stackoverflow.com/questions/29336915/python-scraping-webpages")
# Name the parser explicitly so the result does not depend on which parsers are installed
soup = BeautifulSoup(r.content, "html.parser")
# Find all <a> tags that have an href attribute
for a in soup.find_all("a", href=True):
    # Print each href
    print(a["href"])

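Obviously this is a very broad example, but it will get you started. If you want specific URLs, you can narrow the search to certain elements, though how you do that will differ from page to page. For parsing, it is hard to find tools easier to use than requests and BeautifulSoup.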
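
If the "no Beautiful Soup, no regex" constraint from the question has to hold, the standard library's `html.parser.HTMLParser` is one possible alternative. The sketch below is not the answerer's method. The class name `LinkCollector` and the sample HTML are hypothetical, chosen only to show how link text and `href` values could be collected into a dictionary.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect {link text: href} for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = {}
        self._href = None   # href of the <a> currently open, if any
        self._text = []     # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:
                self._href = href
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = "".join(self._text).strip()
            # Later links with identical text overwrite earlier ones.
            self.links[text] = self._href
            self._href = None


# Hypothetical sample markup; in practice this could be r.text from requests.
sample_html = """
<ul>
  <li><a href="/questions">Questions</a></li>
  <li><a href="/tags">Browse <b>tags</b></a></li>
  <li><a name="anchor-only">No href here</a></li>
</ul>
"""

collector = LinkCollector()
collector.feed(sample_html)
collector.close()
print(collector.links)
# → {'Questions': '/questions', 'Browse tags': '/tags'}
```

Because the dictionary is keyed by link text, two links with the same text collapse into one entry. Keying by `href` instead, or storing a list of pairs, avoids that if duplicates matter.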
