How can I avoid "Max retries exceeded" errors when scraping with Python?

Posted 2024-05-22 09:38:02

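In Python 3, I wrote a program that scrapes the contents of table rows across multiple pages (97893). Each page has a set of tables, so I go through each page and fill a list.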

The starting page is: http://www.portaltransparencia.gov.br/PortalComprasDiretasFavorecido.asp?TipoPesquisa=2&Ano=2017&Pagina=1

from bs4 import BeautifulSoup
import requests

def sopa(link):
    # Download one results page and return the rows of its second table.
    res = requests.get(link)
    soup = BeautifulSoup(res.text, "lxml")
    table = soup.select("table")[1]
    conjunto = table.find_all("tr")
    return conjunto

planilha = []
for i in range(1,97893):
    link = "http://www.portaltransparencia.gov.br/PortalComprasDiretasFavorecido.asp?TipoPesquisa=2&Ano=2017&Pagina="
    link = link + str(i)
    conjunto = sopa(link)
    conta = 0
    for linha in conjunto:
        # Skip the first row of each table.
        if conta > 0:
            # The third positional argument of find() is `recursive`, so the
            # style dict passed there never filtered anything; it is dropped.
            documento = linha.find("td", {"class": "firstChild"}).text.strip()
            nome = linha.find("a").text.strip()
            valor = linha.find("td", {"class": "colunaValor"}).text.strip()
            dicionario = {"documento": documento, "nome": nome, "valor": valor}
            planilha.append(dicionario)
        conta = conta + 1

But at page 458 this error appeared:

[The traceback was not preserved in this copy of the post.]

I believe I was blocked because I requested so many pages one after another. Does anyone know how I can avoid this?
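One common way to approach this, offered as a minimal sketch rather than a confirmed fix for this site, is to retry a failed request after a growing pause instead of letting the first failure stop the whole run. The helper below is hypothetical: `fetch_with_backoff`, `retry_on`, `max_attempts`, `base_delay` and `sleep` are names invented for illustration and do not come from the original program. It uses only the standard library and accepts any callable that downloads a URL.

```python
import random
import time


def fetch_with_backoff(fetch, url, retry_on=(Exception,), max_attempts=5,
                       base_delay=1.0, sleep=time.sleep):
    """Call fetch(url), retrying with exponential backoff on failure.

    Hypothetical helper: none of these names exist in the original program.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch(url)
        except retry_on:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            # Wait 1x, 2x, 4x, ... base_delay, plus a little random jitter.
            delay = base_delay * 2 ** (attempt - 1)
            sleep(delay + random.uniform(0, delay / 10))


# Possible use inside sopa(), assuming requests is installed:
#   import requests
#   res = fetch_with_backoff(
#       lambda u: requests.get(u, timeout=30),
#       link,
#       retry_on=(requests.exceptions.RequestException,),
#   )
```

If the server is in fact throttling rapid sequential requests, a short fixed pause between pages (for example `time.sleep(1)` at the end of each loop iteration) may help as well. The right delay for this site is unknown, so the numbers here are placeholders.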

0 answers

No answers yet.
