Setting up a simple retry function in Python when parsing HTML

Published 2024-09-20 06:43:53


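I want this function to run itself again whenever it cannot find the information on the page.

I thought the code below would do that, but it doesn't work. I'm not sure how to implement a scraping loop with a simple function. I tried the retrying module, but it had problems during installation, so a hand-written solution would be ideal.

My code is below: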

import time, requests, webbrowser, sys, os, re, json
from bs4 import BeautifulSoup
from colorama import Fore, Back, Style, init
import subprocess as s

url = "http://notimportant.com"

r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")

def getIds():
    global product_id
    for script in scripts:
        if 'spConfig =' in script.getText():
            #idlive = True
            regex = re.compile(r'var spConfig = new Product.Config\((.*?)\);')
            match = regex.search(script.getText())
            spConfig = json.loads(match.groups()[0])
            for key, attribute in spConfig['attributes'].iteritems():
                for option in attribute['options']:
                    if option['label_uk'] == size:
                        label = option['label_uk'].strip()
                        for product_id in option['products']:
                            print(Fore.CYAN + "Size Found!")
                            print product_id, "-", label
                            #str = product_id
                            #productsizeid = str
        else:
            print(Fore.RED + "Sizes not live yet")
            print("Retrying in 10 seconds . . .")
            time.sleep(10)
            print("Trying again. . .")
            getIds()

1 answer
User
#1 · Posted 2024-09-20 06:43:53

Iteration would be the preferred approach here rather than recursion, for example:

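import time

url = "http://notimportant.com"
size_alive = False
while not size_alive:
    # do_the_scraping_function is a placeholder for your own scraper; it should
    # re-fetch the page and return True once it finds 'spConfig =' in script.getText()
    size_alive = do_the_scraping_function(url)
    if not size_alive:
        print("retrying in 10 seconds")
        time.sleep(10)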
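
To make the loop above more concrete, here is a minimal sketch under some assumptions. The names find_product_ids, retry_until_found and fetch_script_texts are hypothetical, not part of any library. The regex and the 'attributes' / 'options' / 'label_uk' / 'products' keys are copied from the question's code. The key point is that the page is fetched again on every pass, which the recursive version never does, and that the retry happens once per page rather than once per script tag.

```python
import json
import re
import time

# Pattern taken from the question's code.
SPCONFIG_RE = re.compile(r'var spConfig = new Product.Config\((.*?)\);')


def find_product_ids(script_texts, size):
    """Return the product ids for `size`, or None if no script holds spConfig."""
    for text in script_texts:
        match = SPCONFIG_RE.search(text)
        if match is None:
            continue  # this script is unrelated; check the next one
        config = json.loads(match.group(1))
        ids = []
        for attribute in config['attributes'].values():
            for option in attribute['options']:
                if option['label_uk'].strip() == size:
                    ids.extend(option['products'])
        return ids  # spConfig is live (the list may still be empty)
    return None  # spConfig not on the page yet


def retry_until_found(fetch, size, delay=10, max_attempts=None, sleep=time.sleep):
    """Call fetch() repeatedly until spConfig appears or attempts run out."""
    attempt = 0
    while True:
        attempt += 1
        ids = find_product_ids(fetch(), size)  # fresh page data on every pass
        if ids is not None:
            return ids
        if max_attempts is not None and attempt >= max_attempts:
            return None
        print("Sizes not live yet, retrying in %s seconds" % delay)
        sleep(delay)


def fetch_script_texts(url="http://notimportant.com"):
    """Hypothetical fetcher: download the page and return each script's text."""
    # Imported here so the retry logic above works without these packages.
    import requests
    from bs4 import BeautifulSoup
    response = requests.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    return [script.string or "" for script in soup.find_all("script")]
```

It could then be called as retry_until_found(fetch_script_texts, "9"), where "9" stands in for whatever size you are looking for. Passing max_attempts puts a ceiling on the number of tries so the script cannot loop forever if the sizes never go live.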
