Appending to lists with multiprocessing

Posted 2024-07-05 09:12:43


In the function get_links I collect the links from a URL. In the scrape function (test in the code below) I use text_from_html (not included in the code) to get the visible text of each URL. I want to append url and visible_text to two lists that hold the URL and the visible text of every page, but each list ends up with only one item: the previous item keeps getting replaced. I want to keep the earlier values as well. The output I get is:

['https://www.scrapinghub.com']
['https://www.goodreads.com/quotes']

But I need a single list containing all of them.

import re
from multiprocessing import Pool

import requests
from bs4 import BeautifulSoup

# visited_list, fringe, URL and paragraph are module-level lists defined
# elsewhere in the original script (not shown in the question).

def get_links(url):
    visited_list.append(url)
    try:
        source_code = requests.get(url)
    except Exception:
        # if the request fails, move on to the next URL in the fringe
        return get_links(fringe.pop(0))
    plain_text = source_code.text
    soup = BeautifulSoup(plain_text, "lxml")
    for link in soup.findAll(re.compile(r'(li|a)')):
        href = link.get('href')
        # skip missing links, links already visited or queued, and non-absolute URLs
        if (href is None) or (href in visited_list) or (href in fringe) or \
                (('http://' not in href) and ('https://' not in href)):
            continue
        subs = href.split('/')[2]      # domain part of the URL
        if subs in repr(fringe):       # crude check: this domain is already somewhere in the fringe
            continue
        if 'blah' in href:
            if 'www' not in href:
                # insert "www." right after the scheme before queueing
                href = href.split(":")[0] + '://' + "www." + href.split(":")[1][2:]
            fringe.append(href)

    return fringe
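
As an aside on the subs / repr(fringe) check above: if the intent is to queue at most one URL per domain, comparing parsed domains is a clearer way to express it. A minimal sketch, assuming that intent; queued_domains and domain_already_queued are hypothetical helpers that are not in the original code:

from urllib.parse import urlparse

queued_domains = set()   # hypothetical helper set, not part of the original script

def domain_already_queued(href):
    # True if a URL from this domain has been queued before.
    domain = urlparse(href).netloc
    if domain in queued_domains:
        return True
    queued_domains.add(domain)
    return False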

def test(url):
    try:
        res = requests.get(url)
        plain_text = res.text
        soup = BeautifulSoup(plain_text, "lxml")
        visible_text = text_from_html(plain_text)   # helper defined elsewhere (not shown)
        URL.append(url)
        paragraph.append(visible_text)
    except Exception:
        print("CHECK the URL {}".format(url))

if __name__ == "__main__":
    p = Pool(10)
    p.map(test,fringe)
    p.terminate()
    p.join()
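
For context on why each printed list holds only one item: Pool starts separate worker processes, and each worker appends to its own copy of URL and paragraph, so the parent process never sees those appends. A minimal sketch of one way to collect everything in the parent, assuming the mapped function is changed to return its (url, visible_text) pair instead of appending; scrape_one is a hypothetical variant of test, not the original code:

def scrape_one(url):
    # hypothetical variant of test() that returns its result instead of appending
    try:
        res = requests.get(url)
        return url, text_from_html(res.text)
    except Exception:
        print("CHECK the URL {}".format(url))
        return url, None

if __name__ == "__main__":
    with Pool(10) as p:
        results = p.map(scrape_one, fringe)   # (url, visible_text) pairs, in fringe order
    pairs = [(u, t) for u, t in results if t is not None]
    URL = [u for u, t in pairs]
    paragraph = [t for u, t in pairs]

Pool.map returns the results in order, so the two lists stay aligned.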
