How can I use multiprocessing/threading to run multiple instances of the requests_html page renderer?

Posted 2024-09-30 22:23:52


I want to get links from web pages that have to be rendered first, so I need requests_html to render each page before I can extract the links from it.

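Now suppose I want 10 links from 10 pages. The script handles them one after another: it renders a page, extracts the link, then moves on to the next page, which is slow.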

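What I want is to extract all the links at once by running several instances of the function concurrently with multiprocessing or threading.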

So I tried the following:

    import multiprocessing
    from requests_html import HTMLSession

    download_links = []

    def getDownloadLinks(url):
        session = HTMLSession()
        page = session.get(url)
        page.html.render(timeout=0)  # execute the page's JavaScript in headless Chromium
        link = page.html.find('#zmovie-view', first=True).find('video', first=True).attrs['src']
        download_links.append(link)

    links = ['https://animehd47.com/jujutsu-kaisen-tv/s2-m1/', 'https://animehd47.com/jujutsu-kaisen-tv/s2-m2/',
             'https://animehd47.com/jujutsu-kaisen-tv/s2-m3/', 'https://animehd47.com/jujutsu-kaisen-tv/s2-m4/',
             'https://animehd47.com/jujutsu-kaisen-tv/s2-m5/', 'https://animehd47.com/jujutsu-kaisen-tv/s2-m6/',
             'https://animehd47.com/jujutsu-kaisen-tv/s2-m7/', 'https://animehd47.com/jujutsu-kaisen-tv/s2-m8/',
             'https://animehd47.com/jujutsu-kaisen-tv/s2-m9/', 'https://animehd47.com/jujutsu-kaisen-tv/s2-m10/']

    threads = []
    for i in range(len(links)):
        process = multiprocessing.Process(target=getDownloadLinks, args=(links[i],))
        process.start()
        threads.append(process)

    for t in threads:
        t.join()

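But it doesn't return anything; instead, it throws several errors. Searching Google, the most I could find is that it has something to do with asyncio and that the loop iteration can never complete successfully.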

What exactly is the problem?
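The asyncio errors mentioned above are easiest to see if the function is run in a threading.Thread instead of a Process. Judging from the requests_html source, HTMLSession calls asyncio.get_event_loop() the first time render() needs the headless browser, and a thread other than the main thread has no event loop unless one is created for it, so that call raises RuntimeError; as far as I know, pyppeteer also installs signal handlers when it launches Chromium, which Python only allows in the main thread. The sketch below reproduces only the event-loop part with the standard library (probe_event_loop and probe_in_worker_thread are hypothetical helper names); separate processes avoid the problem because each process has its own main thread.

```python
import asyncio
import threading


def probe_event_loop(outcome):
    """Hypothetical helper: record whether asyncio.get_event_loop() works in this thread."""
    try:
        asyncio.get_event_loop()
        outcome.append("loop available")
    except RuntimeError as exc:
        # A non-main thread has no default event loop, so this branch runs there.
        outcome.append(f"RuntimeError: {exc}")


def probe_in_worker_thread():
    """Run the probe in a fresh threading.Thread and return what it recorded."""
    outcome = []
    worker = threading.Thread(target=probe_event_loop, args=(outcome,))
    worker.start()
    worker.join()
    return outcome[0]


if __name__ == "__main__":
    print(probe_in_worker_thread())
```

Running it prints a line starting with RuntimeError; the exact message text differs between Python versions.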


Answer (posted 2024-09-30 22:23:52)

From my testing, this code seems to be the best solution to your problem:

    import multiprocessing
    from requests_html import HTMLSession


    def getDownloadLinks(url, returnvar, i):
        """Render one page and store its <video> src in the shared dict under key str(i)."""
        session = HTMLSession()
        try:
            page = session.get(url)
            page.html.render(timeout=0)  # execute the page's JavaScript in headless Chromium
            link = page.html.find('#zmovie-view', first=True).find('video', first=True).attrs['src']
            returnvar[str(i)] = link
            page.close()
        except Exception as e:
            print(url, e)
        finally:
            session.close()  # also closes the Chromium instance started by render()


    links = ['https://animehd47.com/jujutsu-kaisen-tv/s2-m1/', 'https://animehd47.com/jujutsu-kaisen-tv/s2-m2/',
             'https://animehd47.com/jujutsu-kaisen-tv/s2-m3/', 'https://animehd47.com/jujutsu-kaisen-tv/s2-m4/',
             'https://animehd47.com/jujutsu-kaisen-tv/s2-m5/', 'https://animehd47.com/jujutsu-kaisen-tv/s2-m6/',
             'https://animehd47.com/jujutsu-kaisen-tv/s2-m7/', 'https://animehd47.com/jujutsu-kaisen-tv/s2-m8/',
             'https://animehd47.com/jujutsu-kaisen-tv/s2-m9/', 'https://animehd47.com/jujutsu-kaisen-tv/s2-m10/']


    if __name__ == '__main__':
        threads = []
        manager = multiprocessing.Manager()
        returndict = manager.dict()  # proxy dict that the worker processes can write into
        for i in range(len(links)):
            try:
                process = multiprocessing.Process(target=getDownloadLinks, args=(links[i], returndict, i))
                process.start()
                threads.append(process)
            except Exception as e:
                print(e)
        for t in threads:
            t.join()
        print(returndict)
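If you don't need a shared dictionary, multiprocessing.Pool is a possible alternative: Pool.map sends each URL to a worker process and collects the workers' return values into a list in the same order as the input, so the worker can simply return the link. This is a sketch of the pattern only, not tested against the site; fetch_link and collect_links are hypothetical names, and fetch_link is a stand-in that returns a placeholder string where the requests_html code from getDownloadLinks would go.

```python
import multiprocessing


def fetch_link(url):
    """Hypothetical stand-in for the requests_html work in getDownloadLinks.

    In a real script this would create an HTMLSession, render the page and
    return the <video> src (or None on failure). Here it only returns a
    placeholder string so the sketch runs without a browser or network access.
    """
    return f"src-for:{url}"


def collect_links(urls, processes=4):
    """Map fetch_link over urls in a process pool; results keep the input order."""
    with multiprocessing.Pool(processes=processes) as pool:
        return pool.map(fetch_link, urls)


if __name__ == "__main__":
    links = [
        "https://animehd47.com/jujutsu-kaisen-tv/s2-m1/",
        "https://animehd47.com/jujutsu-kaisen-tv/s2-m2/",
    ]
    print(collect_links(links))
```

Capping the pool size (4 here is an arbitrary choice) also limits how many headless Chromium instances are alive at once, whereas starting one Process per URL launches all ten browsers at roughly the same time.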

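This code uses a multiprocessing Manager so that the worker processes can hand their results back to the parent process, and, at least for me, it no longer raises errors. Two things matter compared with the code in the question. First, each Process runs in its own memory space, so appending to the global download_links list inside a worker only changes that worker's copy, and the parent's list stays empty. Second, the process-starting code sits under if __name__ == '__main__':, which is required when processes are started with the spawn method (the default on Windows, and on macOS since Python 3.8), because each child re-imports the main module. Hope this helps; if you have any questions, please leave a comment below.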
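The difference between the global list in the question and the Manager can be checked without requests_html. In the sketch below (all names are hypothetical, and the workers append placeholder items instead of scraped links), appending to a plain module-level list inside child processes leaves the parent's list empty, while appends to a Manager list proxy are forwarded to the manager's server process and are visible to the parent afterwards. The answer uses manager.dict(); manager.list() behaves the same way.

```python
import multiprocessing

# A plain module-level list: every process works on its own copy.
plain_results = []


def append_plain(item):
    """Runs in a child process; modifies only that child's copy of plain_results."""
    plain_results.append(item)


def append_shared(shared, item):
    """Runs in a child process; 'shared' is a Manager proxy, so the parent sees the update."""
    shared.append(item)


def run_demo(items):
    """Start one Process per item with each approach and return what the parent sees."""
    plain_procs = [multiprocessing.Process(target=append_plain, args=(x,)) for x in items]
    for p in plain_procs:
        p.start()
    for p in plain_procs:
        p.join()

    with multiprocessing.Manager() as manager:
        shared = manager.list()
        shared_procs = [multiprocessing.Process(target=append_shared, args=(shared, x)) for x in items]
        for p in shared_procs:
            p.start()
        for p in shared_procs:
            p.join()
        # Copy the data out before the manager process shuts down; sort because
        # the children may finish in any order.
        shared_copy = sorted(list(shared))

    return list(plain_results), shared_copy


if __name__ == "__main__":
    parent_plain, parent_shared = run_demo([1, 2, 3])
    print("plain list in parent:", parent_plain)
    print("Manager list in parent:", parent_shared)
```

run_demo([1, 2, 3]) should return ([], [1, 2, 3]): the plain list is still empty in the parent, while the Manager list holds every item.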
