Extracting a link works with chromedriver but fails with PhantomJS

Posted 2024-09-30 16:23:23

I am trying to extract a link using the following code:

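import re
import sys
import time
import traceback

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.common.exceptions import NoSuchWindowException

# logger, chromedriver (path to the driver binary) and FILE_HOSTERS
# are assumed to be defined elsewhere in the script.

def Soup(htmsrc):
    # Parse the page source with the built-in HTML parser.
    return BeautifulSoup(htmsrc, 'html.parser')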

def get_html_sel(url, t=15):
    logger.info('Searching: {0}'.format(url))
    try:
        driver = webdriver.Chrome(chromedriver)
        driver.get(url)

        time.sleep(t)

        htmsrc = driver.page_source
        driver.quit()
        return (htmsrc)
    except NoSuchWindowException:
        sys.exit('The window closed unexpectedly.')

def get_filehostlink(url):
    for file_hoster_key, file_hoster_value in FILE_HOSTERS.iteritems():
        try:
            link = '{0}{1}'.format(url,file_hoster_value)
            soup = Soup(get_html_sel(link,t=15))
            return soup.find('iframe',src=re.compile(file_hoster_key))['src']
        except Exception:
            # Log the failure and try the next file hoster.
            traceback.print_exc()
            continue

get_filehostlink('http://kissasian.com/Drama/Your-Lie-in-April/Movie?id=33186')

This works perfectly with Selenium and chromedriver.

However, I find chromedriver cumbersome, so I decided to switch to PhantomJS as follows:

def get_html_sel(url, t=15):
    logger.info('Searching: {0}'.format(url))
    try:
        driver = webdriver.PhantomJS()
        driver.get(url)
        time.sleep(t)
        htmsrc = driver.page_source
        driver.quit()
        return (htmsrc)
    except:
        traceback.print_exc()

However, it fails when I use PhantomJS. The following error occurs in the get_filehostlink function:

Traceback (most recent call last):
in get_filehostlink
    return soup.find('iframe',src=re.compile(file_hoster_key))['src']
TypeError: 'NoneType' object has no attribute '__getitem__'

What am I doing wrong?
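
A note on reading the traceback: BeautifulSoup's find() returns None when nothing matches, so the TypeError means that the HTML returned by PhantomJS contained no iframe whose src matched file_hoster_key. The lookup itself ran without error. Below is a minimal sketch that makes this case explicit instead of raising. It assumes a hypothetical helper named find_iframe_src, which is not part of the original code.

```python
import re


def find_iframe_src(soup, host_pattern):
    """Return the src of the first <iframe> whose src matches host_pattern.

    Returns None when no such iframe exists. `soup` is expected to be a
    BeautifulSoup object. find_iframe_src is a hypothetical helper, not
    part of the original script.
    """
    tag = soup.find('iframe', src=re.compile(host_pattern))
    if tag is None:
        # The page source held no matching iframe, so there is nothing to index.
        return None
    return tag.get('src')
```

With a helper like this, get_filehostlink could move on to the next hoster when None comes back. Logging driver.page_source at that point would probably show what PhantomJS actually received.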

