File downloaded with Selenium WebDriver is truncated

Posted 2024-06-24 13:30:40


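I am trying to download a web page with a Python script that uses Selenium WebDriver, but it keeps throwing a ValueError exception, which leaves the downloaded page truncated.

The file seems to be truncated whenever the page contains certain characters (such as commas, hyphens, …).

Code: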

    from selenium import webdriver
    from pyvirtualdisplay import Display
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    def contactbrowser(httppath,iterater):
        display = Display(visible=0, size=(800, 600))
        display.start()
        driver = webdriver.Firefox()#firefox_profile=fp)
        wd=driver.get(httppath)
        driver.maximize_window()
        try:
            element = WebDriverWait(driver, 10).until(
                EC.presence_of_element_located((By.TAG_NAME, "html"))
            )
            ele=driver.find_element_by_tag_name("h1")
            header1=ele.get_attribute("innerHTML")
            fullpath1=header1
            file = open("1/"+fullpath1+".html", "w")
            for ss in driver.page_source:
                file.write(bytearray([ord(ss)]))
            file.close()
            driver.close()

        except ValueError:
            print "Value error", httppath
            driver.close()
        except TypeError:
            driver.close()
        except:
            driver.close()

    list= []
    fileloc = open("file.txt", "r")
    line = fileloc.readline()
    while line:
        list.append(line)
        line = fileloc.readline()
    fileloc.close()
    count=0
    i=0
    while count<list.__len__():
        contactbrowser(list[count],i)
        count=count+1
        i=i+1

For example, downloading this page results in a truncated file.
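
The truncation can be reproduced without a browser. The sketch below is only an illustration, written for Python 3, of what the write loop above does when it meets a character whose code point is above 255. The helper name `write_bytewise` and the sample string are made up for the illustration and are not part of the original script.

```python
def write_bytewise(text):
    """Mimic the write loop: one byte per character, stop at the first failure."""
    out = bytearray()
    try:
        for ch in text:
            # bytearray([n]) only accepts 0 <= n <= 255, otherwise ValueError
            out += bytearray([ord(ch)])
    except ValueError:
        # The script's handler also swallows the error at this point,
        # so everything after the offending character is never written.
        pass
    return bytes(out)


# Hypothetical sample: U+FB01 is the "fi" ligature, U+2013 is an en dash.
sample = u"The \ufb01rst page \u2013 and the rest"
truncated = write_bytewise(sample)
print(truncated)
```

Only the text before the first such character survives, which matches the truncated file described here.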


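Edit: The problem occurs when a character with no corresponding ASCII code is encountered. In the example above, the word "first" is written somewhere in the text with a non-ASCII character (it looks like "first" but is not plain ASCII), which interrupts the download and leaves the file truncated.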
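
One possible way around this, sketched under the assumption that the page should be saved as UTF-8, is to open the file with an explicit encoding and write the page source in a single call instead of byte by byte. `save_page`, the sample HTML and the temporary path are hypothetical stand-ins. In the script above the string would come from `driver.page_source`.

```python
import io
import os
import tempfile


def save_page(page_source, path):
    """Write the whole page as UTF-8 text in one call."""
    with io.open(path, "w", encoding="utf-8") as fh:
        fh.write(page_source)


# Hypothetical stand-in for driver.page_source, containing non-ASCII characters.
html = u"<html><body><h1>The \ufb01rst page \u2013 demo</h1></body></html>"

target = os.path.join(tempfile.mkdtemp(), "page.html")
save_page(html, target)

# Read the file back to check that nothing was cut off.
with io.open(target, encoding="utf-8") as fh:
    restored = fh.read()
```

`io.open` accepts the `encoding` argument on both Python 2 and Python 3, so the same call should also fit the Python 2 script above.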

