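I'm trying to download a web page with a Python script that uses Selenium WebDriver, but it keeps raising a ValueError exception, which leaves the downloaded page truncated.
The file seems to get cut off when the page contains certain characters (such as commas, hyphens, …).
Code: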
from pip.cmdoptions import global_options
from selenium import webdriver
from pyvirtualdisplay import Display
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

def contactbrowser(httppath,iterater):
    display = Display(visible=0, size=(800, 600))
    display.start()
    driver = webdriver.Firefox()#firefox_profile=fp)
    wd=driver.get(httppath)
    driver.maximize_window()
    try:
        element = WebDriverWait(driver, 10).until(
            EC.presence_of_element_located((By.TAG_NAME, "html"))
        )
        ele=driver.find_element_by_tag_name("h1")
        header1=ele.get_attribute("innerHTML")
        fullpath1=header1
        file = open("1/"+fullpath1+".html", "w")
        for ss in driver.page_source:
            file.write(bytearray([ord(ss)]))
        file.close()
        driver.close()
    except ValueError:
        print "Value error", httppath
        driver.close()
    except TypeError:
        driver.close()
    except:
        driver.close()

list= []
fileloc = open("file.txt", "r")
line = fileloc.readline()
while line:
    list.append(line)
    line = fileloc.readline()
fileloc.close()
count=0
i=0
while count<list.__len__():
    contactbrowser(list[count],i)
    count=count+1
    i=i+1
For example, downloading this page produces a truncated file.
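Edit: the problem occurs when the script hits a character that has no corresponding ASCII code. In the example above, the word "first" appears somewhere in the text written with a non-ASCII character, so it only looks like "first"; that interrupts the download and leaves the file truncated.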
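To make the failure concrete, here is a minimal sketch of what seems to be happening and one possible way around it. It does not use Selenium: `sample` stands in for `driver.page_source`, the "fi" ligature (U+FB01) is only an assumed example of a character outside the byte range, and `save_page_source` is a hypothetical helper name, not part of any library. `bytearray([ord(ch)])` accepts only values from 0 to 255, so any character with a higher code point raises ValueError partway through the loop, and whatever was written before that point is all that ends up in the file.

```python
import io
import os
import tempfile


def save_page_source(page_source, path):
    """Hypothetical helper: write the whole page to `path` as UTF-8 in one call."""
    with io.open(path, "w", encoding="utf-8") as out:
        out.write(page_source)


# Stand-in for driver.page_source. "\ufb01" is the single-character "fi"
# ligature, used here only as an assumed example of a non-ASCII character.
sample = u"<html><h1>The \ufb01rst page</h1></html>"

# The per-character approach from the question fails on any code point
# above 255, because bytearray only accepts values in range(0, 256).
try:
    bytearray([ord(u"\ufb01")])
    per_char_failed = False
except ValueError:
    per_char_failed = True

# Writing the text through a UTF-8 encoder avoids the per-byte conversion.
target = os.path.join(tempfile.mkdtemp(), "page.html")
save_page_source(sample, target)

# Read the file back to check that nothing was lost.
with io.open(target, encoding="utf-8") as saved:
    round_trip = saved.read()
```

In the script above, the same idea would presumably mean replacing the `for ss in driver.page_source` loop with a single write to a file opened with an explicit encoding. The `with` block also closes the file even if the write fails.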