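<p>This is a new program that moves on to the next URL if the current one has been loading for more than 30 seconds. I am used to Java, so it may not look like a typical Python program.</p>
<pre><code>from selenium import webdriver
from time import sleep
from selenium.common.exceptions import TimeoutException
import csv

profile = webdriver.FirefoxProfile()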
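profile.add_extension(extension = '/Users/wayne/Desktop/fourthparty-master/extension/fourthparty-jetpack.1.13.2.xpi')
driver = webdriver.Firefox(profile)
# Abort any page load that takes longer than 30 seconds.
driver.set_page_load_timeout(30)

# Read the whole site list into memory; each row's second column is the domain.
with open('top-1m.csv', 'r') as f:
    reader = csv.reader(f)
    fList = list(reader)

def crawl(cutoff):
    for i in range(0, cutoff):
        try:
            getURL(i)
        except TimeoutException:
            # The page did not load within 30 seconds: go on to the next URL.
            pass
        except Exception:
            # Any other load error (e.g. an unreachable host) is skipped too.
            pass

def getURL(num):
    url = 'http://www.' + fList[num][1]
    driver.get(url)
    # Stay on the loaded page for 30 seconds before returning.
    sleep(30)

if __name__ == "__main__":
    crawl(10)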
</code></pre>
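<p>The skip-on-timeout loop can also be tried out without launching a browser. The following is only a minimal sketch under stated assumptions: <code>PageLoadTimeout</code>, <code>build_urls</code>, <code>fake_load</code> and the sample domains are hypothetical names made up for illustration (they are not part of Selenium), and the CSV rows are assumed to have the domain in the second column, as the index <code>[1]</code> in the code above suggests.</p>

```python
import csv
import io


class PageLoadTimeout(Exception):
    """Hypothetical stand-in for selenium's TimeoutException."""


def build_urls(csv_file, cutoff):
    """Return the first `cutoff` URLs from rows whose second column is a domain."""
    urls = []
    for row in csv.reader(csv_file):
        if len(urls) >= cutoff:
            break
        urls.append('http://www.' + row[1])
    return urls


def crawl(urls, load):
    """Call load(url) for every URL; on a timeout, record it and keep going."""
    loaded, skipped = [], []
    for url in urls:
        try:
            load(url)
            loaded.append(url)
        except PageLoadTimeout:
            skipped.append(url)
    return loaded, skipped


def fake_load(url):
    # Assumption for the demo: any URL containing 'slow' never finishes loading.
    if 'slow' in url:
        raise PageLoadTimeout(url)


# In-memory sample data instead of the real top-1m.csv file.
sample = io.StringIO("1,example.com\n2,slow.example\n3,example.org\n")
loaded, skipped = crawl(build_urls(sample, 3), fake_load)
print(loaded)
print(skipped)
```

<p>Passing the loader in as a parameter keeps the loop independent of the browser; in a real run, <code>load</code> could be something like <code>driver.get</code>, with the <code>except</code> clause catching Selenium's <code>TimeoutException</code> instead.</p>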