How to find web elements with Selenium in Python while looping over URLs



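I need to loop over a million web pages and scrape one element from each (the class name is the same on every page). I have set up my code in the following (simplified) way:
<pre><code>from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Firefox()
wait = WebDriverWait(driver, 10)
detail_dict = {}

for i in range(1000000):
    url = f'http://www.cnappc.it/risultato.aspx?IDAssociato={i}&tipo=1#edit'
    driver.get(url)
    # Wait for the element that holds the details, then store its text
    elem_detail = wait.until(expected_conditions
                             .presence_of_element_located((By.CLASS_NAME, 'content')))
    detail_dict[i] = elem_detail.text
</code></pre>
The code runs fairly smoothly, and when I interrupt the kernel to check, I can see that <code>i</code> and <code>url</code> are incrementing on every iteration. However, the browser page controlled by the driver gets "stuck" on the first URL, i.e. <a href="http://www.cnappc.it/risultato.aspx?IDAssociato=0&tipo=1#edit" rel="nofollow noreferrer">http://www.cnappc.it/risultato.aspx?IDAssociato=0&tipo=1#edit</a>, so <code>elem_detail.text</code> returns the same string over and over. It seems that the browser page cannot keep up with the <code>driver.get(url)</code> calls, even though <code>.get()</code> is supposed to wait until the page has fully loaded.

From <a href="https://selenium-python.readthedocs.io/getting-started.html" rel="nofollow noreferrer">Selenium-Python/Getting Started</a>:
<blockquote> The driver.get method will navigate to a page given by the URL. WebDriver will wait until the page has fully loaded (that is, the “onload” event has fired) before returning control to your test or script. </blockquote>
I added an expected condition for <code>elem_detail</code>, but to no avail. Putting <code>time.sleep(2)</code> after <code>driver.get(url)</code> lets the browser page change and show different content, but it would slow the run down severely. Even then, the page gets stuck from time to time, and the dictionary ends up with values duplicated in no systematic pattern.

Can you recommend a robust approach that does not involve <code>time.sleep()</code>?
<hr/>
FYI: I am using Selenium with geckodriver.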

