<p>BeautifulSoup won't find the table because, from its point of reference, the table does not exist yet. Here, you are telling Selenium to pause <em>the Selenium driver's element matcher</em> if it notices that an element has not appeared yet:</p>
<pre class="lang-py prettyprint-override"><code># This only works for the Selenium element matcher
driver.implicitly_wait(10)
</code></pre>
<p>Then, right after that, you grab the current state of the HTML (the table is still not there) and feed it into BeautifulSoup's parser. BS4 will never see the table, even if it loads later, because it works only on the snapshot of the HTML you just gave it:</p>
<pre class="lang-py prettyprint-override"><code># You now move the CURRENT STATE OF THE HTML PAGE to BeautifulSoup's parser
soup = BeautifulSoup(driver.page_source, 'lxml')
# As this is now in BS4's hands, it will parse it immediately (won't wait 10 seconds)
table = soup.find_all('table')
# BS4 finds no tables as, when the page first loads, there are none.
</code></pre>
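<p>To see that BS4 only ever works on the snapshot it was given, here is a minimal standalone sketch (no browser involved; the HTML string is a made-up example, not the real page):</p>
<pre class="lang-py prettyprint-override"><code>from bs4 import BeautifulSoup

# A snapshot of a page BEFORE JavaScript has injected the table
# (this HTML string is invented purely for illustration)
snapshot = "<html><body><div id='rates'>Loading...</div></body></html>"

soup = BeautifulSoup(snapshot, 'html.parser')
# No matter how long we wait now, the table will never appear:
# BS4 parses the string it was handed, nothing more.
print(soup.find_all('table'))  # []
</code></pre>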
<p>To fix this, you can ask Selenium to try to fetch the HTML table itself. Since Selenium will use the <code>implicitly_wait</code> you specified earlier, it will wait until the table exists before allowing the rest of the code to proceed. By that point, when BS4 receives the HTML, the table will be there:</p>
<pre class="lang-py prettyprint-override"><code>from selenium.webdriver.common.by import By

driver.implicitly_wait(10)
# Selenium will wait until the element is found
# I used XPath, but you can use any other locator strategy to get the table
# (find_element_by_xpath was removed in Selenium 4; use find_element with By.XPATH)
driver.find_element(By.XPATH, "/html/body/div[2]/main/div/section/div[2]/div[1]/div/div/div/div/div/div/div[2]/div[6]/div/div[2]/table/tbody/tr[1]")
soup = BeautifulSoup(driver.page_source, 'lxml')
table = soup.find_all('table')
</code></pre>
<hr/>
<p>However, this is overkill. Yes, you can use Selenium to parse the HTML, but you could also use the <code>requests</code> module (which, from your code, I see you already imported) to get the table data directly.</p>
<p>The data is loaded asynchronously from <a href="https://local.erstebank.hr/rproxy/webdocapi/fx/current" rel="nofollow noreferrer">this</a> endpoint (you can find it yourself with the Chrome DevTools Network tab). You can pair it with the <code>json</code> module to turn the response into a well-formatted dictionary. Not only is this method faster, it is also far less resource-intensive (Selenium has to open a whole browser window):</p>
<pre class="lang-py prettyprint-override"><code>from requests import get
from json import loads
# Get data from URL
data_as_text = get("https://local.erstebank.hr/rproxy/webdocapi/fx/current").text
# Turn to dictionary
data_dictionary = loads(data_as_text)
</code></pre>
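<p>The <code>loads</code> step works on any JSON text, so it can be sketched without the network call. The field names in this sample payload are illustrative assumptions only, not the actual schema of the endpoint:</p>
<pre class="lang-py prettyprint-override"><code>from json import loads

# Illustrative sample only -- the real endpoint's field names may differ
sample_text = '[{"currency": "EUR", "buyRate": 7.50, "sellRate": 7.60}]'

data = loads(sample_text)  # a list of dictionaries
print(data[0]["currency"])  # EUR

# Equivalently, requests can decode the JSON for you:
# data_dictionary = get("https://local.erstebank.hr/rproxy/webdocapi/fx/current").json()
</code></pre>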