I have a page with a table (table id="ctl00_ContentPlaceHolder_ctl00_ctl00_GV" class="GridListings") that I need to scrape. I usually use BeautifulSoup and urllib for this, but in this case the table takes some time to load, so it isn't captured when I try to fetch it with BS. I cannot use PyQt4, dryscrape, or windmill because of some installation issues, so the only possible way is to use Selenium/PhantomJS. I tried the following, still with no success:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.PhantomJS()
driver.get(url)  # url: address of the page that contains the table
wait = WebDriverWait(driver, 10)
# presence_of_element_located expects a single (By, selector) tuple
table = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, 'table#ctl00_ContentPlaceHolder_ctl00_ctl00_GV')))
The code above does not give me the table contents I need. How can I achieve this?
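If you want to scrape something, it is best to first install a web debugger (for example Firebug for Mozilla Firefox) and observe how the website you want to scrape works.
Next, you need to reproduce the way the page talks to its backend.
As you said, the content you want to scrape is loaded asynchronously (only once the document is ready).
Assuming the debugger is running and you have refreshed the page, you will see the following request on the "Network" tab:
POST https://seahawks.strmarketplace.com/Charter-Seat-Licenses/Charter-Seat-Licenses.aspx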
The final process for achieving your goal is:
See the following working code:
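A minimal sketch of that flow, using only the standard library (`urllib` and `html.parser`). It assumes the POST needs nothing beyond the session cookie and the hidden `<input>` fields served by the initial GET; if the request in your Network tab carries additional form values, pass them in through `extra_fields`. The names `HiddenInputCollector`, `collect_hidden_inputs`, and `fetch_and_save` are made up for illustration.

```python
import http.cookiejar
import urllib.parse
import urllib.request
from html.parser import HTMLParser

URL = "https://seahawks.strmarketplace.com/Charter-Seat-Licenses/Charter-Seat-Licenses.aspx"


class HiddenInputCollector(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            # A missing value attribute is sent as an empty string.
            self.fields[attr["name"]] = attr.get("value") or ""


def collect_hidden_inputs(html):
    """Return {name: value} for every hidden input found in the HTML."""
    parser = HiddenInputCollector()
    parser.feed(html)
    return parser.fields


def fetch_and_save(url=URL, out_path="post_result.html", extra_fields=None):
    """GET the page, POST its hidden fields back, and save the response."""
    # The cookie jar keeps the session cookie between the two requests.
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    opener.addheaders = [("User-Agent", "Mozilla/5.0")]

    # Step 1: load the page the way a browser would.
    with opener.open(url, timeout=30) as resp:
        html = resp.read().decode("utf-8", errors="replace")

    # Step 2: send the form state back (assumption: plus any extra values
    # copied from the debugger's Network tab).
    form = collect_hidden_inputs(html)
    form.update(extra_fields or {})
    body = urllib.parse.urlencode(form).encode("utf-8")
    with opener.open(url, data=body, timeout=30) as resp:
        result = resp.read().decode("utf-8", errors="replace")

    # Step 3: store the response so it can be inspected or parsed later.
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(result)
    return out_path
```

Calling `fetch_and_save()` performs the two requests and writes the response to `post_result.html`.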
Now check the contents of "post_result.html" and you will find the data.
Regards
You can use requests and bs4 to get the data. Almost all ASP sites require some POST parameters, such as __EVENTTARGET, __EVENTVALIDATION, etc.:
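A minimal sketch of that first step, assuming `requests` and `beautifulsoup4` are installed. It simply gathers every hidden `<input>` on the page, which is where ASP.NET keeps fields such as `__EVENTVALIDATION`. The helper names `get_hidden_fields`, `load_form_state`, and `main` are illustrative, not taken from the answer.

```python
import requests
from bs4 import BeautifulSoup

URL = "https://seahawks.strmarketplace.com/Charter-Seat-Licenses/Charter-Seat-Licenses.aspx"


def get_hidden_fields(html):
    """Return {name: value} for every hidden <input> in the page."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        tag["name"]: tag.get("value", "")
        for tag in soup.find_all("input", attrs={"type": "hidden"})
        if tag.get("name")
    }


def load_form_state(session, url=URL):
    """GET the page and collect the form state it embeds."""
    response = session.get(url, timeout=30)
    response.raise_for_status()
    return get_hidden_fields(response.text)


def main():
    # A Session keeps cookies, so a later POST belongs to the same visit.
    with requests.Session() as session:
        fields = load_form_state(session)
        print(sorted(fields))  # names of the fields the server expects back
```

Calling `main()` performs the GET and lists the field names found on the page.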
For the actual POST, we need to add a few more values to our post data:
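A sketch of how the extra values might be merged in before posting, and how the table could then be printed. The extra names and values shown are placeholders, not the site's real ones: copy the actual form data from the POST request in your debugger's Network tab. `build_post_data`, `table_rows`, and `post_and_print` are hypothetical helper names; the table id is the one from the question.

```python
from bs4 import BeautifulSoup

TABLE_ID = "ctl00_ContentPlaceHolder_ctl00_ctl00_GV"

# Placeholder extras (assumption): replace with the values your debugger shows.
EXAMPLE_EXTRAS = {"__EVENTTARGET": "", "__EVENTARGUMENT": ""}


def build_post_data(hidden_fields, extra_fields):
    """Merge the page's hidden fields with the extra values the POST needs."""
    data = dict(hidden_fields)  # copy so the caller's dict is left untouched
    data.update(extra_fields)   # extras win if a name appears in both
    return data


def table_rows(html, table_id=TABLE_ID):
    """Return the table as a list of rows, each a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id=table_id)
    if table is None:
        return []
    return [
        [cell.get_text(strip=True) for cell in row.find_all(["th", "td"])]
        for row in table.find_all("tr")
    ]


def post_and_print(session, url, hidden_fields, extra_fields):
    """POST the merged form data and print each row of the listings table."""
    data = build_post_data(hidden_fields, extra_fields)
    response = session.post(url, data=data, timeout=30)
    response.raise_for_status()
    for row in table_rows(response.text):
        print(row)
```

Here `session` would be a `requests.Session()` (the same one used for the initial GET, so the cookies match) and `hidden_fields` the dictionary collected from that GET.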
When you run the code, you will see the table printed.