无法使用python检索Javascript生成的数据

2024-10-04 01:30:48 发布

男 | 程序猿一只，喜欢编程写python代码。

我一直在尝试从这个URL:http://www.thesait.org.za/search/newsearch.asp?bst=&cdlGroupID=&txt_country=South+Africa&txt_statelist=&txt_state=&ERR_LS_20161018_041816_21233=txt_statelist%7CLocation%7C20%7C0%7C%7C0抓取数据，这一天的大部分时间我都知道我的效率非常低。我最近刚刚学会了如何浏览普通的html网站，似乎已经掌握了窍门。javascript驱动的程序已经被证明是痛苦的。你知道吗

到目前为止，我所研究的刮板——经过多个角度的研究，得出了相同的结果。下面是我使用的代码：

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait

PHANTOMJS_PATH = './phantomjs.exe'

#Using PhantomJS to navigate the url
browser = webdriver.PhantomJS(PHANTOMJS_PATH)
browser.get('http://www.thesait.org.za/search/newsearch.asp?bst=&cdlGroupID=&txt_country=South+Africa&txt_statelist=&txt_state=&ERR_LS_20161018_041816_21233=txt_statelist%7CLocation%7C20%7C0%7C%7C0')

wait = WebDriverWait(browser, 15)
# let's parse our html
soup = BeautifulSoup(browser.page_source, "html5lib")

# get all the games
test = soup.find_all('tr')

print test

我最大的问题是我找不到细节。在下图中： Highlighted field

我无法获取与该特定名称相关的URL。在得到URL之后，我想进一步导航到用户以获得更多的细节。你知道吗

所以我的问题如下：

有没有更有效的方法以编程方式返回您正在寻找的数据（给定有限的时间）。你知道吗
有没有更好的方法来查看在抓取时如何浏览javascript生成的站点？你知道吗
请让我知道如果我需要更清楚。你知道吗

谢谢！你知道吗

第二部分：

我采取了另一种方法，遇到了另一个问题。你知道吗

我尝试使用以下方法获取上面的标签：

from selenium import webdriver  
from selenium.common.exceptions import NoSuchElementException  
from selenium.webdriver.common.keys import Keys  
from bs4 import BeautifulSoup

browser = webdriver.Chrome()  
browser.get('http://www.thesait.org.za/search/newsearch.asp?bst=&cdlGroupID=&txt_country=South+Africa')  
html_source = browser.page_source  
browser.quit()

soup = BeautifulSoup(html_source,'html.parser')  
comments = soup.findAll('a')  
print comments

在我打印的“评论”列表中，我要查找的特定元素没有出现。i、 e

<a href="/members/?id=35097829" id="MiniProfileLink_35097829" onmouseover="MiniProfileLink_OnMouseOver(35097829);" onmouseout="HideMiniProfile();" target="_top">Namir Abraham</a>

然后我尝试使用selenium功能：

从selenium导入webdriver
从selenium.common.exceptions异常导入NoSuchElementException
从selenium.webdriver.common文件.keys导入密钥
从bs4导入

browser = webdriver.Chrome('C:/Users/rschilder/Desktop/Finance24 Scrape/Accountant_scraper/chromedriver.exe')
browser.get('http://www.thesait.org.za/search/newsearch.asp?bst=&cdlGroupID=&txt_country=South+Africa')  
browser.implicitly_wait(30)
html = browser.execute_script("return document.getElementsByTagName('html')[0].innerHTML")
#browser.quit()

print html

我在这方面面临的挑战是：

我不太确定如何使用selenium get功能来搜索和获取特定元素（它不像漂亮的汤那样直观）
即使使用selenium导航，我要查找的元素（上面提到的）仍然没有出现在输出中？你知道吗

Tags： from org import browser txt http search get

0条回答

目前没有回答

无法使用python检索Javascript生成的数据

相关问题更多 >

编程相关推荐

热门问题

热门文章

无法使用python检索Javascript生成的数据

相关问题 更多 >

编程相关推荐

热门问题

热门文章

相关问题更多 >