I am trying to scrape a dynamic website with BeautifulSoup (bs4) in Python:
https://www.nadlan.gov.il/?search=%D7%AA%D7%9C%20%D7%90%D7%91%D7%99%D7%91%20%20%D7%99%D7%A4%D7%95
I tried:
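from urllib.request import urlopen
from bs4 import BeautifulSoup
# Non-ASCII characters in the query string must be percent-encoded for urlopen
url = "https://www.nadlan.gov.il/?search=%D7%AA%D7%9C%20%D7%90%D7%91%D7%99%D7%91%20%20%D7%99%D7%A4%D7%95"
page = urlopen(url)
# Parse the downloaded HTML, not the URL string itself
soup = BeautifulSoup(page, "html.parser")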
I have two problems:
The site is dynamic: when I view the page source I cannot see the page content, only JavaScript:
<script>
document.write("<script src='scripts/dis/bundleJS.js?v=" + globalAppVersion + "'><\/script>")
document.write("<script id='srcGovmap' src='https://new.govmap.gov.il/govmap/api/govmap.api.js?v='" + globalAppVersion + "'><\/script>")
document.write("<script src='MainLoader.js?v=" + globalAppVersion + "'><\/script>")
document.write("<script id='tld-search-srcipt'
src='https://www.nadlan.gov.il/TldSearch/Scripts/ac.js?v=" + globalAppVersion + "'><\/script>");
</script>
<script src="scripts/dis/accessibility/b1.js?v=3" type="text/javascript"></script>
<script type="text/javascript">
accessibility_rtl = true;
pixel_from_side = 20;
pixel_from_start = 15;
$(document).ready(function () {
$('#accessibility_icon').attr('src', 'images/accessibility_icon.png')
$('.accessibility_div_wrap>.btn_accessibility > span.accessibility_component').html('')
});
When I open the site, the data takes a few seconds to load.
How can I solve these problems with Selenium?
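Since the question asks about Selenium, here is a minimal sketch of that route: open the page in a real browser, wait explicitly until the data has been rendered, then hand the resulting HTML to BeautifulSoup. `RESULT_SELECTOR` is an assumption, not something taken from the site; it has to be replaced with a selector for an element that only exists after the data has loaded. `build_search_url` and `fetch_rendered_html` are illustrative names.

```python
from urllib.parse import quote

BASE_URL = "https://www.nadlan.gov.il/"
# Assumption: a CSS selector for an element that only appears once the
# data has loaded. Inspect the rendered page to find a real one.
RESULT_SELECTOR = "table"


def build_search_url(query):
    """Percent-encode the search text so non-ASCII input is URL-safe."""
    return BASE_URL + "?search=" + quote(query)


def fetch_rendered_html(query, timeout=30):
    """Load the page in a browser and return the HTML after rendering."""
    # Third-party imports are kept local: pip install selenium
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(build_search_url(query))
        # Wait for the element to exist instead of sleeping a fixed time.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, RESULT_SELECTOR))
        )
        return driver.page_source
    finally:
        # Always close the browser, even if the wait times out.
        driver.quit()
```

The returned string can then be parsed with `BeautifulSoup(html, "html.parser")`. The explicit wait addresses the second problem (data arriving a few seconds after the page opens) without guessing a sleep duration.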
The data is loaded dynamically via JavaScript. You can use the requests and json modules to simulate the Ajax call.
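A rough sketch of that approach follows. The endpoint and the `query` parameter name are placeholders, not the site's real API: the actual request URL and parameters have to be read from the browser's Network tab (XHR/Fetch filter) while the page loads. `build_request`, `parse_response` and `fetch` are illustrative names.

```python
import json

# Hypothetical endpoint: replace it with the XHR URL shown in the
# browser's Network tab. It is NOT taken from the site.
API_URL = "https://example.com/api/search"


def build_request(query):
    """Return the URL and parameters for the (assumed) Ajax call."""
    # Assumption: the endpoint accepts the search text as "query".
    return API_URL, {"query": query}


def parse_response(text):
    """Decode the JSON body returned by the Ajax endpoint."""
    return json.loads(text)


def fetch(query, timeout=10):
    """Call the endpoint directly, as the page's JavaScript would."""
    import requests  # third-party: pip install requests

    url, params = build_request(query)
    # requests percent-encodes the parameters, including non-ASCII text.
    response = requests.get(url, params=params, timeout=timeout)
    response.raise_for_status()
    return parse_response(response.text)


# Example use, once API_URL points at the real endpoint:
# data = fetch("תל אביב יפו")
# print(json.dumps(data, indent=2, ensure_ascii=False))
```

Calling the endpoint directly avoids starting a browser and returns structured data, so no HTML parsing is needed; the trade-off is that the request has to be rediscovered if the site changes its API.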