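I am fairly new to BeautifulSoup and I am trying to extract data from this page: http://reports.workforce.test.ohio.gov/program-county-wia-reports.aspx?name=GTL8gAmmdulY5GSlycy7WQ==&dataType=hIp9ibmBIwbKor1WvT5Bkg==&dataTypeText=hIp9ibmBIwbKor1WvT5Bkg==#
I want to scrape the numbers shown under headings such as "Program Completers" and "Employment Second Quarter". The relevant part of the HTML is: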
<ul class="listbox">
<li class="li1">
<p style="cursor:help" class="listtop" title="WIA Adult
completers are those individuals who have exited a WIA Adult program from
which the individual received a core staff-assisted service (such as job
search or placement assistance) or an intensive service (such as
counseling, career planning, or job training). Those individuals who
participated in WIA through self-service, like OhioMeansJobs.com, or other
less intensive programs are not included in the dashboard.">Program
Completers</p>
<p id="programcompleters1">18</p></li>
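I want the strings "Program Completers" and "18". I have tried the solutions suggested here, here, and here, without much luck.
One version of my code does return text, but it also picks up other parts of the page that are marked up as `ul`, and I have not managed to scrape any text from inside the chart area. How can I retrieve the text I want?
Thanks for your help!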
As already mentioned, the data you are looking for lives inside an iframe. Following @choosed_codex's explanation, request the iframe's own URL:
http://reports.workforce.test.ohio.gov/WIAReports/WIA_COUNTY.ASPX?level=county&DataType=hIp9ibmBIwbKor1WvT5Bkg==&name=GTL8gAmmdulY5GSlycy7WQ==&programDate=Kf/2jvCFFRgQJjODWV7l08ATxxM/adw9p1FWfZ9J7O8=
You can then access the fields of interest as follows:
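The answer's original snippet is not available, so the following is only a minimal sketch of one way to do it with BeautifulSoup. It assumes the iframe page uses the same markup as the excerpt in the question, with each label in a `p.listtop` element directly followed by a `p` holding the value. `SAMPLE_HTML` (with an abbreviated `title` attribute) and the helper name `extract_metrics` are made up for illustration.

```python
from bs4 import BeautifulSoup

# Abbreviated copy of the markup quoted in the question (title text shortened).
SAMPLE_HTML = """
<ul class="listbox">
<li class="li1">
<p style="cursor:help" class="listtop" title="WIA Adult completers are ...">Program
Completers</p>
<p id="programcompleters1">18</p></li>
</ul>
"""


def extract_metrics(html):
    """Map each label inside ul.listbox to the text of the <p> that follows it."""
    soup = BeautifulSoup(html, "html.parser")
    metrics = {}
    for label in soup.select("ul.listbox p.listtop"):
        # The value is assumed to be the next <p> sibling of the label.
        value = label.find_next_sibling("p")
        if value is None:
            continue
        # Collapse the line break inside "Program\nCompleters".
        name = " ".join(label.get_text().split())
        metrics[name] = value.get_text(strip=True)
    return metrics


metrics = extract_metrics(SAMPLE_HTML)
print(metrics)  # {'Program Completers': '18'}
```

Restricting the selector to `ul.listbox` is what keeps the other `ul` elements on the page out of the result. If the real page structures its values differently, the sibling lookup would need adjusting.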
The elements you want are inside an iframe. Try extracting them from the iframe's own page at http://reports.workforce.test.ohio.gov/WIAReports/WIA_COUNTY.ASPX?level=county&DataType=hIp9ibmBIwbKor1WvT5Bkg==&name=GTL8gAmmdulY5GSlycy7WQ==&programDate=Kf/2jvCFFRgQJjODWV7l08ATxxM/adw9p1FWfZ9J7O8= instead.
So, this should work:
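The code that originally followed is missing, so here is a hedged sketch of the general approach rather than the answerer's exact solution: locate the iframe's `src` in the outer page, fetch that URL separately, and read the element by its `id`. The helper names (`find_iframe_urls`, `fetch_html`, `read_value`) and the `OUTER_SAMPLE` markup are hypothetical; the real outer page may embed the iframe differently, or build it with JavaScript, in which case the `src` would not appear in the static HTML.

```python
from urllib.parse import urljoin
from urllib.request import urlopen

from bs4 import BeautifulSoup


def find_iframe_urls(outer_html, page_url):
    """Return absolute URLs for every <iframe src=...> in the outer page."""
    soup = BeautifulSoup(outer_html, "html.parser")
    return [urljoin(page_url, frame["src"])
            for frame in soup.find_all("iframe", src=True)]


def fetch_html(url, timeout=30):
    """Download a page and decode it (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def read_value(html, element_id):
    """Return the stripped text of the element with the given id, or None."""
    node = BeautifulSoup(html, "html.parser").find(id=element_id)
    return node.get_text(strip=True) if node else None


# Hypothetical outer-page fragment, used only to show the URL resolution.
OUTER_SAMPLE = '<div><iframe src="WIAReports/WIA_COUNTY.ASPX?level=county"></iframe></div>'
PAGE_URL = "http://reports.workforce.test.ohio.gov/program-county-wia-reports.aspx"

iframe_urls = find_iframe_urls(OUTER_SAMPLE, PAGE_URL)
print(iframe_urls)
```

Against the live site, the flow would be `read_value(fetch_html(iframe_urls[0]), "programcompleters1")`, where `programcompleters1` is the `id` shown in the question's excerpt.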