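<p>As Eric pointed out, this table is populated by JavaScript.</p>
<p>However, with Chrome's developer tools you can easily intercept the API calls the page makes behind the scenes.</p>
<p>Go to the Network tab and filter by XHR, and you will find the endpoint the page is calling, namely:</p>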
<p><a href="http://gsa.nic.in/gsaservice/services/service.svc/gsastatereport?schemecode=PMJDY" rel="nofollow noreferrer">http://gsa.nic.in/gsaservice/services/service.svc/gsastatereport?schemecode=PMJDY</a></p>
<p><a href="https://i.stack.imgur.com/mYnzY.png" rel="nofollow noreferrer"><img src="https://i.stack.imgur.com/mYnzY.png" alt="enter image description here"/></a>
A simple script like this will then give you nicely formatted data:</p>
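<pre><code>import json
import pandas as pd
import requests

url = 'http://gsa.nic.in/gsaservice/services/service.svc/gsastatereport?schemecode=PMJDY'
r = requests.get(url, timeout=30)
r.raise_for_status()  # stop on an HTTP error

# 'd' holds a JSON-encoded string, so decode it a second time
data = json.loads(r.json()['d'])
df = pd.DataFrame(data[0]['data'])
print(df)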
   LGDStateCode          StateName  totalSaturatedVillage  villageSaturatedTillDate  TotalBeneficiaries  TotalBeneficiariesRegisteredTillDate  Saturation
0            28     ANDHRA PRADESH                    305                       305               27238                                 27238      100.00
1            12  ARUNACHAL PRADESH                    299                       283               42331                                 39999       94.49
2            18              ASSAM                   3042                      2375              648815                                621878       95.85
3            10              BIHAR                    635                       544               92356                                 90131        97.5
</code></pre>
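<p>The script decodes the response twice because the <code>d</code> field appears to carry a JSON-encoded string rather than a nested object. The sketch below illustrates that two-step decode using only the standard library. The helper name <code>extract_rows</code> and the sample payload are hypothetical: the payload only mimics the shape implied by the script above and is not real API output.</p>

```python
import json


def extract_rows(outer):
    """Return the rows of the first report from an already-parsed response body."""
    # Second decode: outer['d'] is JSON text, not a list
    inner = json.loads(outer["d"])
    return inner[0]["data"]


# Hypothetical sample with the assumed shape (values are illustrative only)
sample_rows = [
    {"LGDStateCode": "28", "StateName": "ANDHRA PRADESH", "Saturation": "100.00"},
    {"LGDStateCode": "10", "StateName": "BIHAR", "Saturation": "97.5"},
]
sample_outer = {"d": json.dumps([{"data": sample_rows}])}

rows = extract_rows(sample_outer)
print(rows[0]["StateName"])
```

<p>With the live response, <code>extract_rows(r.json())</code> should return the same list that the script passes to <code>pd.DataFrame</code>, assuming the endpoint still responds in this shape.</p>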