Posted 2024-10-01 07:30:58
User
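I am trying to download all the entries in the table shown on this page: https://udhonline.rajasthan.gov.in/Portal/AuctionList. There are buttons that load the next entries in the table, but the URL of the page stays the same. I want to download all the data in Python, and I tried the following: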
pd.read_html(link)
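This returns a list containing the first 30 results of the table, plus another item that contains all 30 results. By default the page only shows the first 30 results. How can I get the data from all the following pages?
You can use this example to load the data from multiple pages into a DataFrame: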
from io import StringIO

import pandas as pd
import requests
from bs4 import BeautifulSoup

api_url = "https://udhonline.rajasthan.gov.in/Portal/SearchAuctionGrid"

params = {
    "page": "1",
    "Paging": "True",
    "pageSize": "30",
    "TabViewType": "0",
    "UnitId": "0",
}

dfs = []
for page in range(1, 4):  # <-- increase number of pages here
    params["page"] = page
    soup = BeautifulSoup(
        requests.post(api_url, params=params).content, "html.parser"
    )

    # keep only the innermost tables (tables that contain no nested table)
    for t in soup.select("table:not(:has(table))"):
        # wrap in StringIO: recent pandas versions deprecate passing literal HTML
        dfs.append(pd.read_html(StringIO(str(t)))[0].T)

df = pd.concat(dfs).reset_index(drop=True)
print(df)
df.to_csv("data.csv", index=False)
Prints:
  0 1 2 3 4 5 6
0 AROGYA NAGAR RESIDENTIAL PLOT NO. 220 [9263] Scheme Name: Arogya Nagar Property Number: 220 EMD Deposit Start Date: 01-Jun-2021 08:00 AM EMD Deposit End Date: 06-Jun-2021 11:59 PM EMD Deposit Ends In: Assessed Property Value as per Bid Start Price...
1 AROGYA NAGAR RESIDENTIAL PLOT NO. 220 [9263] Scheme Name: Arogya Nagar Property Number: 220 EMD Deposit Start Date: 01-Jun-2021 08:00 AM EMD Deposit End Date: 06-Jun-2021 11:59 PM EMD Deposit Ends In: Assessed Property Value as per Bid Start Price...
2 AROGYA NAGAR RESIDENTIAL PLOT NO. 220 [9263] Scheme Name: Arogya Nagar Property Area: 2118.32 Square Feet Bid Start Date: 03-Jun-2021 10:00 AM Bid End Date: 07-Jun-2021 11:00 AM Bid Ends In: Assessed Property Value as per Bid Start Price...
3 AROGYA NAGAR RESIDENTIAL PLOT NO. 220 [9263] Scheme Name: Arogya Nagar Property Area: 2118.32 Square Feet Bid Start Date: 03-Jun-2021 10:00 AM Bid End Date: 07-Jun-2021 11:00 AM Bid Ends In: Assessed Property Value as per Bid Start Price...
4 AROGYA NAGAR RESIDENTIAL PLOT NO. 220 [9263] Scheme Name: Arogya Nagar Usage Type: Residential EMD Amount (Rs.): 211900.00 View Details Participate NaN Assessed Property Value as per Bid Start Price...
5 AROGYA NAGAR RESIDENTIAL PLOT NO. 220 [9263] Scheme Name: Arogya Nagar Usage Type: Residential EMD Amount (Rs.): 211900.00 View Details Participate NaN Assessed Property Value as per Bid Start Price...
6 CHANAKYAPURI RESIDENTIAL PLOT NO. 14 [9262] Scheme Name: Chanakyapuri Property Number: 14 EMD Deposit Start Date: 01-Jun-2021 08:00 AM EMD Deposit End Date: 06-Jun-2021 11:59 PM EMD Deposit Ends In: Assessed Property Value as per Bid Start Price...
7 CHANAKYAPURI RESIDENTIAL PLOT NO. 14 [9262] Scheme Name: Chanakyapuri Property Number: 14 EMD Deposit Start Date: 01-Jun-2021 08:00 AM EMD Deposit End Date: 06-Jun-2021 11:59 PM EMD Deposit Ends In: Assessed Property Value as per Bid Start Price...
...
and saves data.csv (screenshot from LibreOffice):
[screenshot: data.csv]
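The loop in the answer above hard-codes how many pages to fetch. Here is a minimal sketch of one way around that, assuming the endpoint returns no tables once you request a page past the last one (this has not been verified against the site). The names `collect_pages` and `fetch_page` are hypothetical; `fetch_page(page)` stands for whatever returns the list of per-page DataFrames, and a dictionary plays the role of the site so the sketch runs offline.

```python
from typing import Callable, List, Sequence


def collect_pages(
    fetch_page: Callable[[int], Sequence],
    max_pages: int = 1000,
) -> List:
    """Call fetch_page(1), fetch_page(2), ... until a page comes back empty.

    max_pages is a safety cap so a misbehaving endpoint cannot loop forever.
    """
    items: List = []
    for page in range(1, max_pages + 1):
        batch = fetch_page(page)
        if not batch:  # empty page: assume we are past the last one
            break
        items.extend(batch)
    return items


# Stand-in for the real request: three pages of data, then nothing.
fake_site = {1: ["a", "b"], 2: ["c", "d"], 3: ["e"]}
rows = collect_pages(lambda page: fake_site.get(page, []))
print(rows)
```

With the real site, `fetch_page` would wrap the `requests.post` and `pd.read_html` calls from the answer and return the list of tables found on that page.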
I looked at the request the page sends when paging:
curl "https://udhonline.rajasthan.gov.in/Portal/SearchAuctionGrid?page=2&Paging=True&pageSize=30&TabViewType=0&UnitId=0" \
  -H "Content-Type: application/x-www-form-urlencoded; charset=UTF-8" \
  -H "X-Requested-With: XMLHttpRequest" \
  -H "Origin: https://udhonline.rajasthan.gov.in" \
  -H "Connection: keep-alive" \
  -H "Referer: https://udhonline.rajasthan.gov.in/Portal/AuctionList" \
  --data-raw "X-Requested-With=XMLHttpRequest"
So just make a POST request to
https://udhonline.rajasthan.gov.in/Portal/SearchAuctionGrid?page=2&Paging=True&pageSize=30&TabViewType=0&UnitId=0
with this header
X-Requested-With: XMLHttpRequest
and this body
X-Requested-With=XMLHttpRequest
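The request described above can be sketched in Python with only the standard library. This is an illustration under assumptions, not a tested client for the site: `build_page_request` is a hypothetical helper, and it only builds the request object without sending it. The URL, query parameters, header, and body are the ones quoted in the curl command.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://udhonline.rajasthan.gov.in/Portal/SearchAuctionGrid"


def build_page_request(page: int, page_size: int = 30) -> Request:
    """Build (but do not send) the POST request for one page of results."""
    query = urlencode({
        "page": page,
        "Paging": "True",
        "pageSize": page_size,
        "TabViewType": 0,
        "UnitId": 0,
    })
    # The body repeats the header name/value as a form field, as in the curl call.
    body = urlencode({"X-Requested-With": "XMLHttpRequest"}).encode("ascii")
    return Request(
        f"{BASE_URL}?{query}",
        data=body,
        headers={
            "X-Requested-With": "XMLHttpRequest",
            "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
        },
        method="POST",
    )


req = build_page_request(2)
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```

Changing the `page` argument is all that is needed to walk through the result pages; the HTML that comes back can then be handed to `pd.read_html`.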
Make a POST request to this URL
https://udhonline.rajasthan.gov.in/Portal/SearchAuctionGrid
with this header
X-Requested-With: XMLHttpRequest
and this data
PageSize=50&UnitId=&X-Requested-With=XMLHttpRequest
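As a rough sketch of this second variant, the form data can be encoded from a dictionary instead of being written by hand. The field names and values are taken from the line above; whether the server honors a `PageSize` larger than the default 30 is an assumption that has not been checked here, and the request is only built, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

URL = "https://udhonline.rajasthan.gov.in/Portal/SearchAuctionGrid"

# Form fields exactly as quoted above; UnitId is sent empty.
form = {"PageSize": 50, "UnitId": "", "X-Requested-With": "XMLHttpRequest"}
body = urlencode(form)
print(body)

req = Request(
    URL,
    data=body.encode("ascii"),
    headers={"X-Requested-With": "XMLHttpRequest"},
    method="POST",
)
# Sending is left out here; urllib.request.urlopen(req) would perform the call.
```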