Scraping data from the magicbricks.com website

Posted 2024-09-28 01:31:22


I am trying to scrape data from the magicbricks.com website, but when I try to change pages by manually clicking the second page at the bottom of the page, the link stays the same and I get the same data. How can I load the remaining pages?

For example, this is the link to the first page:

https://www.magicbricks.com/property-for-sale/residential-real-estate?bedroom=1,2,3,4,5,%3E5&proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment,Residential-House,Villa,Residential-Plot&cityName=Mumbai

The link to the second page is identical; only the page content changes:

https://www.magicbricks.com/property-for-sale/residential-real-estate?bedroom=1,2,3,4,5,%3E5&proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment,Residential-House,Villa,Residential-Plot&cityName=Mumbai

import pandas as pd
from pandas import ExcelWriter
import requests, re, csv
from bs4 import BeautifulSoup

for i in range(1, 5):      # number of pages plus one

    # NOTE: the URL has no {} placeholder, so .format(i) does not change it --
    # every iteration requests the same first page.
    url = ("https://www.magicbricks.com/property-for-sale/residential-real-estate"
           "?bedroom=1,2,3,4,5,%3E5"
           "&proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,"
           "Studio-Apartment,Residential-House,Villa,Residential-Plot"
           "&cityName=Mumbai").format(i)

    r = requests.get(url)
    soup = BeautifulSoup(r.content, "html.parser")  # specify a parser explicitly
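
When the visible URL does not change between pages, the listings are usually loaded by a background request, so the page number has to be sent as part of that request. Below is a minimal sketch of the loop, assuming the site accepts a page query parameter such as &page=N; that parameter name is an assumption, not something confirmed in the question, and the real request should be checked in the browser's developer-tools network tab.

import requests
from bs4 import BeautifulSoup

BASE_URL = ("https://www.magicbricks.com/property-for-sale/residential-real-estate"
            "?bedroom=1,2,3,4,5,%3E5"
            "&proptype=Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,"
            "Studio-Apartment,Residential-House,Villa,Residential-Plot"
            "&cityName=Mumbai"
            "&page={}")          # hypothetical page parameter -- verify in the network tab

for i in range(1, 5):
    r = requests.get(BASE_URL.format(i), timeout=30)   # {} is replaced by the page number
    soup = BeautifulSoup(r.content, "html.parser")
    # parse the listing cards from `soup` here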

I want to scrape about 500 listings from this website.
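
As a rough sketch of capping the crawl at about 500 entries, reusing the BASE_URL defined above (the CSS selector is a placeholder, since the actual page markup was not shown in the question):

results = []
for i in range(1, 50):                                  # upper bound on pages to try
    r = requests.get(BASE_URL.format(i), timeout=30)
    soup = BeautifulSoup(r.content, "html.parser")
    cards = soup.select("div.listing-card")             # placeholder selector -- inspect the real page
    if not cards:
        break                                           # no more listings returned
    results.extend(card.get_text(" ", strip=True) for card in cards)
    if len(results) >= 500:
        break                                           # roughly 500 entries collected

The collected rows could then be written out with pandas, matching the ExcelWriter import at the top of the script.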

