Does https://www.realestate.com.au/ not allow web scraping?

Posted 2024-09-21 02:34:48


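I'm trying to extract data from https://www.realestate.com.au/. First I build the URL for the property type I'm looking for, then I open that URL with Selenium WebDriver, but the page is blank! Any idea why this happens? Is it because this website doesn't permit web scraping? Is there any way to scrape this site?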
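Whether a site asks crawlers to stay away from certain paths is usually stated in its robots.txt file. That file is advisory: it expresses the site's wishes but does not itself stop a browser from loading a page, so on its own it would not explain a blank page. A minimal sketch using Python's standard-library `urllib.robotparser` to check a URL against a set of rules; the rules below are hypothetical placeholders, not realestate.com.au's actual robots.txt, and `is_allowed` is a hypothetical helper name.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules for illustration only; these are NOT
# realestate.com.au's real rules. For a live check you would instead call
# parser.set_url("https://www.realestate.com.au/robots.txt") and parser.read().
HYPOTHETICAL_RULES = """\
User-agent: *
Disallow: /sold/
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    sold_url = ("https://www.realestate.com.au/sold/property-house-with-3-bedrooms-in-2153/"
                "list-1?maxBeds=4&includeSurrounding=false")
    print(is_allowed(HYPOTHETICAL_RULES, "my-scraper", sold_url))
    print(is_allowed(HYPOTHETICAL_RULES, "my-scraper", "https://www.realestate.com.au/"))
```

Under these made-up rules the `/sold/` listing URL is disallowed and the home page is allowed; with the site's real robots.txt the answers may differ.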

Here is my code:

from selenium import webdriver
from selenium.webdriver.edge.service import Service
from bs4 import BeautifulSoup
import time

PostCode = "2153"
propertyType = "house"
minBedrooms = "3"
maxBedrooms = "4"
page = "1"

url = "https://www.realestate.com.au/sold/property-{p}-with-{mib}-bedrooms-in-{po}/list-{pa}?maxBeds={mab}&includeSurrounding=false".format(p = propertyType, mib = minBedrooms, po = PostCode, pa = page, mab = maxBedrooms)
print(url)
# url should be "https://www.realestate.com.au/sold/property-house-with-3-bedrooms-in-2153/list-1?maxBeds=4&includeSurrounding=false"

# Selenium 4.10+ no longer accepts the driver path as a positional argument; pass it through a Service object
driver = webdriver.Edge(service=Service("./msedgedriver.exe"))  # edit the path to where your driver is located
try:
    driver.get(url)
    time.sleep(3)  # crude fixed wait to give the page's JavaScript time to render

    src = driver.page_source
    soup = BeautifulSoup(src, 'html.parser')
    print(soup)
finally:
    driver.quit()  # always close the browser, even if something above fails
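A page that looks blank in the browser can still return some HTML (for example a script-only shell whose content never rendered), so it may help to measure how much readable text `driver.page_source` actually contains before parsing it further. A minimal standard-library sketch; `visible_text`, `looks_blank` and the 200-character threshold are hypothetical choices for illustration, not part of Selenium or BeautifulSoup.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects text nodes, skipping the contents of <script>, <style> and <noscript>."""

    _SKIP = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self._SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self._SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style/noscript.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the page's text nodes joined by single spaces (hypothetical helper)."""
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return " ".join(collector.chunks)


def looks_blank(html: str, min_chars: int = 200) -> bool:
    """Heuristic: treat the page as blank if it has fewer than min_chars of visible text."""
    return len(visible_text(html)) < min_chars
```

In the script above this would be called as `looks_blank(driver.page_source)` after the wait. A `True` result suggests the HTML Selenium received has little or no readable content, which points at what the site served rather than at BeautifulSoup.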
