Is there a way to bypass Incapsula protection when scraping with Python 3?

import bs4
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.immoweb.be/fr/recherche/immeuble-de-rapport/a-vendre'

# opening up connection, grabbing the page
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

# html parsing
page_soup = soup(page_html, "html.parser")
page_soup.h1

<html style="height:100%"><head><meta content="NOINDEX, NOFOLLOW" name="ROBOTS"/><meta content="telephone=no" name="format-detection"/> [...] Request unsuccessful. Incapsula incident ID: 936002200207012991-

1 Answer

User

#1 · Posted on 2024-09-19 23:38:37

I ran some of the tests described in "Getting 'wrong' page source when calling url from python", and only the Selenium solution worked.

See the following example:

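from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service

url = 'https://www.immoweb.be/fr/recherche/immeuble-de-rapport/a-vendre'

# Selenium 4 takes the driver path through a Service object; the
# executable_path keyword is deprecated there and removed in later releases
driver = webdriver.Chrome(service=Service('./chromedriver'))
driver.get(url)

# parse the HTML as rendered by the real browser, then close the browser
soup = BeautifulSoup(driver.page_source, features='html.parser')
driver.quit()

print(soup.prettify())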

The output is the prettified HTML of the page as rendered in the browser.
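
To tell whether a fetched page is the real listing or the block page, one option is to look for the marker text from the failed response in the question. This is only a minimal sketch: it assumes the block page always contains "Incapsula incident ID" or "Request unsuccessful", and the helper name `looks_blocked` and the sample "normal" HTML are hypothetical, not part of the original answer.

```python
# Marker strings copied from the blocked response shown in the question.
# Assumption: the block page always includes at least one of them.
BLOCK_MARKERS = ("Incapsula incident ID", "Request unsuccessful")


def looks_blocked(html):
    """Return True if the HTML looks like the Incapsula block page."""
    # urlopen().read() returns bytes, driver.page_source returns str
    if isinstance(html, bytes):
        html = html.decode("utf-8", errors="replace")
    return any(marker in html for marker in BLOCK_MARKERS)


if __name__ == "__main__":
    blocked = "Request unsuccessful. Incapsula incident ID: 936002200207012991-"
    normal = "<html><body><h1>Listing page</h1></body></html>"  # made-up sample
    print(looks_blocked(blocked))
    print(looks_blocked(normal))
```

The same check could be applied to `page_html` from the urllib attempt or to `driver.page_source` from the Selenium one before handing the HTML to BeautifulSoup.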
