Chrome browser launched through ChromeDriver gets detected

Posted 2024-09-28 22:10:04


I am trying to use Selenium ChromeDriver in Python against the website www.mouser.co.uk. However, it is detected as a bot on the very first request.


Does anyone have an explanation for this? Here is the code I am using:

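    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.support.ui import WebDriverWait

    options = Options()
    options.add_argument("--start-maximized")
    # Selenium 3 style constructor, kept as originally posted
    browser = webdriver.Chrome('chromedriver.exe', chrome_options=options)
    wait = WebDriverWait(browser, 30)
    browser.get('https://www.mouser.co.uk')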

1 answer
User
Reply 1 · posted 2024-09-28 22:10:04

I tried accessing the URL https://www.mouser.co.uk/ with specific Chrome options, but the session was indeed detected and redirected to the Pardon Our Interruption page.

  • Code block:

    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    # Configure Chrome before it is launched through ChromeDriver
    options = Options()
    options.add_argument("start-maximized")
    options.add_argument("disable-infobars")
    options.add_argument("--disable-extensions")
    # Selenium 3 style constructor, kept as originally posted
    driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
    driver.get("https://www.mouser.co.uk")
    # Wait up to 30 seconds for the link to become clickable, then click it via JavaScript
    myElement = WebDriverWait(driver, 30).until(EC.element_to_be_clickable((By.XPATH, "//a[@id='1_lnkLeftFlag']")))
    driver.execute_script("arguments[0].click();", myElement)
    

Now, if you inspect the Pardon Our Interruption page, you will see that the <body> tag contains:

  • the attribute dist-GlobalHeader
  • the attribute dist-PageWrap

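This clearly indicates that the website is protected by the bot management service provider Distil Networks, and that the navigation by ChromeDriver was detected and subsequently blocked.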
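
The inspection described above can be roughly automated. The following is only a minimal sketch using the standard library: the helper name find_distil_markers is hypothetical, and it assumes the two marker strings may appear either as an attribute name or inside an attribute value (for example an id or class) on any tag, since the exact markup of the block page is not shown here.

```python
from html.parser import HTMLParser

# Marker strings taken from the answer above. How exactly they appear in the
# markup (attribute name, id, or class value) is an assumption of this sketch.
DISTIL_MARKERS = ("dist-GlobalHeader", "dist-PageWrap")


class MarkerScanner(HTMLParser):
    """Records which marker strings occur in any start tag's attributes."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # HTMLParser lowercases attribute names, so compare case-insensitively.
            haystack = (name + " " + (value or "")).lower()
            for marker in DISTIL_MARKERS:
                if marker.lower() in haystack:
                    self.found.add(marker)


def find_distil_markers(html):
    """Return the set of marker strings present in the given HTML text."""
    scanner = MarkerScanner()
    scanner.feed(html)
    return scanner.found
```

With a live session, the HTML could be taken from driver.page_source after driver.get(...). A non-empty result only suggests that the block page was served; it is not proof of which vendor protects the site.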


Distil

According to the article There Really Is Something About Distil.it...:

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further,

"One pattern with **Selenium** was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".


Reference

You can find a related discussion in Unable to use Selenium to automate Chase site login.
