Check the hypertext reference (href) of each element on a page and compare it with ad server IPs

Posted 2024-09-25 10:26:48


I'm writing an ad-detection program. Here is my code:

from selenium import webdriver
from selenium.webdriver.common.by import By

url = input('Enter URL to detect ads from: ')

browser = webdriver.Chrome()
browser.get('http://' + url)

# find_elements_by_tag_name() was removed in Selenium 4; use find_elements(By..., ...)
all_iframes = browser.find_elements(By.TAG_NAME, "iframe")
if len(all_iframes) > 0:
    print("Ads Found\n")
    # Hide every iframe on the page
    browser.execute_script("""
        var elems = document.getElementsByTagName("iframe");
        for (var i = 0, max = elems.length; i < max; i++) {
            elems[i].hidden = true;
        }
    """)
    print('Total Ads: ' + str(len(all_iframes)))
else:
    print('No Ads found')

My question is: is there a way to check the hypertext reference of each iframe (its src URL) and compare it against the ad server IPs listed on this page?
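An iframe's address lives in its src attribute rather than an href. As a minimal sketch, assuming the src strings have already been read (for example with iframe.get_attribute('src') in Selenium), the host part that would be compared against an ad server list can be pulled out with the standard library's urllib.parse.urlparse. The helper name iframe_hosts is hypothetical:

```python
from urllib.parse import urlparse


def iframe_hosts(srcs):
    """Return the host names of absolute http(s) iframe sources.

    Empty or missing sources, about:blank and other non-http schemes are
    skipped, because they carry no host that could match an ad server.
    """
    hosts = []
    for src in srcs:
        if not src:
            continue
        parsed = urlparse(src)
        # .hostname drops the port and lower-cases the name
        if parsed.scheme in ("http", "https") and parsed.hostname:
            hosts.append(parsed.hostname)
    return hosts


srcs = [
    "https://ads.example.net:8080/banner?id=1",
    "about:blank",
    "",
    None,
    "http://Video.Example.com/embed/xyz",
]
print(iframe_hosts(srcs))  # ['ads.example.net', 'video.example.com']
```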


2 Answers

Sorry, I'm not very fluent in Python syntax, but I can answer from a Java perspective, and you can adapt it to your test.

Visit the ad server list (ipAdd) site and get its page source.

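// Needs java.util.Arrays, java.util.List and java.util.stream.Collectors
driver.get("http://pgl.yoyo.org/as/serverlist.php?hostformat=adblockplus");
String pageSrc = driver.getPageSource(); // Get page source
// Entries look like "||host^": split on the "||" and "^" markers and keep the non-empty pieces
List<String> ipList = Arrays.stream(pageSrc.split("\\|\\||\\^"))
        .map(String::trim).filter(s -> !s.isEmpty()).collect(Collectors.toList());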
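The same parsing step in Python, as a minimal sketch: it assumes the list is fetched as plain text and uses Adblock Plus rules of the form ||host^, one per line, with a [Adblock ...] header and comment lines starting with "!". The helper name parse_adblock_hosts is hypothetical. Working line by line avoids having to craft a single split regex:

```python
def parse_adblock_hosts(text):
    """Extract host names from Adblock Plus rules of the form ||host^.

    Lines that are not simple ||host^ rules (the [Adblock ...] header,
    '!' comments, blank lines, rules with $options) are ignored.
    """
    hosts = set()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("||") and line.endswith("^"):
            host = line[2:-1].lower()
            if host:
                hosts.add(host)
    return hosts


sample = """[Adblock]
! Title: example list
||ads.example.net^
||tracker.example.org^

||Banner.Example.com^
||cdn.example.org^$third-party
"""
print(sorted(parse_adblock_hosts(sample)))
# ['ads.example.net', 'banner.example.com', 'tracker.example.org']
```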

On the site under test, get the iframe WebElements and compare them with the ipAdd list.

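List<WebElement> allIframes = driver.findElements(By.tagName("iframe")); // List of iframe WebElements
for (WebElement iframe : allIframes) {
    String src = iframe.getAttribute("src");
    // Take the host of the iframe's src URL and check whether the list contains it
    String host = (src == null || src.isEmpty()) ? null : java.net.URI.create(src).getHost();
    if (host != null && ipList.contains(host)) {
        System.out.println("Found");
    }
}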
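A contains() check only catches exact matches. In Adblock Plus filter syntax, ||example.com^ also covers subdomains such as ads.example.com, so a sketch of the comparison (in Python; the helper is_ad_host and the sample domains are hypothetical) checks the host and each of its parent domains against the set:

```python
def is_ad_host(host, ad_hosts):
    """Return True if host or any parent domain of it is in ad_hosts.

    ad_hosts is a set of lower-case domain names, e.g. taken from an
    Adblock Plus list where ||example.com^ also matches subdomains.
    """
    labels = host.lower().rstrip(".").split(".")
    # Try "a.b.example.com", then "b.example.com", "example.com", "com"
    for i in range(len(labels)):
        if ".".join(labels[i:]) in ad_hosts:
            return True
    return False


ad_hosts = {"adnetwork.example", "ads.example.org"}
print(is_ad_host("cdn.x.adnetwork.example", ad_hosts))  # True
print(is_ad_host("example.org", ad_hosts))              # False
print(is_ad_host("ADS.example.org", ad_hosts))          # True
print(is_ad_host("notadnetwork.example", ad_hosts))     # False
```

Walking up the labels means a parent such as example.org is never flagged just because one of its subdomains is listed.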

You can try the following solution, but I'm not sure it covers every case (I can't test it right now):

import socket
from urllib.parse import urlparse

import requests  # pip install requests
from selenium import webdriver
from selenium.webdriver.common.by import By

url = input('Enter URL to detect ads from: ')

browser = webdriver.Chrome()
browser.get('http://' + url)

all_iframes = browser.find_elements(By.TAG_NAME, "iframe")

# Get the IP list of ad servers with an HTTP GET request (a set makes lookups fast)
list_of_ad_servers = set(requests.get('http://pgl.yoyo.org/adservers/iplist.php?ipformat=&showintro=1&mimetype=plaintext').text.split())

for iframe in all_iframes:
    source = iframe.get_attribute('src')
    # Only absolute http(s) sources have a host to look up (skips empty src, about:blank)
    if not source or not source.startswith('http'):
        continue
    host = urlparse(source).hostname
    if host is None:
        continue
    try:
        # Resolve the host to an IP and check whether it is in the ad server list
        if socket.gethostbyname(host) in list_of_ad_servers:
            print('This is an advertisement iframe!')
            browser.execute_script('arguments[0].hidden = true;', iframe)
    except socket.gaierror:
        pass  # the host name could not be resolved
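Two details of the IP-based check can be isolated so they are testable without a browser or network access: the plaintext IP list may contain introductory text (the request asks for showintro=1), and one host can resolve to several addresses. A minimal sketch, with hypothetical helper names, keeps only tokens that parse as IP addresses via the standard ipaddress module and checks every address returned by a resolver function; the default resolver uses socket.gethostbyname_ex, which returns (hostname, aliases, addresses):

```python
import ipaddress
import socket


def parse_ip_list(text):
    """Return the set of valid IP addresses found in a plaintext list.

    Any other tokens (intro text, comments) are ignored.
    """
    ips = set()
    for token in text.split():
        try:
            ips.add(str(ipaddress.ip_address(token)))
        except ValueError:
            continue
    return ips


def resolve_all(host):
    """Resolve host to all of its IPv4 addresses (empty list on failure)."""
    try:
        return socket.gethostbyname_ex(host)[2]
    except (OSError, UnicodeError):
        return []


def is_ad_server(host, ad_ips, resolve=resolve_all):
    """True if any address that host resolves to is in ad_ips."""
    return any(ip in ad_ips for ip in resolve(host))


text = """This is a list of ad server IPs.
192.0.2.10
198.51.100.7
not-an-ip
"""
ad_ips = parse_ip_list(text)
print(sorted(ad_ips))  # ['192.0.2.10', '198.51.100.7']

# A fake resolver stands in for DNS so the example runs offline
fake_dns = {"ads.example.net": ["203.0.113.5", "198.51.100.7"],
            "news.example.com": ["203.0.113.9"]}


def fake_resolve(host):
    return fake_dns.get(host, [])


print(is_ad_server("ads.example.net", ad_ips, resolve=fake_resolve))   # True
print(is_ad_server("news.example.com", ad_ips, resolve=fake_resolve))  # False
```

Passing the resolver as a parameter keeps the matching logic separate from DNS, so the real socket lookup can be swapped in unchanged when running against live pages.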
