blocklist-aggregator: a domain blocklist aggregator


Blocklist aggregator

License: MIT

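This Python module aggregates several ads/tracking/malware lists and merges them into one unified list with duplicates removed.

See the blocklist-domains repository for an implementation.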
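The merge-and-deduplicate idea described above can be sketched in a few lines. This is a minimal illustration, not the module's actual code: the function name `merge_unique` and the sample domains are hypothetical, and lowercasing before comparison is an assumption (DNS names are case-insensitive).

```python
def merge_unique(*lists):
    """Merge several domain lists into one, dropping duplicates.

    Hypothetical helper for illustration only. The order of first
    appearance is kept, and domains are lowercased before comparison.
    """
    seen = set()
    unified = []
    for domains in lists:
        for domain in domains:
            key = domain.strip().lower()
            if key and key not in seen:
                seen.add(key)
                unified.append(key)
    return unified


# Made-up sample lists; one entry is repeated with different casing.
ads = ["doubleclick.net", "ads.example.com"]
tracking = ["Doubleclick.net", "tracker.example.org"]

merged = merge_unique(ads, tracking)
print(merged)
```

A set tracks what has already been seen, while the list preserves a stable order, which keeps the output reproducible between runs.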


Installation

If you want to generate your own unified blocklist, install this module with the pip command.

    pip install blocklist_aggregator

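Configuration

See the default configuration file.

The configuration contains:

  • the ads/tracking/malware URL lists, each with the pattern (regex) to use
  • the list of domains to exclude (whitelist)
  • the list of additional domains to block (blacklist)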
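To make the three configuration items more concrete, here is a rough sketch of how a regex pattern, a whitelist and a blacklist could interact. Everything in it is an assumption for illustration: the pattern `HOSTS_LINE`, the helpers `extract_domains` and `apply_lists`, the sample text, and the order of operations (whitelist filtering first, then blacklist additions) are not taken from the module.

```python
import re

# Hypothetical pattern for hosts-style lines such as "0.0.0.0 ads.example.com".
HOSTS_LINE = re.compile(
    r"^(?:0\.0\.0\.0|127\.0\.0\.1)[ \t]+([a-z0-9.-]+\.[a-z]{2,})",
    re.IGNORECASE,
)


def extract_domains(text, pattern=HOSTS_LINE):
    """Return the lowercased domains captured by `pattern`, line by line."""
    domains = []
    for line in text.splitlines():
        match = pattern.match(line.strip())
        if match:
            domains.append(match.group(1).lower())
    return domains


def apply_lists(domains, whitelist=(), blacklist=()):
    """Drop whitelisted domains, then append missing blacklisted ones."""
    excluded = {d.lower() for d in whitelist}
    result = [d for d in domains if d not in excluded]
    present = set(result)
    for extra in blacklist:
        extra = extra.lower()
        if extra not in present:
            present.add(extra)
            result.append(extra)
    return result


# Made-up hosts file content.
sample = """# comment line
127.0.0.1 localhost
0.0.0.0 ads.example.com
0.0.0.0 Tracker.Example.org  # inline comment
"""

found = extract_domains(sample)
final = apply_lists(
    found,
    whitelist=["tracker.example.org"],
    blacklist=["malware.example.net"],
)
print(found)
print(final)
```

Each source list may need its own pattern, since hosts files, plain domain lists and adblock-style filters use different line formats.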

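The configuration can be overridden at runtime.

    import blocklist_aggregator

    # YAML snippet that overrides the default configuration
    cfg_yaml = "verbose: true"
    unified = blocklist_aggregator.fetch(ext_cfg=cfg_yaml)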

Basic example

This basic example fetches the unified list of domains. You can save it to a file or process it however you like.

    import blocklist_aggregator

    unified = blocklist_aggregator.fetch()

    print(unified)
    # ["doubleclick.net", ..., "telemetry.dropbox.com"]

    print(len(unified))
    # 152978
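Saving the result to a file, as mentioned above, could look like the following sketch. The helper `save_domains` is hypothetical and not part of the module. A short made-up list stands in for the return value of `blocklist_aggregator.fetch()`, so the snippet runs without network access.

```python
import tempfile
from pathlib import Path


def save_domains(domains, path):
    """Write one domain per line, sorted and deduplicated.

    Hypothetical helper; returns the number of domains written.
    """
    lines = sorted(set(domains))
    Path(path).write_text("".join(f"{d}\n" for d in lines), encoding="utf-8")
    return len(lines)


# Stand-in for the list returned by blocklist_aggregator.fetch().
unified = ["telemetry.dropbox.com", "doubleclick.net", "doubleclick.net"]

# A temporary directory keeps the demo from leaving files behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "blocklist.txt"
    count = save_domains(unified, target)
    written = target.read_text(encoding="utf-8").splitlines()

print(count, written)
```

One domain per line is the simplest format. A hosts-style file would only need a sink address such as `0.0.0.0` prefixed to each line.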

Fetch in verbose mode

Fetch the ads/tracking/malware URL lists in debug mode.

2020-10-29 06:47:17,220 Starting new HTTPS connection (1): easylist.to:443
2020-10-29 06:47:17,473 https://easylist.to:443 "GET /easylist/easylist.txt HTTP/1.1" 200 472389
2020-10-29 06:47:17,669 *** Searching valid domains...
2020-10-29 06:47:17,710 *** domains=23672 duplicated=10.26%
2020-10-29 06:47:17,710 Starting new HTTPS connection (1): raw.githubusercontent.com:443
2020-10-29 06:47:18,013 https://raw.githubusercontent.com:443 "GET /paulgb/BarbBlock/master/BarbBlock.txt HTTP/1.1" 200 4701
2020-10-29 06:47:18,020 *** Searching valid domains...
2020-10-29 06:47:18,022 *** domains=550 duplicated=1.09%
2020-10-29 06:47:18,027 Starting new HTTPS connection (1): winhelp2002.mvps.org:443
2020-10-29 06:47:19,065 https://winhelp2002.mvps.org:443 "GET /hosts.txt HTTP/1.1" 200 87636
2020-10-29 06:47:19,343 *** Searching valid domains...
2020-10-29 06:47:19,413 *** domains=11730 duplicated=1.49%
2020-10-29 06:47:19,413 Starting new HTTPS connection (1): adaway.org:443
2020-10-29 06:47:19,654 https://adaway.org:443 "GET /hosts.txt HTTP/1.1" 200 48536
2020-10-29 06:47:19,666 *** Searching valid domains...
2020-10-29 06:47:19,714 *** domains=9002 duplicated=7.52%
2020-10-29 06:47:19,721 Starting new HTTPS connection (1): raw.githubusercontent.com:443
2020-10-29 06:47:20,193 https://raw.githubusercontent.com:443 "GET /StevenBlack/hosts/master/data/StevenBlack/hosts HTTP/1.1" 200 21683
2020-10-29 06:47:20,201 *** Searching valid domains...
2020-10-29 06:47:20,212 *** domains=2938 duplicated=0.0%
2020-10-29 06:47:20,212 Starting new HTTPS connection (1): www.malwaredomainlist.com:443
2020-10-29 06:47:20,944 https://www.malwaredomainlist.com:443 "GET /hostslist/hosts.txt HTTP/1.1" 200 35585
2020-10-29 06:47:21,039 *** Searching valid domains...
2020-10-29 06:47:21,057 *** domains=1106 duplicated=0.0%
2020-10-29 06:47:21,057 Starting new HTTPS connection (1): urlhaus.abuse.ch:443
2020-10-29 06:47:21,340 https://urlhaus.abuse.ch:443 "GET /downloads/hostfile/ HTTP/1.1" 200 19478
2020-10-29 06:47:21,459 *** Searching valid domains...
2020-10-29 06:47:21,479 *** domains=2280 duplicated=0.04%
2020-10-29 06:47:21,485 Starting new HTTPS connection (1): pgl.yoyo.org:443
2020-10-29 06:47:21,731 https://pgl.yoyo.org:443 "GET /adservers/serverlist.php?hostformat=hosts;showintro=0 HTTP/1.1" 200 24153
2020-10-29 06:47:21,743 *** Searching valid domains...
2020-10-29 06:47:21,792 *** domains=3568 duplicated=0.14%
2020-10-29 06:47:21,799 Starting new HTTPS connection (1): someonewhocares.org:443
2020-10-29 06:47:22,849 https://someonewhocares.org:443 "GET /hosts/hosts HTTP/1.1" 200 449957
2020-10-29 06:47:24,029 *** Searching valid domains...
2020-10-29 06:47:24,138 *** domains=14662 duplicated=0.85%
2020-10-29 06:47:24,143 Starting new HTTPS connection (1): raw.githubusercontent.com:443
2020-10-29 06:47:24,378 https://raw.githubusercontent.com:443 "GET /notracking/hosts-blocklists/master/hostnames.txt HTTP/1.1" 200 1468999
2020-10-29 06:47:24,738 *** Searching valid domains...
2020-10-29 06:47:25,124 *** domains=194622 duplicated=50.0%
2020-10-29 06:47:25,128 Starting new HTTPS connection (1): s3.amazonaws.com:443
2020-10-29 06:47:25,824 https://s3.amazonaws.com:443 "GET /lists.disconnect.me/simple_ad.txt HTTP/1.1" 200 43616
2020-10-29 06:47:25,866 *** Searching valid domains...
2020-10-29 06:47:25,873 *** domains=2702 duplicated=0.0%
2020-10-29 06:47:25,873 Starting new HTTPS connection (1): s3.amazonaws.com:443
2020-10-29 06:47:26,383 https://s3.amazonaws.com:443 "GET /lists.disconnect.me/simple_tracking.txt HTTP/1.1" 200 613
2020-10-29 06:47:26,390 *** Searching valid domains...
2020-10-29 06:47:26,390 *** domains=35 duplicated=0.0%
2020-10-29 06:47:26,393 Starting new HTTPS connection (1): raw.githubusercontent.com:443
2020-10-29 06:47:26,573 https://raw.githubusercontent.com:443 "GET /davidonzo/Threat-Intel/master/lists/latestdomains.piHole.txt HTTP/1.1" 200 21830
2020-10-29 06:47:26,575 *** Searching valid domains...
2020-10-29 06:47:26,613 *** domains=2193 duplicated=0.05%
2020-10-29 06:47:26,624 Starting new HTTPS connection (1): raw.githubusercontent.com:443
2020-10-29 06:47:26,839 https://raw.githubusercontent.com:443 "GET /mitchellkrogza/Badd-Boyz-Hosts/master/hosts HTTP/1.1" 200 7888
2020-10-29 06:47:26,850 *** Searching valid domains...
2020-10-29 06:47:26,857 *** domains=834 duplicated=0.48%
2020-10-29 06:47:26,893 blocklist total=152978 duplicated=9.57%
2020-10-29 06:47:26,941 blocklist without domains from whitelist total=152977
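If you want to summarize a log like the one above, the per-source statistics lines can be pulled out with a small regex. This is only a sketch that assumes the `*** domains=N duplicated=P%` line format stays exactly as shown. The names `STATS` and `parse_stats` are hypothetical.

```python
import re

# Matches per-source lines such as "*** domains=23672 duplicated=10.26%".
STATS = re.compile(r"\*\*\* domains=(\d+) duplicated=([\d.]+)%")


def parse_stats(log_text):
    """Return (domain_count, duplicated_percent) for each source list."""
    return [(int(n), float(p)) for n, p in STATS.findall(log_text)]


# Three lines copied from the log above; the last one is the overall
# total and is intentionally not matched by the per-source pattern.
log = """\
2020-10-29 06:47:17,710 *** domains=23672 duplicated=10.26%
2020-10-29 06:47:18,022 *** domains=550 duplicated=1.09%
2020-10-29 06:47:26,893 blocklist total=152978 duplicated=9.57%
"""

stats = parse_stats(log)
print(stats)
print(sum(count for count, _ in stats))
```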

About

Author: Denis Machard d.machard@gmail.com
PyPI: https://pypi.org/project/blocklist_aggregator/
