PyDomainExtractor Python package - PyPI

A highly optimized domain name extraction library written in C++

About the project

PyDomainExtractor is a library for quickly parsing domain names into their component parts. It is written in C++ for maximum performance.

Built with

Performance

Extract from domain

Tests were run on a file containing 10 million random domains from various TLDs (September 24, 2020).

Library	Function	Time
PyDomainExtractor	pydomainextractor.extract	2.30s
publicsuffix2	publicsuffix2.get_sld	25.77s
tldextract	__call__	34.22s
tld	tld.parse_tld	36.64s
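
The source does not describe the benchmark harness, so the following is only a minimal sketch of how per-call timings like those above could be collected. The names `time_calls`, `naive_last_label`, and `sample_domains` are hypothetical, and the stand-in function is not any of the libraries in the table.

```python
import time


def time_calls(func, inputs):
    """Return the wall-clock seconds spent calling func once per input."""
    start = time.perf_counter()
    for item in inputs:
        func(item)
    return time.perf_counter() - start


def naive_last_label(domain):
    # Hypothetical stand-in for a real extractor: returns the last label only.
    return domain.rsplit(".", 1)[-1]


# Synthetic input; the original test used a file of random domains instead.
sample_domains = ["example%d.com" % i for i in range(1000)]
elapsed = time_calls(naive_last_label, sample_domains)
print(f"{len(sample_domains)} calls took {elapsed:.4f}s")
```

To compare libraries, each extractor function would be passed to `time_calls` with the same input list.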

Extract from URL

Tests were run on a file containing 1 million random URLs (September 24, 2020).

^{tb2}$

Prerequisites

To compile this package, GCC, libidn2, and the Python development package must be installed.

Fedora

sudo dnf install python3-devel libidn2-devel gcc-c++

Ubuntu 18.04

^{pr2}$

Installation

pip3 install PyDomainExtractor

Usage

Extraction

import pydomainextractor

# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.
domain_extractor = pydomainextractor.DomainExtractor()

domain_extractor.extract('google.com')
>>> {
>>>     'subdomain': '',
>>>     'domain': 'google',
>>>     'suffix': 'com'
>>> }

# Loads a custom SuffixList data. Should follow PublicSuffixList's format.
domain_extractor = pydomainextractor.DomainExtractor(
    'tld\n'
    'custom.tld\n'
)

domain_extractor.extract('google.com')
>>> {
>>>     'subdomain': 'google',
>>>     'domain': 'com',
>>>     'suffix': ''
>>> }

domain_extractor.extract('google.custom.tld')
>>> {
>>>     'subdomain': '',
>>>     'domain': 'google',
>>>     'suffix': 'custom.tld'
>>> }
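
To illustrate why the custom suffix list changes the result for 'google.com', here is a pure-Python sketch of longest-suffix matching. It is not the library's C++ implementation; `split_domain` is a hypothetical helper, and it ignores the wildcard and exception rules that the PublicSuffixList format also allows.

```python
def split_domain(name, suffixes):
    """Split a dotted name into subdomain, domain, and suffix.

    Illustrative only: picks the longest trailing run of labels found in
    `suffixes`; wildcard and exception rules are not handled.
    """
    labels = name.lower().split(".")
    suffix_start = len(labels)  # default: no known suffix
    for i in range(len(labels)):
        # i grows from 0, so the longest candidate is tried first.
        if ".".join(labels[i:]) in suffixes:
            suffix_start = i
            break
    suffix = ".".join(labels[suffix_start:])
    domain = labels[suffix_start - 1] if suffix_start > 0 else ""
    subdomain = ".".join(labels[:max(suffix_start - 1, 0)])
    return {"subdomain": subdomain, "domain": domain, "suffix": suffix}


custom_suffixes = {"tld", "custom.tld"}
print(split_domain("google.com", custom_suffixes))
print(split_domain("google.custom.tld", custom_suffixes))
```

With this list, 'com' is not a known suffix, so it falls into the domain slot and 'google' becomes the subdomain, which mirrors the custom-list output shown above.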

Extraction from a URL

import pydomainextractor

# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.
domain_extractor = pydomainextractor.DomainExtractor()

domain_extractor.extract('http://google.com/')
>>> {
>>>     'subdomain': '',
>>>     'domain': 'google',
>>>     'suffix': 'com'
>>> }
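
Extracting from a URL first requires isolating the host. As a rough illustration using only the standard library (this is an assumption about the general approach, not a description of what PyDomainExtractor does internally), a hypothetical `host_from_url` helper could look like this:

```python
from urllib.parse import urlsplit


def host_from_url(url):
    """Return the lowercased host part of a URL, or '' if there is none."""
    # urlsplit().hostname drops the scheme, user info, port, and path.
    return urlsplit(url).hostname or ""


print(host_from_url("http://google.com/"))
```

The resulting host string can then be split into subdomain, domain, and suffix like any bare domain name.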

Validation

import pydomainextractor

# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.
domain_extractor = pydomainextractor.DomainExtractor()

domain_extractor.is_valid_domain('google.com')
>>> True

domain_extractor.is_valid_domain('domain.اتصالات')
>>> True

domain_extractor.is_valid_domain('xn--mgbaakc7dvf.xn--mgbaakc7dvf')
>>> True

domain_extractor.is_valid_domain('domain-.com')
>>> False

domain_extractor.is_valid_domain('-sub.domain.com')
>>> False

domain_extractor.is_valid_domain('\xF0\x9F\x98\x81nonalphanum.com')
>>> False
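
The hyphen cases above follow the usual hostname label rules. The sketch below shows those rules for ASCII names only; `looks_like_valid_hostname` is a hypothetical helper, not the library's check. It does not consult a suffix list, and Unicode names such as 'domain.اتصالات' would need IDNA conversion first, which is omitted here, so they are rejected by this sketch.

```python
import re

# One label: 1-63 letters, digits, or hyphens, not starting or ending with '-'.
_LABEL = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)")


def looks_like_valid_hostname(name):
    """Check ASCII hostname syntax only (illustrative, no suffix lookup)."""
    name = name.lower()
    if not name or len(name) > 253:
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


for candidate in ("google.com", "domain-.com", "-sub.domain.com"):
    print(candidate, looks_like_valid_hostname(candidate))
```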

TLD list

import pydomainextractor

# Loads the current supplied version of PublicSuffixList from the repository. Does not download any data.
domain_extractor = pydomainextractor.DomainExtractor()

domain_extractor.get_tld_list()
>>> [
>>>     'bostik',
>>>     'backyards.banzaicloud.io',
>>>     'biz.bb',
>>>     ...
>>> ]

License

Distributed under the MIT License. See LICENSE for more information.

Contact

Gal Ben David - gal@intsights.com

Project link: https://github.com/Intsights/PyDomainExtractor

PyDomainExtractor 0.8.5
