urltitle Python package - PyPI

Get the page title or a header-based description of a URL

Detailed description of the urltitle Python project

urltitle

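urltitle uses Python 3.7 to return the page title or a header-based description for a given URL. Its primary use is to include the returned value in a conversation. As a disclaimer, note that the returned title is not guaranteed to be accurate, due to many possible factors.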

Features

An in-memory cache is used, with a default expiry time of one week. The cache size and time are customizable.
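The caching behaviour described above can be sketched with the standard library alone. This is a minimal illustration and not the package's own implementation: the class name `TTLCache`, its parameters, and the default size of 1024 entries are assumptions made for the example; only the one-week default lifetime comes from the text.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional

WEEK_SECONDS = 7 * 24 * 60 * 60  # default lifetime named in the text


class TTLCache:
    """Hypothetical in-memory cache with a size limit and a time-to-live."""

    def __init__(self, max_size: int = 1024, ttl: float = WEEK_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._max_size = max_size  # assumed default, not from the source
        self._ttl = ttl
        self._clock = clock
        self._items: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Optional[Any]:
        """Return the cached value, or None if it is missing or expired."""
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._items[key]
            return None
        return value

    def put(self, key: Hashable, value: Any) -> None:
        """Store a value and evict the oldest entries beyond the size limit."""
        self._items.pop(key, None)
        self._items[key] = (self._clock() + self._ttl, value)
        while len(self._items) > self._max_size:
            self._items.popitem(last=False)
```

Passing the clock in as a parameter keeps the expiry logic testable without waiting for real time to pass.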

Approximately only as much of an HTML page is read as is needed to return a title, up to a customizable maximum of 1 MiB.
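One way to read only as much of a page as needed is to consume the response in chunks and stop as soon as a complete title element has been seen. The sketch below assumes a file-like byte stream; the function name `read_title`, the 8 KiB chunk size, and the regular expression are illustrative choices, not the package's code.

```python
import re
from typing import BinaryIO, Optional

MAX_BYTES = 1024 * 1024  # 1 MiB limit named in the text
CHUNK_SIZE = 8 * 1024    # assumed chunk size for this sketch
_TITLE_RE = re.compile(rb"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def read_title(stream: BinaryIO, max_bytes: int = MAX_BYTES,
               chunk_size: int = CHUNK_SIZE) -> Optional[str]:
    """Read chunks until a <title> element is complete or max_bytes is reached."""
    buffer = b""
    while len(buffer) < max_bytes:
        chunk = stream.read(min(chunk_size, max_bytes - len(buffer)))
        if not chunk:  # end of stream
            break
        buffer += chunk
        match = _TITLE_RE.search(buffer)
        if match:
            text = match.group(1).decode("utf-8", errors="replace")
            return " ".join(text.split())  # collapse runs of whitespace
    return None
```

A real HTML parser would handle more edge cases; the point here is only the early exit and the byte limit.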

A PDF title metadata extractor is used for PDF files, up to a customizable maximum size of 8 MiB.

Up to three attempts are made to recover from an error, except for unrecoverable errors (i.e. 400, 401, 404, etc.).
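The retry policy can be sketched as a small wrapper around any fetch function. The exception class `FetchError`, the function `fetch_with_retries`, and the exact set of unrecoverable statuses are hypothetical; the text only says "400, 401, 404, etc." and does not show how the package signals a failed request.

```python
from typing import Callable, TypeVar

T = TypeVar("T")

# Statuses treated as unrecoverable here; the text lists "400, 401, 404, etc."
UNRECOVERABLE_STATUSES = frozenset({400, 401, 404})


class FetchError(Exception):
    """Hypothetical error carrying the HTTP status of a failed request."""

    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP status {status}")
        self.status = status


def fetch_with_retries(fetch: Callable[[], T], max_attempts: int = 3) -> T:
    """Call fetch up to max_attempts times, giving up early on unrecoverable errors."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except FetchError as exc:
            # Re-raise at once for unrecoverable statuses or on the last attempt.
            if exc.status in UNRECOVERABLE_STATUSES or attempt == max_attempts:
                raise
    raise AssertionError("unreachable")
```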

For a URL with a missing scheme (e.g. git-scm.com/downloads), https is guessed first, then http.
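The scheme guessing can be illustrated as a function that returns the URLs to try, in order. The name `candidate_urls` is hypothetical, and the sketch does not claim to match how the package detects a missing scheme.

```python
from typing import List
from urllib.parse import urlparse


def candidate_urls(url: str) -> List[str]:
    """Return the URLs to try in order: as given, or https then http guesses."""
    if urlparse(url).scheme in ("http", "https"):
        return [url]
    return [f"https://{url}", f"http://{url}"]
```

A caller would request each candidate in turn and keep the first one that succeeds.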

SSL verification for HTTPS sites can optionally be disabled.

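If an HTML page shows a Distil captcha, a fallback to the Google web cache is used. The same fallback is used for PDF files that are too large or have no title metadata.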

Diagnostic logging can be enabled at the desired level for the logger named urltitle.
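Since the logger name is given as urltitle, diagnostic output can be switched on with the standard logging module. The format string and the DEBUG level below are just one possible choice.

```python
import logging

# Send log records to the console with a timestamp and the logger name.
logging.basicConfig(format="%(asctime)s %(name)s %(levelname)s %(message)s")

# Enable diagnostic output for the logger named "urltitle" only.
logging.getLogger("urltitle").setLevel(logging.DEBUG)
```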

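Some site-specific customizations are configurable:

Multiple regular-expression-based URL and title substitutions
Use of the Google web cache
User agent
Extra headers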
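To make the idea of regex-based URL substitution concrete, here is a sketch of a per-site substitution table. The dictionary layout, the name `URL_SUBSTITUTIONS`, the function `substitute_url`, and the arXiv pattern are all hypothetical; the package's real configuration format is not shown in this text.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical layout: site -> (pattern, replacement) pairs applied in order.
URL_SUBSTITUTIONS: Dict[str, List[Tuple[str, str]]] = {
    # Illustrative rule: point an arXiv PDF link at its abstract page.
    "arxiv.org": [(r"/pdf/(?P<id>[^/]+?)(?:\.pdf)?$", r"/abs/\g<id>")],
}


def substitute_url(site: str, url: str) -> str:
    """Apply every configured substitution for the site; other URLs pass through."""
    for pattern, replacement in URL_SUBSTITUTIONS.get(site, []):
        url = re.sub(pattern, replacement, url)
    return url
```

Title substitutions could follow the same shape, applied to the extracted title instead of the URL.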

Links

Code: https://github.com/impredicative/urltitle/
Release: https://pypi.org/project/urltitle/

Usage

Installation

Python ≥3.7 is required because of a reference to ^{}.

To install the package, run:

pip install urltitle

Examples

from urltitle import URLTitleReader

reader = URLTitleReader(verify_ssl=True)

# Titles for HTML content
reader.title('https://www.cnn.com/2019/02/11/health/insect-decline-study-intl/index.html')
"Insect numbers in precipitous decline could have 'catastrophic' consequences, warns study - CNN"
reader.title('https://www.youtube.com/watch?v=53YvP6gdD7U')
'Deep Learning State of the Art (2019) - MIT - YouTube'

# Titles for URLs with a missing scheme
reader.title('www.reuters.com/article/us-usa-military-army/army-calls-base-housing-hazards-unconscionable-details-steps-to-protect-families-idUSKCN1Q4275')
"Army calls base housing hazards 'unconscionable,' details steps to protect families | Reuters"
reader.title('reddit.com/r/FoodNerds/comments/arb6qj')
'Paternal high-fat diet transgenerationally impacts hepatic immunometabolism. - PubMed - NCBI : FoodNerds'
reader.title('neverssl.com')
'NeverSSL - helping you get online'

# Titles for non-ASCII URLs
reader.title('https://en.wikipedia.org/wiki/Amanattō')
'Amanattō - Wikipedia'
reader.title('https://fr.wikipedia.org/wiki/Wikipédia:Accueil_principal')
"Wikipédia, l'encyclopédie libre"

# Titles for PDFs having title metadata
reader.title('https://www.diabetes.org.br/publico/images/pdf/artificial-sweeteners-induce-glucose-intolerance-by-altering-the-gut-microbiota.pdf')
'Artificial sweeteners induce glucose intolerance by altering the gut microbiota'
reader.title('https://www.omicsonline.org/open-access/detection-of-glyphosate-in-malformed-piglets-2161-0525.1000230.pdf')
'Detection of Glyphosate in Malformed Piglets'

# Titles for other content showing Content-Type and Content-Length as available:
reader.title('https://www.sciencedaily.com/images/2019/02/190213142720_1_540x360.jpg')
'(image/jpeg) (54K)'
reader.title('https://kdnuggets.com/rss')
'(application/rss+xml; charset=UTF-8)'
reader.title('https://download.fedoraproject.org/pub/fedora/linux/releases/29/Workstation/x86_64/iso/Fedora-Workstation-Live-x86_64-29-1.2.iso')
'(application/octet-stream) (2G)'

# Titles for substituted URLs as per configuration:
reader.title('https://arxiv.org/pdf/1902.04704.pdf')
'[1902.04704] Neural network models and deep learning - a primer for biologists'
reader.title('https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2495396/pdf/postmedj00315-0056.pdf')
"Features of a successful therapeutic fast of 382 days' duration"
reader.title('https://pdfs.semanticscholar.org/1d76/d4561b594b5c5b5250edb43122d85db07262.pdf')
'Nutrition and health. The issue is not food, nor nutrients, so much as processing. - Semantic Scholar'

Exceptions

Any error should be expected to raise the urltitle.URLTitleError exception.
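Because a single exception type covers the failures, a caller that posts titles into a conversation can fall back to a default when a lookup fails. The helper `title_or_default` below is a hypothetical wrapper; it takes the title function and the exception type as arguments so that it does not depend on the package being installed.

```python
from typing import Callable, Optional, Type


def title_or_default(get_title: Callable[[str], str], url: str,
                     error_type: Type[BaseException],
                     default: Optional[str] = None) -> Optional[str]:
    """Return the title for url, or default if the expected error is raised."""
    try:
        return get_title(url)
    except error_type:
        return default
```

With the package installed, a call could look like title_or_default(reader.title, url, urltitle.URLTitleError, default=url), where reader is the URLTitleReader instance from the examples.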

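Customizations

For any site-specific customization, use the preexisting entries for relevant sites as examples. Refer to ^{}. The site of a URL is as defined and returned by the URLTitleReader().netloc(url) method in ^{}.

The following examples show various URLs and their corresponding sites, for the purpose of entering site-specific customizations: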

URL	Site
^{}	^{}
^{}	^{}
^{}	^{}
^{}	^{}
^{}	^{}
^{}	^{}
^{}	^{}
^{}	^{}
^{}	^{}
^{}	^{}
^{}	^{}
^{}	^{}
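To get a rough idea of what a site key looks like, the host of a URL can be extracted with the standard library. This is only an approximation: the authoritative value is whatever URLTitleReader().netloc(url) returns, which may normalize hosts differently, and the function name `approximate_site` is hypothetical.

```python
from urllib.parse import urlparse


def approximate_site(url: str) -> str:
    """Return the lowercased host of a URL, assuming https if the scheme is missing."""
    if "://" not in url:
        url = f"https://{url}"
    return urlparse(url).hostname or ""
```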


urltitle 0.2.25


