Python langdetect package - PyPI

Language detection library ported from Google's language-detection.

Project description for langdetect

langdetect

Port of Google's [language-detection](https://code.google.com/p/language-detection/) library (version from March 3, 2014) to Python.

Installation

$ pip install langdetect

Supported Python versions: 2.6, 2.7, 3.x.

Languages

langdetect supports 55 languages out of the box ([ISO 639-1 codes](https://en.wikipedia.org/wiki/List_of_ISO_639-1_codes)):

af, ar, bg, bn, ca, cs, cy, da, de, el, en, es, et, fa, fi, fr, gu, he, hi, hr, hu, id, it, ja, kn, ko, lt, lv, mk, ml, mr, ne, nl, no, pa, pl, pt, ro, ru, sk, sl, so, sq, sv, sw, ta, te, th, tl, tr, uk, ur, vi, zh-cn, zh-tw
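The codes above are plain strings, and `detect()` returns one of them. As a minimal sketch, you might keep the list in your own code to validate results or to collapse the two Chinese variants into one base code. `SUPPORTED` and `base_language` are hypothetical helpers, not part of langdetect.

```python
# Hypothetical helper data: the 55 codes listed above, copied verbatim.
SUPPORTED = frozenset(
    "af ar bg bn ca cs cy da de el en es et fa fi fr gu he hi hr hu id it ja "
    "kn ko lt lv mk ml mr ne nl no pa pl pt ro ru sk sl so sq sv sw ta te th "
    "tl tr uk ur vi zh-cn zh-tw".split()
)


def base_language(code):
    """Return the ISO 639-1 part of a supported code, e.g. 'zh-cn' -> 'zh'."""
    if code not in SUPPORTED:
        raise ValueError("unsupported language code: %r" % (code,))
    # Only zh-cn and zh-tw carry a region suffix; everything else is unchanged.
    return code.split("-", 1)[0]
```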

Basic usage

To detect the language of a text:

```python
>>> from langdetect import detect
>>> detect("War doesn't show who's right, just who's left.")
'en'
>>> detect("Ein, zwei, drei, vier")
'de'
```

To find out the probabilities of the top languages:

```python
>>> from langdetect import detect_langs
>>> detect_langs("Otec matka syn.")
[sk:0.572770823327, pl:0.292872522702, cs:0.134356653968]
```
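Each item in the list returned by `detect_langs()` is an object with `lang` and `prob` attributes, printed in the `lang:prob` form shown above. A minimal sketch for keeping only confident guesses; `confident_languages` and its 0.5 default threshold are illustrative choices, not part of langdetect.

```python
def confident_languages(results, threshold=0.5):
    """Return (lang, prob) pairs with prob >= threshold, most likely first.

    `results` is assumed to be the list returned by langdetect's
    detect_langs(), whose items expose `.lang` and `.prob`.
    """
    pairs = [(r.lang, r.prob) for r in results if r.prob >= threshold]
    # The example output above is already ordered by probability; sorting
    # again keeps this helper correct for unsorted input as well.
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)
```

Usage: `confident_languages(detect_langs("Otec matka syn."))`. With the probabilities shown above, only `sk` would pass the default threshold; since short texts can give different results on each run, the outcome may vary unless a seed is fixed.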

Note

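The language detection algorithm is non-deterministic: if you run it on a text that is either too short or too ambiguous, you might get a different result every time you run it.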

To enforce consistent results, call the following code before the first language detection:

```python
from langdetect import DetectorFactory
DetectorFactory.seed = 0
```
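As we understand the original implementation, the non-determinism comes from random sampling: the detector runs several trials, each updating the language probabilities with character n-grams picked at random from the text. The toy sketch below uses only the standard library and is not langdetect code; `sample_ngrams` is a hypothetical function that shows the same principle, namely that a fixed seed makes the random choice of n-grams reproducible.

```python
import random


def sample_ngrams(text, n=3, k=5, seed=None):
    """Pick k character n-grams from text at random (with replacement).

    Toy illustration only: with seed=None the picks can change between
    calls; with a fixed seed they are identical every time.
    """
    ngrams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not ngrams:
        return []  # text shorter than n: nothing to sample
    # A private generator, so the global `random` state is left untouched.
    rng = random.Random(seed)
    return [rng.choice(ngrams) for _ in range(k)]
```

With langdetect itself you do not call anything like this; setting `DetectorFactory.seed = 0` once, before the first detection, is enough.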

How to add a new language?

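You need to create a new language profile. The easiest way to do it is to use the [langdetect.jar](https://github.com/shuyo/language-detection/raw/master/lib/langdetect.jar) tool, which can generate language profiles from Wikipedia abstract database files or from plain text.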

Wikipedia abstract database files can be retrieved from "Wikipedia Downloads" ([http://download.wikimedia.org/](http://download.wikimedia.org/)). Their names follow the pattern "(language code)wiki-(version)-abstract.xml" (e.g. "enwiki-20101004-abstract.xml").
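The naming pattern is regular enough to check before running the tool. A minimal sketch; `parse_abstract_name` and its regular expression are assumptions derived only from the pattern described above, and an optional `.gz` suffix is accepted because the tool can read gzip-compressed files.

```python
import re

# (language code)wiki-(version)-abstract.xml, optionally gzip-compressed.
ABSTRACT_RE = re.compile(
    r"^(?P<code>[a-z]+(?:-[a-z]+)?)wiki-(?P<version>[^-]+)-abstract\.xml(?:\.gz)?$"
)


def parse_abstract_name(filename):
    """Return (language_code, version), or None if the name does not match."""
    m = ABSTRACT_RE.match(filename)
    if m is None:
        return None
    return m.group("code"), m.group("version")
```

A name that returns `None` (such as an un-renamed Chinese file) would not be recognized by the expected pattern.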

Usage: `java -jar langdetect.jar --genprofile -d [directory path] [language codes]`

Specify the directory containing the abstract database files with the `-d` option.
The tool can also handle gzip-compressed files.
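If you drive the tool from Python, building the argument list explicitly avoids shell-quoting problems. A minimal sketch, assuming `java` is on `PATH` and `langdetect.jar` is in the working directory; `genprofile_command` and `run_genprofile` are hypothetical helpers, and only the flags from the usage line above are used.

```python
import subprocess


def genprofile_command(abstract_dir, lang_codes, jar="langdetect.jar"):
    """Build argv for: java -jar langdetect.jar --genprofile -d DIR CODES..."""
    if not lang_codes:
        raise ValueError("at least one language code is required")
    return ["java", "-jar", jar, "--genprofile", "-d", abstract_dir] + list(lang_codes)


def run_genprofile(abstract_dir, lang_codes, jar="langdetect.jar"):
    # check=True raises CalledProcessError if the tool exits with an error.
    subprocess.run(genprofile_command(abstract_dir, lang_codes, jar), check=True)
```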

Remark: the Chinese database files are named like "zhwiki-(version)-abstract-zh-cn.xml" or "zhwiki-(version)-abstract-zh-tw.xml", so they must be renamed to "zh-cnwiki-(version)-abstract.xml" or "zh-twwiki-(version)-abstract.xml".
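The renaming can be scripted. A minimal sketch, assuming the Chinese files sit in the directory you pass with `-d` and may carry a `.gz` suffix; `renamed_chinese_abstract` and `rename_chinese_abstracts` are hypothetical helpers.

```python
import os
import re

# zhwiki-(version)-abstract-zh-cn.xml  ->  zh-cnwiki-(version)-abstract.xml
CHINESE_RE = re.compile(
    r"^zhwiki-(?P<version>[^-]+)-abstract-(?P<variant>zh-cn|zh-tw)\.xml(?P<gz>\.gz)?$"
)


def renamed_chinese_abstract(filename):
    """Return the name the tool expects, or None if no rename is needed."""
    m = CHINESE_RE.match(filename)
    if m is None:
        return None
    return "%swiki-%s-abstract.xml%s" % (
        m.group("variant"), m.group("version"), m.group("gz") or ""
    )


def rename_chinese_abstracts(directory):
    """Rename every matching file in `directory`; return the new names."""
    renamed = []
    for name in os.listdir(directory):
        new_name = renamed_chinese_abstract(name)
        if new_name is not None:
            os.rename(os.path.join(directory, name), os.path.join(directory, new_name))
            renamed.append(new_name)
    return renamed
```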

To generate a language profile from plain text, use the `--genprofile-text` command.

Usage: `java -jar langdetect.jar --genprofile-text -l [language code] [text file path]`
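For the plain-text mode, a minimal sketch that writes a UTF-8 training file and builds the command from the usage line above. `write_training_text` and `genprofile_text_command` are hypothetical helpers; writing one sentence per line is only a readable choice on our part, so check the tool's own documentation for its exact input expectations.

```python
import io


def write_training_text(path, sentences):
    """Write sentences, one per line, to a UTF-8 text file."""
    with io.open(path, "w", encoding="utf-8") as f:
        for sentence in sentences:
            f.write(sentence.strip() + "\n")


def genprofile_text_command(lang_code, text_path, jar="langdetect.jar"):
    """Build argv for: java -jar langdetect.jar --genprofile-text -l CODE FILE"""
    return ["java", "-jar", jar, "--genprofile-text", "-l", lang_code, text_path]
```

The resulting list can be passed to `subprocess.run(..., check=True)`; `xx` in a call such as `genprofile_text_command("xx", "xx.txt")` stands for whatever code your new language uses.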

For more details, see the [language-detection wiki](https://code.google.com/archive/p/language-detection/wikis/Tools.wiki).

Original project

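This library is a direct port of Google's [language-detection](https://code.google.com/p/language-detection/) library from Java to Python. All the classes and methods are unchanged, so for more information see the project's website or wiki.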

Presentation of the language detection algorithm: [http://www.slideshare.net/shuyo/language-detection-library-for-java](http://www.slideshare.net/shuyo/language-detection-library-for-java).
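To make the underlying idea concrete, here is a toy sketch of character-n-gram language identification: a naive Bayes classifier over character trigrams with add-one smoothing, trained on two made-up sentences. It is a deliberate simplification and not langdetect's detector, which works with prebuilt per-language profiles of 1- to 3-grams and randomized trials; `trigrams`, `train` and `classify` are hypothetical names.

```python
import math
from collections import Counter


def trigrams(text):
    """Lower-cased character trigrams, with a space of padding at each end."""
    padded = " %s " % text.lower()
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def train(samples):
    """samples: {lang: training text}. Returns {lang: Counter of trigrams}."""
    return {lang: Counter(trigrams(text)) for lang, text in samples.items()}


def classify(profiles, text):
    """Return the language whose profile gives `text` the highest log-likelihood."""
    vocab = set()
    for counts in profiles.values():
        vocab.update(counts)
    best_lang, best_score = None, -math.inf
    for lang, counts in profiles.items():
        total = sum(counts.values())
        score = 0.0  # uniform prior over languages, so it is omitted
        for gram in trigrams(text):
            # Add-one (Laplace) smoothing: unseen trigrams get a small, non-zero probability.
            score += math.log((counts[gram] + 1) / (total + len(vocab) + 1))
        if score > best_score:
            best_lang, best_score = lang, score
    return best_lang
```

Usage: `classify(train({"en": "...", "de": "..."}), "some text")`. Real profiles are built from far more text, which is what the `langdetect.jar` profile generator described above is for.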


langdetect 1.0.7
