locationtagger Python package - PyPI

Detect and extract locations from text or URLs, and find the relationships between those locations

Detailed description of the locationtagger Python project

locationtagger

Version 0.0.1

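Detect and extract locations (countries, regions/states, and cities) from text or URLs. It also finds the relationships among those countries, regions, and cities.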

About the project

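In the field of Natural Language Processing, many algorithms have been proposed for different kinds of syntactic and semantic analysis of text data. NER (Named Entity Recognition) is one of the most frequently needed tasks in real-world text mining problems; it relies on grammar-based rules and statistical modeling approaches. The entities extracted by NER can be names of people, places, organizations, or products. locationtagger is a further step that tags and filters out the place names (locations) from all the entities found by NER.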
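To make the filtering step concrete, here is a minimal standard-library sketch; it is not locationtagger's actual code. It assumes the NER stage has already produced a list of candidate strings (the `candidates` list below is hypothetical) and replaces the real country/region/city lookup data with three small hand-made sets, `COUNTRIES`, `REGIONS`, and `CITIES`, which are likewise hypothetical. Each candidate is checked against every set, so one name can land in more than one category; anything that matches no set goes to `other`.

```python
# Hypothetical toy gazetteer standing in for the real lookup data.
COUNTRIES = {"India", "Japan", "United States"}
REGIONS = {"Minnesota", "Wisconsin"}
CITIES = {"Blue Earth", "Red Wing", "Ellsworth", "Menomonie", "Chippewa Falls"}


def classify_entities(entities):
    """Sort NER candidate strings into countries, regions, cities, and other."""
    result = {"countries": [], "regions": [], "cities": [], "other": []}
    lookups = {"countries": COUNTRIES, "regions": REGIONS, "cities": CITIES}
    for name in entities:
        matched = False
        # A name may match several lookups (e.g. a city and a region share a name).
        for bucket, names in lookups.items():
            if name in names:
                matched = True
                if name not in result[bucket]:  # keep first-seen order, no duplicates
                    result[bucket].append(name)
        # Entities that are not known place names are kept separately.
        if not matched and name not in result["other"]:
            result["other"].append(name)
    return result


# Hypothetical NER output for the example sentence used later in this README.
candidates = ["India", "Japan", "winter", "PM", "Blue Earth", "Red Wing",
              "Minnesota", "Ellsworth", "Menomonie", "Chippewa",
              "Chippewa Falls", "Wisconsin"]
result = classify_entities(candidates)
print(result)
```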

The image below shows the approach that is followed:

Approach (diagram): https://github.com/kaushiksoni10/locationtagger/blob/master/locationtagger/data/diagram.jpg?raw=true

Installation and setup

Requirement: a Python environment

Using pip:

pip install locationtagger

Before installing the package, however, we need to install the following useful libraries:

nltk

spacy

newspaper3k

pycountry

After installing these packages, some essential nltk and spaCy models must be downloaded, using the commands given in /locationtagger/bin/locationtagger-nltk-spacy, from an IPython shell or a Jupyter notebook.

Usage

Once the package is installed correctly, import the module and pass some text or a URL as input.

Text as input

import locationtagger
text = "Unlike India and Japan, A winter weather advisory remains in effect through 5 PM along and east of a line from Blue Earth, to Red Wing line in Minnesota and continuing to along an Ellsworth, to Menomonie, and Chippewa Falls line in Wisconsin."
entities = locationtagger.find_locations(text = text)

Now we can extract all the place names that appear in the text above:

entities.countries

['India', 'Japan']

entities.regions

['Minnesota', 'Wisconsin']

entities.cities

['Ellsworth', 'Red Wing', 'Blue Earth', 'Chippewa Falls', 'Menomonie']

Besides the places extracted from the text above, we can also find the countries to which these extracted cities and regions belong:

entities.country_regions

{'United States': ['Minnesota', 'Wisconsin']}

entities.country_cities

{'United States': ['Ellsworth', 'Red Wing', 'Blue Earth', 'Chippewa Falls', 'Menomonie']}

"United States" is a country that does not appear in the text; it was derived from its relationship with the cities and regions in the text, so we find it in other_countries:

entities.other_countries

['United States']
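The relationship step can be sketched in the same spirit. This is an illustration under assumptions, not the library's implementation: `REGION_COUNTRY` and `CITY_COUNTRY` are hypothetical lookup tables with one country per name, and `group_by_country` and `other_countries` are illustrative helpers. Grouping the extracted names by country reproduces the shape of `country_regions` and `country_cities`, and any country reached this way that is not among the extracted countries is reported as an "other" country.

```python
from collections import defaultdict

# Hypothetical lookup tables: the country each region or city belongs to.
REGION_COUNTRY = {"Minnesota": "United States", "Wisconsin": "United States"}
CITY_COUNTRY = {
    "Ellsworth": "United States",
    "Red Wing": "United States",
    "Blue Earth": "United States",
    "Chippewa Falls": "United States",
    "Menomonie": "United States",
}


def group_by_country(names, lookup):
    """Map each country to the extracted names that belong to it."""
    grouped = defaultdict(list)
    for name in names:
        country = lookup.get(name)
        if country is not None:  # names missing from the lookup are skipped
            grouped[country].append(name)
    return dict(grouped)


def other_countries(found_countries, *groupings):
    """Countries inferred from regions/cities but not mentioned in the text."""
    inferred = []
    for grouping in groupings:
        for country in grouping:  # iterating a dict yields its keys
            if country not in found_countries and country not in inferred:
                inferred.append(country)
    return inferred


countries = ["India", "Japan"]
regions = ["Minnesota", "Wisconsin"]
cities = ["Ellsworth", "Red Wing", "Blue Earth", "Chippewa Falls", "Menomonie"]

country_regions = group_by_country(regions, REGION_COUNTRY)
country_cities = group_by_country(cities, CITY_COUNTRY)
print(country_regions)  # {'United States': ['Minnesota', 'Wisconsin']}
print(other_countries(countries, country_regions, country_cities))  # ['United States']
```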

Going further with the cities found in the text, we can find which regions of the world they might belong to:

entities.region_cities

{'Maine': ['Ellsworth'], 'Minnesota': ['Red Wing', 'Blue Earth'], 'Wisconsin': ['Ellsworth', 'Chippewa Falls', 'Menomonie'], 'Pennsylvania': ['Ellsworth'], 'Michigan': ['Ellsworth'], 'Illinois': ['Ellsworth'], 'Kansas': ['Ellsworth'], 'Iowa': ['Ellsworth']}

These regions are collected in other_regions, since they were inferred from the cities rather than extracted directly from the text:

entities.other_regions

['Maine', 'Minnesota', 'Wisconsin', 'Pennsylvania', 'Michigan', 'Illinois', 'Kansas', 'Iowa']
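A single city name can exist in several regions (Ellsworth above is one example), so the city-to-region lookup is one-to-many. A minimal sketch of turning that lookup around into a region-to-cities mapping, assuming a hypothetical `CITY_REGIONS` table that lists only a subset of the real matches (so its output is smaller than the library's):

```python
# Hypothetical subset of a city -> possible regions lookup (one-to-many).
CITY_REGIONS = {
    "Ellsworth": ["Maine", "Wisconsin", "Kansas"],
    "Red Wing": ["Minnesota"],
    "Blue Earth": ["Minnesota"],
    "Chippewa Falls": ["Wisconsin"],
    "Menomonie": ["Wisconsin"],
}


def region_cities(cities, lookup):
    """Map each region to the extracted cities that could belong to it."""
    grouped = {}
    for city in cities:
        for region in lookup.get(city, []):
            grouped.setdefault(region, []).append(city)
    return grouped


cities = ["Ellsworth", "Red Wing", "Blue Earth", "Chippewa Falls", "Menomonie"]
by_region = region_cities(cities, CITY_REGIONS)
# Every region reached through the cities, in first-seen order.
other_regions = list(by_region)
print(by_region)
print(other_regions)  # ['Maine', 'Wisconsin', 'Kansas', 'Minnesota']
```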

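Of all the named entities that nltk and spaCy pick up from the raw text, most are stored in cities, regions, and countries. The remaining words (those not recognized as place names) are stored in other: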

entities.other

['winter', 'PM', 'Chippewa']

URL as input

Similarly, it can also extract locations from a URL:

URL = 'https://edition.cnn.com/2020/01/14/americas/staggering-number-of-human-rights-defenders-killed-in-colombia-the-un-says/index.html'
entities2 = locationtagger.find_locations(url = URL)

The output we get. Countries:

entities2.countries

['Switzerland', 'Colombia']

Regions:

entities2.regions

['Geneva']

Cities:

entities2.cities

['Geneva', 'Colombia']

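If we want to check how many times a place is mentioned, or which places are mentioned most often across the whole page, we can tell which locations the page is talking about.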

So, the most frequently mentioned countries:

entities2.country_mentions

[('Colombia', 3), ('Switzerland', 1), ('United States', 1), ('Mexico', 1)]

And the most frequently mentioned cities:

entities2.city_mentions

[('Colombia', 3), ('Geneva', 1)]
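Mention counts are a frequency count over the extracted names. A minimal sketch with `collections.Counter`, assuming a hypothetical list `found` that holds one entry per occurrence of a country name on the page (how the library itself collects those occurrences is not shown here):

```python
from collections import Counter


def mentions(names):
    """Return (name, count) pairs, most frequently mentioned first."""
    return Counter(names).most_common()


# Hypothetical occurrences of country names found on a page.
found = ["Colombia", "Switzerland", "Colombia", "United States", "Mexico", "Colombia"]
country_mentions = mentions(found)
print(country_mentions)
# [('Colombia', 3), ('Switzerland', 1), ('United States', 1), ('Mexico', 1)]
```

`most_common()` orders elements with equal counts by when they were first encountered, so ties appear in the order the names first occurred.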

Credits

locationtagger uses data from the following source for country, region, and city lookups:

GEOLITE2 free downloadable database

Apart from the well-known NLP libraries NLTK and spaCy, locationtagger uses the following very useful libraries:

pycountry

newspaper3k

locationtagger 0.0.1
