TEXTA Entity Linker
Detailed description of the texta-entity-linker Python project
Installation
pip install https://pypi.texta.ee/texta-concatenator/texta-concatenator-latest.tar.gz
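Description
The TEXTA Entity Linker links multiple entities, stored as Texta Facts, into a more concrete and unified profile of the personal information present in the processed text.
The process only handles documents that have already been processed by texta-mlp and that contain facts of type "BOUNDED".
In addition, EntityLinker needs an abbreviations .json file containing key-value pairs of institution short names and full names. The package ships with a base file that is used by default, but you can always change it by giving the class the path of the file to use.
This package does not apply texta-mlp for you; you need to install that package yourself or run the linker on documents that have already been processed.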
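To illustrate the key-value layout of the abbreviations file described above, here is a minimal sketch of such a file and a lookup against it. The file name, the sample entries, and the helpers load_abbreviations and expand_abbreviation are assumptions made for this example only. They are not part of the package, and the base file shipped with it may be organised differently.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical sample entries: institution short name -> full name.
sample = {"UN": "United Nations", "WHO": "World Health Organization"}


def load_abbreviations(path):
    """Read a JSON object of short-name -> full-name pairs."""
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of key-value pairs")
    return data


def expand_abbreviation(name, abbreviations):
    """Return the full name for a known short name, else the input unchanged."""
    return abbreviations.get(name, name)


# Write the sample to a temporary file and read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "abbreviations.json"
    path.write_text(json.dumps(sample), encoding="utf-8")
    abbreviations = load_abbreviations(path)

print(expand_abbreviation("UN", abbreviations))
```

A custom file written this way could then be passed to the class by its file path, as described above.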
Usage
Create an instance of the EntityLinker class; the later steps refer to it as c.
Prepare the inputs to analyze:
Letter 1:
Dear all,
Let`s not forget that I intend to concure the whole of Persian Empire!
Best wishes,
Alexander Great
aleksandersuur356eKr@mail.ee
phone: 76883266
Letter 2:
От: Terry Pratchett < tpratchett@gmail.com >
Кому: Joe Abercrombie < jabercrombie@gmail.com >
Название: Разъяснение
Дорогой Joe,
Как вы? Надеюсь, у тебя все хорошо. Последний месяц я писал свой новый роман,
который обещал представить в начале лета. Я тоже немного почитал и обожаю твою
новую книгу!
Я просто хотел уточнить, что Alexander Great жил в Македонии.
Лучший,
Terry
Letter 3:
Dear Terry!
Terry Pratchett already created Discworld. This name is taken. Other than that I found
the piece fascanating and see great potential in you! I strongly encourage you to take
action in publishing your works. Btw, if you would like to show your works to Pratchett
as well, he`s interested. I talked about you to him. His email is tpratchett@gmail.com.
Feel free to write him!
Joe
From: Terry Berry < bigfan@gmail.com >
To: Joe Abercrombie < jabercrombie@gmail.com >
Title: Question
Hi Joe,
I finally finished my draft and I`m sending it to you. The hardest part
was creating new places. What do you think of the names of the places I created?
Terry Berry
Process the inputs with the Texta MLP package:
from texta_mlp.mlp import MLP

# This folder should contain all the MLP associated models and data.
# If they don't exist, MLP will download them and store them at this path,
# creating directories as needed.
# All the inputs must be processed one by one.
m = MLP(resource_dir="/home/texta/mlp_data")
mlp_letter_1 = m.process(letter_1)
This process performs the basic entity analysis and creates the BOUNDED facts that the entity-linking process needs:
[
    {
        'doc_path': 'text.text',
        'fact': 'EMAIL',
        'lemma': None,
        'spans': '[[114, 142]]',
        'str_val': 'aleksandersuur356eKr@mail.ee'
    },
    {
        'doc_path': 'text.text',
        'fact': 'LOC',
        'lemma': None,
        'spans': '[[67, 81]]',
        'str_val': 'Persian Empire'
    },
    {
        'doc_path': 'text.text',
        'fact': 'BOUNDED',
        'lemma': "{'PER': ['Alexander Great'], 'EMAIL': "
                 "['aleksandersuur356ekr@mail.ee'], 'PHONE': ['76883266']}",
        'spans': '[[98, 113], [114, 142], [151, 159]]',
        'str_val': "{'PER': ['Alexander Great'], 'EMAIL': "
                   "['aleksandersuur356eKr@mail.ee'], 'PHONE': ['76883266']}"
    },
    {
        'doc_path': 'text.text',
        'fact': 'NAMEMAIL',
        'lemma': None,
        'spans': '[[98, 142]]',
        'str_val': 'Alexander Great aleksandersuur356eKr@mail.ee'
    },
    {
        'doc_path': 'text.text',
        'fact': 'PHONE',
        'lemma': None,
        'spans': '[[151, 159]]',
        'str_val': '76883266'
    }
]
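Since only documents with BOUNDED facts are handled, it can help to inspect those facts before linking. The following is a minimal sketch that works on a plain list of fact dictionaries shaped like the output above. The helpers has_bounded and bounded_groups are hypothetical names introduced for this example and are not part of texta-mlp or the entity linker. The sketch relies on what the listing shows: str_val holds a Python-literal dictionary as a string, and spans holds a JSON-style list as a string.

```python
import ast
import json

# Two facts copied from the output above (the others are omitted for brevity).
facts = [
    {
        "doc_path": "text.text",
        "fact": "PHONE",
        "lemma": None,
        "spans": "[[151, 159]]",
        "str_val": "76883266",
    },
    {
        "doc_path": "text.text",
        "fact": "BOUNDED",
        "lemma": "{'PER': ['Alexander Great'], 'EMAIL': "
                 "['aleksandersuur356ekr@mail.ee'], 'PHONE': ['76883266']}",
        "spans": "[[98, 113], [114, 142], [151, 159]]",
        "str_val": "{'PER': ['Alexander Great'], 'EMAIL': "
                   "['aleksandersuur356eKr@mail.ee'], 'PHONE': ['76883266']}",
    },
]


def has_bounded(facts):
    """True if at least one fact is of type BOUNDED."""
    return any(fact["fact"] == "BOUNDED" for fact in facts)


def bounded_groups(facts):
    """Decode the string-encoded entities and spans of every BOUNDED fact."""
    groups = []
    for fact in facts:
        if fact["fact"] != "BOUNDED":
            continue
        # str_val uses single quotes, so it is a Python literal, not JSON.
        entities = ast.literal_eval(fact["str_val"])
        spans = json.loads(fact["spans"])
        groups.append({"entities": entities, "spans": spans})
    return groups


groups = bounded_groups(facts)
print(has_bounded(facts), groups[0]["entities"]["PER"])
```

ast.literal_eval is used instead of eval because it only accepts literals, which is the safer choice for strings that come from processed documents.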
Load the batch into the EntityLinker:
# Note that the full result of the MLP process is necessary,
# not only the texta_facts dictionary.
c.from_json([mlp_letter_1, mlp_letter_2, mlp_letter_3])
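Because every input has to be processed separately and the full MLP result of each one is needed for the batch, the list passed to from_json can be built in a loop. The sketch below assumes a hypothetical helper build_batch that takes the processing function as an argument. The stand-in fake_process and the shape of its return value are assumptions so that the example runs without the MLP models; in real use you would pass m.process instead.

```python
from typing import Callable, Iterable, List


def build_batch(process: Callable[[str], dict], letters: Iterable[str]) -> List[dict]:
    """Run `process` on each letter separately and keep every full result."""
    return [process(letter) for letter in letters]


def fake_process(text: str) -> dict:
    """Stand-in for m.process; the result layout here is assumed, not documented."""
    return {"text": {"text": text}, "texta_facts": []}


batch = build_batch(fake_process, ["first letter", "second letter"])
print(len(batch))
```

With a real MLP instance, the resulting list would be what gets handed to c.from_json.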
Trigger the entity-linking process:
# On larger datasets, this might take a long time.
c.link_entities()
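Other information:
You can check the length and contents of the database lists with the following functions:
- cn._just_pers_infos() (type "close_persons": persons who appear close to each other in a letter)
- cn._bounded() (the original BOUNDED facts, before linking)
- cn._unconfirative_infos() (entities of uncertain ownership: the entity had two candidate persons and it is unclear which one it belongs to)
- cn._no_personas_infos() (type "no_per_close_entities": entities that appear at the end of a letter with no person nearby)
- cn._persona_infos() (type "person_info": the real deal, entities together with their persons).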
Output:
Once the .link_entities() function has finished, you can view the full result of the linked entities with:
c.to_json()
[
{"type": "person_info", "PER": "Alexander Great", "LOC": ["Македония", "Persian Empire"], "EMAIL": ["aleksandersuur356eKr@mail.ee"], "PHONE": ["76883266"]},
{"type": "person_info", "PER": "Joe Abercrombie", "EMAIL": ["jabercrombie@gmail.com"]},
{"type": "person_info", "PER": "Terry Berry", "EMAIL": ["bigfan@gmail.com"]},
{"type": "person_info", "PER": "Terry Pratchett", "EMAIL": ["tpratchett@gmail.com"]}
]
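The linked records can be post-processed with ordinary Python. The sketch below assumes the result is available as a list of dictionaries shaped like the output above; whether to_json() returns such a list directly or a JSON string that first needs json.loads is not stated here, so treat that as an assumption. The helpers by_type and email_index are hypothetical names introduced for this example.

```python
from collections import defaultdict

# Records copied from the output above.
records = [
    {"type": "person_info", "PER": "Alexander Great",
     "LOC": ["Македония", "Persian Empire"],
     "EMAIL": ["aleksandersuur356eKr@mail.ee"], "PHONE": ["76883266"]},
    {"type": "person_info", "PER": "Joe Abercrombie", "EMAIL": ["jabercrombie@gmail.com"]},
    {"type": "person_info", "PER": "Terry Berry", "EMAIL": ["bigfan@gmail.com"]},
    {"type": "person_info", "PER": "Terry Pratchett", "EMAIL": ["tpratchett@gmail.com"]},
]


def by_type(records):
    """Group records by their "type" field."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["type"]].append(record)
    return dict(grouped)


def email_index(records):
    """Map each e-mail address to the person it was linked to."""
    index = {}
    for record in records:
        # Not every record has EMAIL or PER, so both are looked up defensively.
        for email in record.get("EMAIL", []):
            index[email] = record.get("PER")
    return index


print(email_index(records)["tpratchett@gmail.com"])
```

Note that the two persons named Terry stay separate records here because each was linked to a different e-mail address.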