Reverse-engineer patterns for use with the spaCy DependencyTreeMatcher



SpaCy Pattern Builder

Use training examples to build and refine patterns for use with spaCy's DependencyMatcher.
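To make the target format concrete, here is a minimal sketch of how a connected set of tokens could be turned into a spaCy v2-style DependencyMatcher pattern (the SPEC/PATTERN format printed in the Usage section). It is not spacy-pattern-builder's implementation: `build_v2_dependency_pattern` is a hypothetical helper that works on plain head indices and feature dicts instead of a spaCy Doc, so it runs without a model, and it only emits the direct-child relation `>`. The head indices below are a hand-written parse of the example sentence and may differ from what `en_core_web_sm` produces.

```python
from collections import deque
from typing import Dict, List


def build_v2_dependency_pattern(
    heads: List[int],
    features: Dict[int, Dict[str, str]],
    match_idxs: List[int],
) -> List[dict]:
    """Hypothetical helper (not part of spacy-pattern-builder).

    heads[i] is the index of token i's syntactic head; as in spaCy, the
    sentence root is its own head. features maps each selected token index
    to the attributes it must match, e.g. {'DEP': 'nsubj', 'TAG': 'PRP'}.
    """
    selected = set(match_idxs)
    # In a tree, a connected selection has exactly one token whose head lies
    # outside the selection (or is the token itself, for the sentence root).
    tops = [i for i in match_idxs if heads[i] == i or heads[i] not in selected]
    if len(tops) != 1:
        raise ValueError('tokens do not form a single connected subtree')
    top = tops[0]

    pattern = [{'SPEC': {'NODE_NAME': f'node{top}'}, 'PATTERN': dict(features[top])}]
    # Walk breadth-first from the top so every NBOR_NAME refers to a node
    # that was already declared earlier in the list.
    queue = deque([top])
    while queue:
        head = queue.popleft()
        for i in sorted(selected):
            if i != head and heads[i] == head:
                pattern.append({
                    'SPEC': {'NODE_NAME': f'node{i}', 'NBOR_RELOP': '>', 'NBOR_NAME': f'node{head}'},
                    'PATTERN': dict(features[i]),
                })
                queue.append(i)
    return pattern


# Hand-written illustrative parse of
# "We introduce efficient methods for fitting Boolean models to molecular data ."
heads = [1, 1, 3, 1, 3, 4, 7, 5, 5, 10, 8, 1]
features = {
    0: {'DEP': 'nsubj', 'TAG': 'PRP'},
    1: {'DEP': 'ROOT', 'TAG': 'VBP'},
    3: {'DEP': 'dobj', 'TAG': 'NNS'},
}
pattern = build_v2_dependency_pattern(heads, features, [0, 1, 3])
```

With this parse, selecting tokens 0, 1 and 3 (We, introduce, methods) yields node1 as the root with node0 and node3 attached to it through `>`, the same three nodes as the pattern printed in the Usage section.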

Motivation

Generating patterns programmatically from training data is more efficient than writing them by hand.

Installation

With pip:

pip install spacy-pattern-builder

Usage

# Import a SpaCy model, parse a string to create a Doc object
import en_core_web_sm

text = 'We introduce efficient methods for fitting Boolean models to molecular data.'
nlp = en_core_web_sm.load()
doc = nlp(text)

from spacy_pattern_builder import build_dependency_pattern

# Provide a list of tokens we want to match.
match_tokens = [doc[i] for i in [0, 1, 3]]  # [We, introduce, methods]

# Note that these tokens must be fully connected. That is,
# all tokens must have a path to all other tokens in the list,
# without needing to traverse tokens outside of the list.
# Otherwise, spacy-pattern-builder will raise a TokensNotFullyConnectedError.
# You can get a connected set that includes your tokens with the following:
from spacy_pattern_builder import util

connected_tokens = util.smallest_connected_subgraph(match_tokens, doc)
assert match_tokens == connected_tokens  # In this case, the tokens we provided are already fully connected

# Specify the token attributes / features to use
feature_dict = {  # This is equal to the default feature_dict
    'DEP': 'dep_',
    'TAG': 'tag_',
}

# Build the pattern
pattern = build_dependency_pattern(doc, match_tokens, feature_dict=feature_dict)

from pprint import pprint

pprint(pattern)
# In the format consumed by SpaCy's DependencyMatcher (spaCy v2 SPEC/PATTERN format):
# [{'PATTERN': {'DEP': 'ROOT', 'TAG': 'VBP'}, 'SPEC': {'NODE_NAME': 'node1'}},
#  {'PATTERN': {'DEP': 'nsubj', 'TAG': 'PRP'},
#   'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node0'}},
#  {'PATTERN': {'DEP': 'dobj', 'TAG': 'NNS'},
#   'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node3'}}]

# Create a matcher and add the newly generated pattern.
# Import and instantiate the same class (older spaCy 2.x releases call it DependencyTreeMatcher).
from spacy.matcher import DependencyMatcher

matcher = DependencyMatcher(doc.vocab)
matcher.add('pattern', None, pattern)  # spaCy v2 signature: add(key, on_match, *patterns)

# And get matches
matches = matcher(doc)
for match_id, token_idxs in matches:
    tokens = [doc[i] for i in token_idxs]
    tokens = sorted(tokens, key=lambda w: w.i)  # Make sure tokens are in their original order
    print(tokens)  # [We, introduce, methods]
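The connectivity requirement in the usage comments can be illustrated with a small sketch of one way to compute a smallest connected subgraph on a dependency tree: walk each token up to the root, find the lowest common ancestor, and keep every token on the paths between them. This is an assumed approach, not necessarily how `util.smallest_connected_subgraph` works; `path_to_root` and `smallest_connected_subset` are hypothetical names, the code takes plain head indices rather than spaCy tokens, it assumes all tokens belong to one sentence, and the head indices are a hand-written parse of the example sentence that may differ from what `en_core_web_sm` produces.

```python
from typing import List


def path_to_root(heads: List[int], i: int) -> List[int]:
    """Return [i, head(i), head(head(i)), ...] ending at the sentence root."""
    path = [i]
    while heads[path[-1]] != path[-1]:  # the root is its own head
        path.append(heads[path[-1]])
    return path


def smallest_connected_subset(heads: List[int], idxs: List[int]) -> List[int]:
    """Hypothetical helper: the smallest set of token indices that contains
    idxs and is connected in the dependency tree (single sentence assumed)."""
    paths = [path_to_root(heads, i) for i in idxs]
    shared = set(paths[0]).intersection(*(set(p) for p in paths[1:]))
    # Shared nodes are the common ancestors; the first one met while walking
    # up from a token is the lowest common ancestor.
    lca = next(node for node in paths[0] if node in shared)
    connected = set()
    for path in paths:
        # Keep each token's path up to and including the lowest common ancestor.
        connected.update(path[: path.index(lca) + 1])
    return sorted(connected)


# Hand-written illustrative parse of
# "We introduce efficient methods for fitting Boolean models to molecular data ."
heads = [1, 1, 3, 1, 3, 4, 7, 5, 5, 10, 8, 1]
print(smallest_connected_subset(heads, [0, 1, 3]))  # [0, 1, 3]: already connected
print(smallest_connected_subset(heads, [0, 7]))     # [0, 1, 3, 4, 5, 7]
```

In the second call, "We" (0) and "models" (7) are only linked through "introduce", "methods", "for" and "fitting" (1, 3, 4, 5), so those tokens are added to the set.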

Acknowledgements

Uses:
