Reverse-engineer patterns for use with SpaCy's DependencyMatcher
Spacy Pattern Builder
Use training examples to build and refine patterns for use with SpaCy's DependencyMatcher.
Motivation
Generating patterns programmatically from training data is more efficient than creating them by hand.
Installation
With pip:
pip install spacy-pattern-builder
Usage
# Import a SpaCy model, parse a string to create a Doc object
import en_core_web_sm

text = 'We introduce efficient methods for fitting Boolean models to molecular data.'
nlp = en_core_web_sm.load()
doc = nlp(text)

from spacy_pattern_builder import build_dependency_pattern

# Provide a list of tokens we want to match.
match_tokens = [doc[i] for i in [0, 1, 3]]  # [We, introduce, methods]

''' Note that these tokens must be fully connected. That is,
all tokens must have a path to all other tokens in the list,
without needing to traverse tokens outside of the list.
Otherwise, spacy-pattern-builder will raise a TokensNotFullyConnectedError.
You can get a connected set that includes your tokens with the following: '''
from spacy_pattern_builder import util

connected_tokens = util.smallest_connected_subgraph(match_tokens, doc)
assert match_tokens == connected_tokens  # In this case, the tokens we provided are already fully connected

# Specify the token attributes / features to use
feature_dict = {  # This is equal to the default feature_dict
    'DEP': 'dep_',
    'TAG': 'tag_'
}

# Build the pattern
pattern = build_dependency_pattern(doc, match_tokens, feature_dict=feature_dict)

from pprint import pprint
pprint(pattern)
# In the format consumed by SpaCy's DependencyMatcher:
'''
[{'PATTERN': {'DEP': 'ROOT', 'TAG': 'VBP'}, 'SPEC': {'NODE_NAME': 'node1'}},
 {'PATTERN': {'DEP': 'nsubj', 'TAG': 'PRP'},
  'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node0'}},
 {'PATTERN': {'DEP': 'dobj', 'TAG': 'NNS'},
  'SPEC': {'NBOR_NAME': 'node1', 'NBOR_RELOP': '>', 'NODE_NAME': 'node3'}}]
'''

# Create a matcher and add the newly generated pattern
from spacy.matcher import DependencyMatcher

matcher = DependencyMatcher(doc.vocab)
matcher.add('pattern', None, pattern)

# And get matches
matches = matcher(doc)
for match_id, token_idxs in matches:
    tokens = [doc[i] for i in token_idxs]
    tokens = sorted(tokens, key=lambda w: w.i)  # Make sure tokens are in their original order
    print(tokens)  # [We, introduce, methods]
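To make the shape of the output above concrete, here is a minimal sketch of the pattern-construction idea using mock tokens in place of a real SpaCy Doc. The MockToken class and build_pattern function are illustrative assumptions for this document (they only handle a root node plus its immediate children), not the library's actual implementation; the attribute names (i, dep_, tag_, head) mirror SpaCy's Token API.

```python
class MockToken:
    """Stand-in for a SpaCy Token, exposing just the attributes we need."""
    def __init__(self, i, dep_, tag_, head=None):
        self.i = i          # position in the document
        self.dep_ = dep_    # dependency label
        self.tag_ = tag_    # fine-grained POS tag
        self.head = head if head is not None else self  # root points to itself

# Mock the parse of "We introduce ... methods"
introduce = MockToken(1, 'ROOT', 'VBP')
we = MockToken(0, 'nsubj', 'PRP', head=introduce)
methods = MockToken(3, 'dobj', 'NNS', head=introduce)
match_tokens = [we, introduce, methods]

feature_dict = {'DEP': 'dep_', 'TAG': 'tag_'}

def build_pattern(tokens, feature_dict):
    token_set = set(tokens)
    # The subgraph root is the token whose head is itself (sentence root)
    # or lies outside the matched set
    root = next(t for t in tokens if t.head is t or t.head not in token_set)
    name = lambda t: 'node{}'.format(t.i)
    features = lambda t: {k: getattr(t, attr) for k, attr in feature_dict.items()}
    pattern = [{'SPEC': {'NODE_NAME': name(root)}, 'PATTERN': features(root)}]
    for t in tokens:
        if t is root:
            continue
        pattern.append({
            # '>' means "immediate child of" the named neighbour node
            'SPEC': {'NODE_NAME': name(t), 'NBOR_NAME': name(t.head),
                     'NBOR_RELOP': '>'},
            'PATTERN': features(t),
        })
    return pattern

pattern = build_pattern(match_tokens, feature_dict)
```

Each entry pairs a SPEC (the node's name and its relation to an already-named node) with a PATTERN (the token features it must match), which is why the feature_dict directly controls how specific or general the resulting pattern is.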
Acknowledgements
Uses: