Python stanford-corenlp package - PyPI

Official Python interface for Stanford CoreNLP

stanford-corenlp: detailed Python project description

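This package contains a Python interface for Stanford CoreNLP, including a reference implementation for communicating with the Stanford CoreNLP server. It also contains a base class for exposing a Python-based annotation provider (for example, your favorite neural NER system) to the CoreNLP pipeline through a lightweight service.

To use this package, first download the official Java CoreNLP release, unzip it, and define an environment variable $CORENLP_HOME that points to the unzipped directory.

You can also install this package from PyPI with:

pip install stanford-corenlp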
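
Before launching a client, it can help to confirm that the $CORENLP_HOME setup described above is in place. The following is a minimal sketch, assuming only that the variable should name an existing directory. The helper name `find_corenlp_home` is hypothetical and is not part of the package.

```python
import os


def find_corenlp_home(environ=None):
    """Return the directory named by $CORENLP_HOME, or raise if it is unusable.

    Hypothetical helper for illustration; not provided by stanford-corenlp.
    """
    environ = os.environ if environ is None else environ
    home = environ.get("CORENLP_HOME")
    if not home:
        raise RuntimeError("CORENLP_HOME is not set")
    if not os.path.isdir(home):
        raise RuntimeError("CORENLP_HOME does not point to a directory: " + home)
    return home
```

Calling `find_corenlp_home()` with no arguments reads the real process environment; passing a dict makes the check easy to exercise in tests.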

Command Line Usage

Probably the easiest way to use this package is through the annotate command-line utility:

usage: annotate [-h] [-i INPUT] [-o OUTPUT] [-f {json}]
                [-a ANNOTATORS [ANNOTATORS ...]] [-s] [-v] [-m MEMORY]
                [-p PROPS [PROPS ...]]

Annotate data

optional arguments:
  -h, --help            show this help message and exit
  -i INPUT, --input INPUT
                        Input file to process; each line contains one document
                        (default: stdin)
  -o OUTPUT, --output OUTPUT
                        File to write annotations to (default: stdout)
  -f {json}, --format {json}
                        Output format
  -a ANNOTATORS [ANNOTATORS ...], --annotators ANNOTATORS [ANNOTATORS ...]
                        A list of annotators
  -s, --sentence-mode   Assume each line of input is a sentence.
  -v, --verbose-server  Server is made verbose
  -m MEMORY, --memory MEMORY
                        Memory to use for the server
  -p PROPS [PROPS ...], --props PROPS [PROPS ...]
                        Properties as a list of key=value pairs
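
When annotate is driven from a script, the options in the usage text can be assembled programmatically. The sketch below builds an argument list using only the flags documented above. The function `build_annotate_command` and its parameter names are hypothetical, and the sketch only constructs the command without running it.

```python
import shlex


def build_annotate_command(input_path=None, output_path=None, annotators=(),
                           sentence_mode=False, memory=None, props=None):
    """Assemble an argv list for `annotate` from the documented options."""
    argv = ["annotate"]
    if input_path:
        argv += ["-i", input_path]
    if output_path:
        argv += ["-o", output_path]
    if annotators:
        argv += ["-a", *annotators]
    if sentence_mode:
        argv.append("-s")
    if memory:
        argv += ["-m", memory]
    if props:
        # -p takes a list of key=value pairs.
        argv += ["-p", *("{}={}".format(k, v) for k, v in props.items())]
    return argv


command = build_annotate_command(
    input_path="file.txt", annotators=["tokenize", "ssplit"], sentence_mode=True)
print(shlex.join(command))
```

The resulting list can be handed to `subprocess.run(command)` once the package is installed and $CORENLP_HOME is defined.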

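We recommend using annotate together with the wonderful jq command to process the output. For example, given a file with one sentence per line, the following command produces an equivalent tokenized file: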

cat file.txt | annotate -s -a tokenize | jq '[.tokens[].originalText]' > tokenized.txt
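
The same post-processing can be done in Python when jq is not available. This is a minimal sketch, assuming each annotation is a JSON object with a top-level `tokens` array whose entries carry `originalText`, as the jq filter above implies. The jq filter emits a JSON array per document, whereas this sketch joins the tokens with spaces. The function name `tokens_to_line` and the sample annotation are made up for illustration.

```python
import json


def tokens_to_line(annotation_json):
    """Join the originalText of every token in one JSON annotation with spaces."""
    annotation = json.loads(annotation_json)
    return " ".join(token["originalText"] for token in annotation["tokens"])


# Hypothetical annotation for one input line; real output carries more fields.
sample = '{"tokens": [{"originalText": "Hello"}, {"originalText": "world"}, {"originalText": "."}]}'
print(tokens_to_line(sample))
```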

Annotation Server Usage

import corenlp

text = "Chris wrote a simple sentence that he parsed with Stanford CoreNLP."

# We assume that you've downloaded Stanford CoreNLP and defined an environment
# variable $CORENLP_HOME that points to the unzipped directory.
# The code below will launch StanfordCoreNLPServer in the background
# and communicate with the server to annotate the sentence.
with corenlp.CoreNLPClient(annotators="tokenize ssplit pos lemma ner depparse".split()) as client:
    ann = client.annotate(text)

    # You can access annotations using ann.
    sentence = ann.sentence[0]

    # The corenlp.to_text function is a helper function that
    # reconstructs a sentence from tokens.
    assert corenlp.to_text(sentence) == text

    # You can access any property within a sentence.
    print(sentence.text)

    # Likewise for tokens
    token = sentence.token[0]
    print(token.lemma)

    # Use tokensregex patterns to find who wrote a sentence.
    pattern = '([ner: PERSON]+) /wrote/ /an?/ []{0,3} /sentence|article/'
    matches = client.tokensregex(text, pattern)
    # sentences contains a list with matches for each sentence.
    assert len(matches["sentences"]) == 1
    # length tells you whether or not there are any matches in this sentence.
    assert matches["sentences"][0]["length"] == 1
    # You can access matches like most regex groups.
    # (The text has a single sentence, so its matches live at index 0.)
    assert matches["sentences"][0]["0"]["text"] == "Chris wrote a simple sentence"
    assert matches["sentences"][0]["0"]["1"]["text"] == "Chris"

    # Use semgrex patterns to directly find who wrote what.
    pattern = '{word:wrote} >nsubj {}=subject >dobj {}=object'
    matches = client.semgrex(text, pattern)
    # sentences contains a list with matches for each sentence.
    assert len(matches["sentences"]) == 1
    # length tells you whether or not there are any matches in this sentence.
    assert matches["sentences"][0]["length"] == 1
    # You can access matches like most regex groups.
    assert matches["sentences"][0]["0"]["text"] == "wrote"
    assert matches["sentences"][0]["0"]["$subject"]["text"] == "Chris"
    assert matches["sentences"][0]["0"]["$object"]["text"] == "sentence"

For more examples, see test_client.py and test_protobuf.py. Props to @Dan Zheng for TokensRegex/Semgrex support.
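
The match results in the example above are plain nested dictionaries: each sentence entry has a `length` count and one entry per match keyed by its index as a string. The sketch below walks that structure without needing a running server. The helper `iter_match_texts` is hypothetical, and the sample dict is hand-written to mirror the shape used in the example rather than captured from real output.

```python
def iter_match_texts(matches):
    """Yield (sentence_index, match_text) for every match in a result dict."""
    for sent_idx, sentence in enumerate(matches["sentences"]):
        for match_idx in range(sentence["length"]):
            # Matches are keyed by their index as a string: "0", "1", ...
            yield sent_idx, sentence[str(match_idx)]["text"]


# Hand-written stand-in shaped like the tokensregex result in the example.
sample = {
    "sentences": [
        {"length": 1,
         "0": {"text": "Chris wrote a simple sentence",
               "1": {"text": "Chris"}}},
    ]
}
print(list(iter_match_texts(sample)))
```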

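Annotation Service Usage

NOTE: The annotation service allows users to provide a custom annotator to be used by the CoreNLP pipeline. Unfortunately, it relies on experimental code internal to the Stanford CoreNLP project that is not yet available for public use.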

import corenlp
from .happyfuntokenizer import Tokenizer


class HappyFunTokenizer(Tokenizer, corenlp.Annotator):
    def __init__(self, preserve_case=False):
        Tokenizer.__init__(self, preserve_case)
        corenlp.Annotator.__init__(self)

    @property
    def name(self):
        """
        Name of the annotator (used by CoreNLP)
        """
        return "happyfun"

    @property
    def requires(self):
        """
        Requires has to specify all the annotations required before we
        are called.
        """
        return []

    @property
    def provides(self):
        """
        The set of annotations guaranteed to be provided when we are done.
        NOTE: that these annotations are either fully qualified Java
        class names or refer to nested classes of
        edu.stanford.nlp.ling.CoreAnnotations (as is the case below).
        """
        return [
            "TextAnnotation",
            "TokensAnnotation",
            "TokenBeginAnnotation",
            "TokenEndAnnotation",
            "CharacterOffsetBeginAnnotation",
            "CharacterOffsetEndAnnotation",
        ]

    def annotate(self, ann):
        """
        @ann: is a protobuf annotation object.
        Actually populate @ann with tokens.
        """
        buf, beg_idx, end_idx = ann.text.lower(), 0, 0
        for i, word in enumerate(self.tokenize(ann.text)):
            token = ann.sentencelessToken.add()
            # These are the bare minimum required for the TokenAnnotation
            token.word = word
            token.tokenBeginIndex = i
            token.tokenEndIndex = i + 1

            # Seek into the text until you can find this word.
            try:
                # Try to update beginning index
                beg_idx = buf.index(word, beg_idx)
            except ValueError:
                # Give up -- this will be something random
                pass
            # Compute the end offset whether or not the word was found.
            end_idx = beg_idx + len(word)

            token.beginChar = beg_idx
            token.endChar = end_idx

            beg_idx, end_idx = end_idx, end_idx


annotator = HappyFunTokenizer()
# Calling .start() will launch the annotator as a service running on
# port 8432 by default.
annotator.start()

# annotator.properties contains all the right properties for
# Stanford CoreNLP to use this annotator.
with corenlp.CoreNLPClient(properties=annotator.properties, annotators="happyfun ssplit pos".split()) as client:
    ann = client.annotate("RT @ #happyfuncoding: this is a typical Twitter tweet :-)")

    tokens = [t.word for t in ann.sentence[0].token]
    print(tokens)

For more examples, see test_annotator.py.
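
The character-offset bookkeeping inside the custom annotator's annotate method can be tried on its own, without protobuf objects or a server. The sketch below is a standalone variant under that assumption: it scans the lowercased text left to right and records a (begin, end) span per word, falling back to the current position when a word cannot be found. The function name `char_offsets` is hypothetical.

```python
def char_offsets(text, words):
    """Return (begin, end) character offsets for each word, scanning left to right."""
    buf, cursor, spans = text.lower(), 0, []
    for word in words:
        found = buf.find(word.lower(), cursor)
        begin = found if found >= 0 else cursor  # fall back if the word is missing
        end = begin + len(word)
        spans.append((begin, end))
        cursor = end  # the next search starts after this token
    return spans


print(char_offsets("This is a tweet :-)", ["this", "is", "a", "tweet", ":-)"]))
```

Starting each search at the end of the previous token is what keeps "is" from matching inside "This".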

stanford-corenlp 3.9.2
