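Python dframcy package - PyPI

Pandas DataFrame integration for spaCy

Project description for the dframcy Python package

DframCy

DframCy is a lightweight utility module that integrates Pandas DataFrames into spaCy's linguistic annotation and training tasks. It provides a clean API for converting spaCy's linguistic annotations and its Matcher and PhraseMatcher results into Pandas dataframes. It also supports training and evaluating NLP pipelines from CSV/XLSX/XLS files without any changes to spaCy's underlying APIs.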

Getting Started

DframCy is easy to install. It needs only the following:

Requirements

Python 3.5 or later
Pandas
spaCy >= 2.2.0
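
The requirements above can be checked programmatically before installing. The following is a minimal sketch and not part of DframCy: the helper names `parse_version`, `meets_minimum`, and `check_requirements` are hypothetical, only plain dotted numeric versions are handled, and the use of `importlib.metadata` assumes Python 3.8 or later.

```python
import sys
from importlib import metadata  # standard library from Python 3.8 onward


def parse_version(text):
    """Turn a dotted version such as '2.2.0' into a tuple of three ints.

    Hypothetical helper: parsing stops at the first non-numeric component
    and missing components are padded with zeros, so '3.0.0rc1' is read
    as (3, 0, 0).
    """
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_requirements():
    """Report whether Python, pandas, and spaCy satisfy the listed requirements."""
    report = {"python": sys.version_info >= (3, 5)}
    for package, minimum in (("pandas", None), ("spacy", "2.2.0")):
        try:
            installed = metadata.version(package)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            report[package] = False
            continue
        report[package] = minimum is None or meets_minimum(installed, minimum)
    return report


if __name__ == "__main__":
    print(check_requirements())
```

A `False` entry in the report means that the package is missing or older than the listed minimum.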

You also need to download a spaCy language model:

python -m spacy download en_core_web_sm

For more information, see: Models & Languages

Installation:

This package can be installed from PyPI by running:

pip install dframcy

To build from source:

git clone https://github.com/yash1994/dframcy.git
cd dframcy
python setup.py install

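Usage

Linguistic Annotations

Get linguistic annotations in a dataframe. For the available linguistic annotations (the dataframe column names), refer to spaCy's Token API documentation.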

import spacy
from dframcy import DframCy

nlp = spacy.load("en_core_web_sm")
dframcy = DframCy(nlp)
doc = dframcy.nlp(u"Apple is looking at buying U.K. startup for $1 billion")

# default columns: ["id", "text", "start", "end", "pos_", "tag_", "dep_", "head", "ent_type_"]
annotation_dataframe = dframcy.to_dataframe(doc)

# can also pass columns names (spaCy's linguistic annotation attributes)
annotation_dataframe = dframcy.to_dataframe(doc, columns=["text", "lemma_", "lower_", "is_punct"])

# for separate entity dataframe
token_annotation_dataframe, entity_dataframe = dframcy.to_dataframe(doc, separate_entity_dframe=True)

# custom attributes can also be included
from spacy.tokens import Token
fruit_getter = lambda token: token.text in ("apple", "pear", "banana")
Token.set_extension("is_fruit", getter=fruit_getter)
doc = dframcy.nlp(u"I have an apple")

annotation_dataframe = dframcy.to_dataframe(doc, custom_attributes=["is_fruit"])
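
To illustrate how column names can map onto token attributes, here is a minimal sketch of the general idea. It is not DframCy's implementation: the function `tokens_to_rows` is hypothetical, and the tokens are `SimpleNamespace` stand-ins so that the example runs without spaCy or a language model.

```python
from types import SimpleNamespace


def tokens_to_rows(tokens, columns):
    """Build one dict per token, keyed by the requested attribute names."""
    rows = []
    for token in tokens:
        # Each column name is looked up as an attribute on the token.
        rows.append({name: getattr(token, name) for name in columns})
    return rows


# Stand-in tokens (assumption): real spaCy tokens expose the same attribute names.
tokens = [
    SimpleNamespace(text="I", lower_="i", is_punct=False),
    SimpleNamespace(text="have", lower_="have", is_punct=False),
    SimpleNamespace(text="apples", lower_="apples", is_punct=False),
    SimpleNamespace(text=".", lower_=".", is_punct=True),
]

rows = tokens_to_rows(tokens, columns=["text", "lower_", "is_punct"])
for row in rows:
    print(row)
```

Because spaCy `Token` objects expose `text`, `lower_`, and `is_punct` as attributes, the same function should also work when iterating over a spaCy `Doc`, and `pandas.DataFrame(rows)` turns the resulting list of dicts into a dataframe.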

Rule-Based Matching

# Token-based Matching
import spacy

nlp = spacy.load("en_core_web_sm")

from dframcy.matcher import DframCyMatcher, DframCyPhraseMatcher
dframcy_matcher = DframCyMatcher(nlp)
pattern = [{"LOWER": "hello"}, {"IS_PUNCT": True}, {"LOWER": "world"}]
dframcy_matcher.add("HelloWorld", None, pattern)
doc = dframcy_matcher.nlp("Hello, world! Hello world!")
matches_dataframe = dframcy_matcher(doc)

# Phrase Matching
dframcy_phrase_matcher = DframCyPhraseMatcher(nlp)
terms = [u"Barack Obama", u"Angela Merkel", u"Washington, D.C."]
patterns = [dframcy_phrase_matcher.get_nlp().make_doc(text) for text in terms]
dframcy_phrase_matcher.add("TerminologyList", None, *patterns)
doc = dframcy_phrase_matcher.nlp(u"German Chancellor Angela Merkel and US President Barack Obama "
                                u"converse in the Oval Office inside the White House in Washington, D.C.")
phrase_matches_dataframe = dframcy_phrase_matcher(doc)
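
spaCy's `Matcher` returns matches as `(match_id, start, end)` tuples, where `start` and `end` are token indices and `end` is exclusive. The sketch below shows one way such tuples could be turned into tabular rows. It is an illustration only: the function `matches_to_rows`, the column names, the integer match ID, and the `names` lookup (a stand-in for spaCy's string store) are all assumptions, not DframCy's actual output format.

```python
def matches_to_rows(tokens, matches, names):
    """Convert (match_id, start, end) tuples into one dict per match."""
    rows = []
    for match_id, start, end in matches:
        rows.append({
            "start": start,
            "end": end,
            "string_id": names[match_id],
            # Joining with spaces is a simplification; spaCy's span text
            # would preserve the original whitespace instead.
            "span_text": " ".join(tokens[start:end]),
        })
    return rows


# Tokens of "Hello, world! Hello world!" written out by hand.
tokens = ["Hello", ",", "world", "!", "Hello", "world", "!"]

# Only the first three tokens fit the pattern "hello", punctuation, "world".
matches = [(1, 0, 3)]
names = {1: "HelloWorld"}  # hypothetical ID-to-name lookup

rows = matches_to_rows(tokens, matches, names)
print(rows)
```

The second "Hello world" is not matched because the pattern requires a punctuation token between the two words.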

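Command Line Interface

DframCy supports command-line arguments for converting a plain text file to linguistically annotated text in CSV/JSON format, and for training and evaluating language models from training data in CSV/XLS format (see: Training data example). The CLI arguments for training and evaluation are exactly the same as those of spaCy's CLI; the only difference is the format of the training data.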

# convert
dframcy convert -i plain_text.txt -o annotations.csv -t csv

# train
dframcy train -l en -o spacy_models -t train.csv -d test.csv

# evaluate
dframcy evaluate -m spacy_model/ -d test.csv

# train text classifier
dframcy textcat -o spacy_model/ -t data/textcat_training.csv -d data/textcat_training.csv
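
When the convert command needs to be called from a script, the argument list can be assembled in Python. This is a minimal sketch: `build_convert_command` is a hypothetical helper, it uses only the `-i`, `-o`, and `-t` flags shown above, and accepting `json` as a value for `-t` is an assumption based on the CSV/JSON description rather than a confirmed option.

```python
import shlex


def build_convert_command(input_path, output_path, output_type="csv"):
    """Assemble the argument list for `dframcy convert`."""
    # "json" is assumed from the CSV/JSON description; only "csv" is shown above.
    if output_type not in ("csv", "json"):
        raise ValueError("output_type must be 'csv' or 'json'")
    return ["dframcy", "convert", "-i", input_path, "-o", output_path, "-t", output_type]


command = build_convert_command("plain_text.txt", "annotations.csv")

# Print the command in a shell-safe form for inspection.
print(" ".join(shlex.quote(part) for part in command))
```

With dframcy installed, passing the list to `subprocess.run(command, check=True)` would run the conversion; a list avoids shell quoting problems with file names that contain spaces.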

dframcy 0.1.5
