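essay-scorer: Python package on PyPI

An automatic scorer for essays written by English language learners.

essay-scorer project description

Essay Scorer

Description

Extracts a set of linguistic features and compares them against a model trained on more than 3,000 essays to predict a score on a 40-point scale.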

Installation

pip install essay-scorer

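Usage/Tutorial

Command-line usage

Accepts either a directory of .txt files or a single .txt file.

(To run it from the command line like this, you must first locate the bin directory where pip saved the script.)

For a directory of text files: python3 essay_scorer.py path/to/essays/

Bonus, appending the output to a CSV file: python3 essay_scorer.py path/to/essays/ >> output.csv

For a single text file: python3 essay_scorer.py path/to/essays/test.txt

Importing in a Python script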

import essay_scorer

# Read the essay to score.
with open('test.txt', 'r') as f:
    text = f.read()

# Extract the features, then predict with the bundled model.
feats = essay_scorer.get_feats(text)
pred_score = essay_scorer.gbr_model.predict(feats)[0]
print('predicted score', pred_score)
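
The same two calls can be wrapped to score a whole directory from Python rather than from the command line. The following is a minimal sketch, not the package's own script: `score_directory`, `essay_scorer_score`, and the two-column CSV layout are hypothetical names and choices made for illustration, and the only `essay_scorer` calls used are the ones shown in this section.

```python
import csv
import sys
from pathlib import Path


def score_directory(essay_dir, score_fn, out=sys.stdout):
    """Write one CSV row (filename, score) per .txt file in essay_dir.

    score_fn is any callable that maps essay text to a number.
    Returns the number of essays scored.
    """
    writer = csv.writer(out)
    writer.writerow(["filename", "predicted_score"])  # hypothetical header
    rows = 0
    for path in sorted(Path(essay_dir).glob("*.txt")):
        text = path.read_text(encoding="utf-8")
        writer.writerow([path.name, score_fn(text)])
        rows += 1
    return rows


def essay_scorer_score(text):
    """Score one essay using only get_feats and gbr_model.predict."""
    import essay_scorer  # imported lazily so score_directory works without it

    feats = essay_scorer.get_feats(text)
    return essay_scorer.gbr_model.predict(feats)[0]
```

Calling `score_directory("path/to/essays/", essay_scorer_score)` prints the rows to standard output; passing an open file as `out` writes them to disk instead. Keeping the scoring function as a parameter makes the directory loop easy to try with a stand-in scorer.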

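About

This automatic scoring system is based on Travis Moore's master's thesis work.

Moore's master's thesis, which lays the theoretical foundation for this project, can be found here.

The model works best on English-learner essays of 140-300 words. Because of the distribution of scores in the training data, the model tends to predict more accurately near the dataset's actual median score (which appears to be around 20). Predictions for outlier scores show more variance.

The model itself is a GradientBoostingRegressor. Below are its parameters and the results of testing it on its own data: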
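
Since the model is described as working best on essays of 140-300 words, a caller may want to flag essays outside that range before trusting a prediction. This is a small sketch under the assumption that a word is any whitespace-delimited token; `in_recommended_length` is a hypothetical helper and not part of the package.

```python
def in_recommended_length(text, low=140, high=300):
    """Return True if the whitespace-delimited word count is in [low, high]."""
    n_words = len(text.split())
    return low <= n_words <= high
```

A prediction for an essay that fails this check can still be computed; the check only signals that the essay is outside the range the model handles best.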

`model.fit` results:
GradientBoostingRegressor(alpha=0.9, criterion='friedman_mse', init=None,
             learning_rate=0.02, loss='ls', max_depth=4, max_features=0.3,
             max_leaf_nodes=None, min_impurity_decrease=0.0,
             min_impurity_split=None, min_samples_leaf=9,
             min_samples_split=2, min_weight_fraction_leaf=0.0,
             n_estimators=500, n_iter_no_change=None, presort='auto',
             random_state=0, subsample=1.0, tol=0.0001,
             validation_fraction=0.1, verbose=0, warm_start=False)
Mean Absolute Error:
Train error:	2.360482423992901
Test error:	2.32169958721344

r2 scores of both train/test:
r2_train:	0.8341340712062068
r2_test:	0.8575492864872207
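
For readers unfamiliar with the two metrics reported above, here is a minimal pure-Python sketch of how mean absolute error and the r2 score are defined. The function names mirror common usage but are written out here for illustration, and the sample scores are made up, not taken from the project's data.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between actual and predicted scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up scores on the 40-point scale, for illustration only.
actual = [18, 20, 22, 30]
predicted = [19, 20, 21, 26]
print(mean_absolute_error(actual, predicted))  # 1.5
print(r2_score(actual, predicted))
```

Read this way, a mean absolute error of about 2.3 means predictions are off by roughly 2.3 points on average on the 40-point scale.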

License

GNU GPLv3. See the license file for details.

Contact

@mkylemartin on Twitter and GitHub

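Notes

The pickled data files in this release do not include age_bracket or language_id. The 49 extracted features are listed below (in alphabetical order).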

['ari', # readability index
 'avg_len_word',  	# average word length
 'cli', # readability measure
 'conjunctions', 
 'cttr', # corrected type to token ratio
 'dcrs', # dale chall readability score
 'determiners', 
 'dw', # difficult words
 'english_usage', # number of english words used
 'fkg', # flesch_kincaid_grade
 'fre',  # flesch_reading_ease
 'function_ttr', 
 'gf', # gunning_fog
 'grammar_chk', # checks for 2000+ grammar errors
 'lwf', # linsear_write_formula
 'n_bigram_lemma_types', 
 'n_bigram_lemmas', 
 'n_trigram_lemma_types',
 'n_trigram_lemmas', 
 'ncontent_words', 
 'nfunction_words', 
 'nlemma_types',
 'nlemmas', 
 'noun_ttr', 
 'num_tokens', 
 'num_types', 
 'pct_rel_trigrams',
 'pct_transitions', 
 'rank_avg', 
 'rank_total', 
 's1',  # negation stages (the next several features)
 's1a', 
 's1b', 
 's1c',
 's2', 
 's2a', 
 's2b', 
 's2c', 
 's3', 
 's3a', 
 's3b', 
 's3c', 
 's4', 
 's4a',
 's4b', 
 's4c', 
 'sent_density', # average words per sentence
 'spelling_perc',  # what percentage of words spelled correctly
 'ttr']  # type token ratio
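
To make a few of the simpler feature names concrete, here is a rough sketch of how type-token ratio, corrected type-token ratio, average word length, and words per sentence can be computed. It assumes a naive regex tokenizer and sentence splitter, and uses the usual definition of corrected TTR (types divided by the square root of twice the tokens). `simple_lexical_feats` is a hypothetical helper; the package's own feature extraction may tokenize and count differently.

```python
import math
import re


def simple_lexical_feats(text):
    """Compute a handful of lexical features with naive tokenization."""
    # Assumption: a token is a run of letters or apostrophes, lowercased.
    tokens = re.findall(r"[A-Za-z']+", text.lower())
    # Assumption: sentences end at runs of '.', '!' or '?'.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not tokens or not sentences:
        raise ValueError("text contains no words")

    num_tokens = len(tokens)
    num_types = len(set(tokens))
    return {
        "num_tokens": num_tokens,
        "num_types": num_types,
        "ttr": num_types / num_tokens,
        "cttr": num_types / math.sqrt(2 * num_tokens),
        "avg_len_word": sum(len(t) for t in tokens) / num_tokens,
        "sent_density": num_tokens / len(sentences),
    }
```

For example, `simple_lexical_feats("The cat sat. The dog sat.")` counts 6 tokens and 4 types, giving a `ttr` of 4/6 and a `sent_density` of 3.0.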
