Plan, design, and build train and test matrices

Detailed description of the matrix-architect Python project


Copyright ©2017. The University of Chicago ("Chicago"). All Rights Reserved.

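Permission to use, copy, modify, and distribute this software, including all object code and source code, and any accompanying documentation (together the "Program") for educational and not-for-profit research purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies, modifications, and distributions. For the avoidance of doubt, educational and not-for-profit research purposes exclude any service, or part of selling a service, that uses the Program. To obtain a commercial license for the Program, contact Technology Commercialization and Licensing, Polsky Center for Entrepreneurship and Innovation, University of Chicago, 1452 East 53rd Street, 2nd floor, Chicago, IL 60615.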

Created by Data Science and Public Policy, University of Chicago

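The Program is copyrighted by Chicago. The Program is supplied "as is", without any accompanying services from Chicago. Chicago does not warrant that the operation of the Program will be uninterrupted or error-free. The end-user understands that the Program was developed for research purposes and is advised not to rely exclusively on the Program for any reason.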

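In no event shall Chicago be liable to any party for direct, indirect, special, incidental, or consequential damages, including losses arising out of the use of the Program, even if Chicago has been advised of the possibility of such damage. Chicago specifically disclaims any warranties, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The Program provided hereunder is provided "as is". Chicago has no obligation to provide maintenance, support, updates, enhancements, or modifications.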

Description: Architect

Plan, design, and build train and test matrices

[![Build Status](https://travis-ci.org/dssg/architect.svg?branch=master)](https://travis-ci.org/dssg/architect) [![codecov](https://codecov.io/gh/dssg/architect/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/architect) [![codeclimate](https://codeclimate.com/github/dssg/architect.png)](https://codeclimate.com/github/dssg/architect)

To run classification algorithms on source data, the data must first be organized into design matrices. Converting cleaned data into these matrices is not a trivial task: creating the features and labels an experiment needs from source data can be complicated, assembling the matrices themselves from those features and labels can be inefficient, and every step offers an opportunity to leak data backwards in time, giving a model trained on a matrix an unfair advantage.

The Architect addresses these issues with functionality aimed at all tasks between cleaned source data (in a PostgreSQL database) and design matrices.
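To make the leakage concern concrete, here is a minimal sketch in plain Python, not the Architect's own API. It assumes events are `(entity_id, event_date)` tuples and that a matrix is built relative to a single cutoff date; the names `split_events`, `binary_labels`, `as_of_date`, and `label_window` are hypothetical and chosen only for illustration. Features may look only at events strictly before the cutoff, while labels are drawn only from the window that starts at the cutoff.

```python
from datetime import date, timedelta


def split_events(events, as_of_date, label_window):
    """Split events into feature inputs and label inputs around a cutoff.

    Hypothetical helper: features see only events strictly before
    as_of_date; labels see only events in
    [as_of_date, as_of_date + label_window).
    """
    label_end = as_of_date + label_window
    feature_events = [e for e in events if e[1] < as_of_date]
    label_events = [e for e in events if as_of_date <= e[1] < label_end]
    return feature_events, label_events


def binary_labels(entity_ids, label_events):
    """Label an entity 1 if it has any event in the label window, else 0."""
    positives = {entity_id for entity_id, _ in label_events}
    return {entity_id: int(entity_id in positives) for entity_id in entity_ids}


# Illustrative data only.
events = [
    (1, date(2016, 1, 10)),
    (1, date(2016, 7, 1)),
    (2, date(2016, 3, 5)),
    (2, date(2017, 2, 1)),  # falls after the label window, so it is ignored
]
feature_events, label_events = split_events(
    events, as_of_date=date(2016, 6, 1), label_window=timedelta(days=180)
)
labels = binary_labels([1, 2], label_events)
```

Because no event on or after the cutoff can reach `feature_events`, a model trained on features derived from it cannot see the outcomes it is asked to predict.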

## Components

  • [LabelGenerator](architect/label_generators.py): Create binary labels suitable for a design matrix by querying a database table containing outcome events.
  • [FeatureGenerator](architect/feature_generators.py): Create aggregate features suitable for a design matrix from a set of database tables containing events. Uses [collate](https://github.com/dssg/collate/) to build aggregation SQL queries.
  • [FeatureGroupCreator](architect/feature_group_creator.py), [FeatureGroupMixer](architect/feature_group_mixer.py): Create groupings of features, and mix them using different strategies (like ‘leave one out’) to test their effectiveness.
  • [Planner](architect/planner.py), [Builder](architect/builders.py): Build all design matrices needed for an experiment, taking into account different labels, state configurations, and feature groups.
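
As an illustration of the 'leave one out' strategy mentioned in the list above, the following sketch builds one feature subset per group, each omitting exactly that group. It is plain Python under stated assumptions and does not reproduce the FeatureGroupMixer interface; the function `leave_one_out` and the group and feature names are hypothetical.

```python
def leave_one_out(feature_groups):
    """Return one feature list per group, each omitting exactly that group.

    feature_groups maps a (hypothetical) group name to its list of
    feature names. Dict insertion order determines the output order.
    """
    names = list(feature_groups)
    subsets = []
    for left_out in names:
        kept = [
            feature
            for name in names
            if name != left_out
            for feature in feature_groups[name]
        ]
        subsets.append(kept)
    return subsets


# Illustrative groups only.
groups = {
    "demographics": ["age"],
    "events": ["events_count_1y", "events_count_5y"],
    "location": ["zip_code"],
}
subsets = leave_one_out(groups)
```

Training one model per subset and comparing it against a model trained on all groups gives a rough sense of how much each group contributes.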

In addition to being usable individually to assist in different aspects of building matrices in your project, the Architect components are integrated in [triage](https://github.com/dssg/triage) as a part of an entire modeling experiment that incorporates later tasks like model training and testing.

## Distributing, Building & Testing

The Architect is a Python package distributable via setuptools. It may be installed directly using easy_install or pip, or listed as a dependency of another package (namely triage), under the package name matrix-architect.

To build this package for development, its dependencies may be installed using pip:

pip install -r requirements_dev.txt

(Or, to install without the test and development dependencies, use requirements.txt.)

Then, having built for development, run the tests:

pytest

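Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3.4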
