







Plan, design, and build train and test matrices

[![Build Status](https://travis-ci.org/dssg/architect.svg?branch=master)](https://travis-ci.org/dssg/architect) [![codecov](https://codecov.io/gh/dssg/architect/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/architect) [![codeclimate](https://codeclimate.com/github/dssg/architect.png)](https://codeclimate.com/github/dssg/architect)

To run classification algorithms on source data, the data must first be organized into design matrices. Converting cleaned data into these matrices is not trivial: creating the features and labels an experiment needs from source data can be complicated, assembling the matrices from those features and labels can be inefficient, and each step offers an opportunity to leak data backwards in time, giving a model trained on a matrix an unfair advantage.
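The leakage risk can be illustrated with a minimal sketch. It is not the Architect's implementation; `split_events`, the event dictionaries, and the 90-day label window are hypothetical names and values chosen for illustration. The idea is that features for a given as-of date may only draw on events strictly before that date, while labels come from a window that starts at it.

```python
from datetime import date, timedelta


def split_events(events, as_of_date, label_window):
    """Partition events around as_of_date so features never see the future.

    Hypothetical helper: events before as_of_date feed features, events in
    [as_of_date, as_of_date + label_window) feed labels, the rest is ignored.
    """
    label_end = as_of_date + label_window
    feature_events = [e for e in events if e["date"] < as_of_date]
    label_events = [e for e in events if as_of_date <= e["date"] < label_end]
    return feature_events, label_events


# Illustrative data only
events = [
    {"entity_id": 1, "date": date(2016, 1, 10)},
    {"entity_id": 1, "date": date(2016, 3, 5)},
    {"entity_id": 2, "date": date(2016, 8, 1)},
]

feature_events, label_events = split_events(
    events, as_of_date=date(2016, 3, 1), label_window=timedelta(days=90)
)
```

Any event that lands in both lists, or a feature query that omits the date filter, would let information from the label period reach the model.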

The Architect addresses these issues with functionality aimed at all tasks between cleaned source data (in a PostgreSQL database) and design matrices.

## Components

  • [LabelGenerator](architect/label_generators.py): Create binary labels suitable for a design matrix by querying a database table containing outcome events.
  • [FeatureGenerator](architect/feature_generators.py): Create aggregate features suitable for a design matrix from a set of database tables containing events. Uses [collate](https://github.com/dssg/collate/) to build aggregation SQL queries.
  • [FeatureGroupCreator](architect/feature_group_creator.py), [FeatureGroupMixer](architect/feature_group_mixer.py): Create groupings of features, and mix them using different strategies (like ‘leave one out’) to test their effectiveness.
  • [Planner](architect/planner.py), [Builder](architect/builders.py): Build all design matrices needed for an experiment, taking into account different labels, state configurations, and feature groups.
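As a rough illustration of the 'leave one out' strategy mentioned above, the sketch below builds one feature list per omitted group. It does not use the FeatureGroupMixer API; the `leave_one_out` function, the group names, and the feature names are all hypothetical.

```python
def leave_one_out(feature_groups):
    """Return, for each group, the features of all the other groups combined.

    feature_groups maps a group name to a list of feature names
    (an assumed structure for this sketch).
    """
    names = sorted(feature_groups)
    mixes = {}
    for omitted in names:
        mixes[omitted] = [
            feature
            for name in names
            if name != omitted
            for feature in feature_groups[name]
        ]
    return mixes


# Illustrative groups only
groups = {
    "arrests": ["arrests_count_1y", "arrests_count_5y"],
    "bookings": ["bookings_count_1y"],
    "demographics": ["age"],
}

mixes = leave_one_out(groups)
```

Training one model per mix and comparing it against a model trained on every group gives a simple estimate of how much each group contributes.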

In addition to being usable individually to assist in different aspects of building matrices in your project, the Architect components are integrated in [triage](https://github.com/dssg/triage) as a part of an entire modeling experiment that incorporates later tasks like model training and testing.

## Distributing, Building & Testing

The Architect is a Python package distributable via setuptools. It may be installed directly using easy_install or pip, or listed as a dependency of another package (namely triage), under the package name matrix-architect.

To build this package for development, its dependencies may be installed using pip:

    pip install -r requirements_dev.txt




Platform: unknown. Classifiers: Development Status :: 2 - Pre-Alpha; Intended Audience :: Developers; Natural Language :: English; Programming Language :: Python :: 3.4
