A machine learning tool for ontology-based classification of complex datasets
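DeepOC
deepoc is the core of BioModel Classifier, a Python application that uses deep neural networks to classify biological models automatically. deepoc exposes low-level functions for ontology-based model classification, so it can be adapted to other projects.
Installation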
pip install deepoc
Usage
First, you need a ground truth dataset: a dict that maps each model to the list of its corresponding ontology terms.
{
    "model_1": ["GO:00001", "GO:00003", "GO:00002"],
    "model_2": ["GO:00004", "GO:00002"]
}
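Before handing such a dict to deepoc, it can help to check its shape. The following is a minimal sketch, not part of deepoc: `validate_ground_truth` and `TERM_PATTERN` are hypothetical names, and the "PREFIX:digits" identifier form is an assumption based on the example above.

```python
import re

# Hypothetical helper, not part of deepoc.
# Assumed identifier form: "PREFIX:digits", as in the example above.
TERM_PATTERN = re.compile(r"^[A-Za-z_]+:\d+$")


def validate_ground_truth(ground_truth):
    """Return the sorted distinct ontology terms; raise ValueError on a malformed entry."""
    if not isinstance(ground_truth, dict):
        raise ValueError("ground truth must be a dict")
    terms = set()
    for model, annotations in ground_truth.items():
        if not isinstance(annotations, list) or not annotations:
            raise ValueError(f"{model}: expected a non-empty list of terms")
        for term in annotations:
            if not isinstance(term, str) or not TERM_PATTERN.match(term):
                raise ValueError(f"{model}: malformed term {term!r}")
            terms.add(term)
    return sorted(terms)


ground_truth = {
    "model_1": ["GO:00001", "GO:00003", "GO:00002"],
    "model_2": ["GO:00004", "GO:00002"],
}
all_terms = validate_ground_truth(ground_truth)
print(all_terms)
```

A check like this catches a misspelled key or an empty annotation list early, before any dataset files are written.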
Generate the dataset and train the DNN model:
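import logging
import deepoc
# DeepOCClassifier is provided by the deepoc package; import it before running this example.
logger = logging.getLogger(__name__)
ground_truth = ...  # dict of model -> list of ontology terms, as shown above
# classes and workspace are used below but not defined in this example; set them for your project.
classes = ...
workspace = ...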
train_file = "path/to/your train csv file"
test_file = "path/to/your test csv file"
val_file = "path/to/your val csv file"
features = deepoc.build_features(ground_truth)
# Picking the first 300 features
selected_features = [feature['feature'] for idx, feature in enumerate(features) if idx < 300]
train, test, val = deepoc.generate_dataset(ground_truth, features, classes)
# Writing dataset to file
deepoc.write_dataset_to_file(train, train_file)
deepoc.write_dataset_to_file(test, test_file)
deepoc.write_dataset_to_file(val, val_file)
# Configure DNN model to use Gradient Descent optimizer, 1 hidden layer with 150 nodes, learning rate of 0.001 and dropout rate of 0.5
classifier = DeepOCClassifier(workspace, 'GD', [150], 0.001, train_file, test_file, classes, 0.5)
# Train the model for 3000 epochs, validating every 10 epochs, with a batch size of 16
classifier.train_dll_model(3000, 10, 16)
# Validate the result:
for record in val:
    model = record['model']
    predict_result = classifier.predict(record)
    logger.info('Model %s: %s', model, predict_result)
More examples can be found in the tests folder.
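The internals of `deepoc.build_features` and `deepoc.generate_dataset` are not shown in this description. To give a feel for the feature selection step, here is a minimal sketch of one common approach: rank ontology terms by how many models they annotate, keep the top N, and encode each model as a 0/1 vector. The names `rank_terms` and `encode` are hypothetical, and this is not necessarily how deepoc does it.

```python
from collections import Counter


def rank_terms(ground_truth):
    """Rank ontology terms by the number of models they annotate (ties broken by term id)."""
    counts = Counter(
        term for terms in ground_truth.values() for term in set(terms)
    )
    return sorted(counts, key=lambda term: (-counts[term], term))


def encode(ground_truth, selected_terms):
    """Encode each model as a 0/1 vector over selected_terms."""
    encoded = {}
    for model, terms in ground_truth.items():
        present = set(terms)
        encoded[model] = [1 if term in present else 0 for term in selected_terms]
    return encoded


ground_truth = {
    "model_1": ["GO:00001", "GO:00003", "GO:00002"],
    "model_2": ["GO:00004", "GO:00002"],
}
ranked = rank_terms(ground_truth)
selected = ranked[:3]  # keep the 3 most frequent terms (the example above keeps 300)
vectors = encode(ground_truth, selected)
print(selected)
print(vectors)
```

Capping the number of features bounds the width of the network's input layer, which is presumably why the example above keeps only the first 300.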
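Classifying models based on any ontology other than Gene Ontology
To make this library work with other types of ontology, implement OntologyService for your ontology in https://bitbucket.org/biomodels/deepoc/src/master/deepoc/ontology/init.py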
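The actual OntologyService interface is defined in the linked file and is not reproduced here. The sketch below only illustrates the general shape such an implementation might take. `OntologyServiceSketch`, its `get_ancestors` method, and the `TOY:` identifiers are all hypothetical stand-ins, so check the real base class for the methods it requires.

```python
from abc import ABC, abstractmethod


class OntologyServiceSketch(ABC):
    """Stand-in for deepoc's OntologyService; the method name below is hypothetical."""

    @abstractmethod
    def get_ancestors(self, term_id):
        """Return all ancestor term ids of term_id."""


class InMemoryOntologyService(OntologyServiceSketch):
    """Toy implementation backed by a dict of term id -> direct parent ids."""

    def __init__(self, parents):
        self._parents = parents

    def get_ancestors(self, term_id):
        seen = set()
        stack = list(self._parents.get(term_id, []))
        # Walk up the parent links, visiting each ancestor once.
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents.get(current, []))
        return sorted(seen)


toy_parents = {
    "TOY:003": ["TOY:002"],
    "TOY:002": ["TOY:001"],
    "TOY:001": [],
}
service = InMemoryOntologyService(toy_parents)
ancestors = service.get_ancestors("TOY:003")
print(ancestors)
```

Keeping ontology access behind one class means the classification code does not need to know which ontology supplies the terms.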
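Developers
Contact
License
The BioModel Classifier source code is distributed under the GNU Affero General Public License.
For information on software availability and distribution, please read license.txt.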