Evaluation framework for link prediction models on heterogeneous biomedical graph data
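OpenBioLink
OpenBioLink is a resource and evaluation framework for evaluating link prediction models on heterogeneous biomedical graph data. It contains benchmark datasets as well as the underlying scripts used to create these benchmark datasets and to evaluate custom models.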
Installation
Pip
- Install a PyTorch version suitable for your system (https://pytorch.org/)
pip install openbiolink
Source
- Clone the git repository or download the project
- Create a new python3.7 or python3.6 virtual environment (note: under Windows, only python3.6 works)
e.g.:
python3 -m venv my_venv
- Activate the virtual environment
- windows:
my_venv\Scripts\activate
- linux/mac:
source my_venv/bin/activate
- Install a PyTorch version suitable for your system https://pytorch.org/
- Install the requirements stated in requirements.txt, e.g.
pip install -r requirements.txt
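The version constraints above can be checked before installing anything. The following is a minimal sketch, not part of OpenBioLink: the function name `check_environment` and its parameters are hypothetical, and it only encodes the rules stated in this section (Python 3.6 or 3.7, only 3.6 on Windows, PyTorch installed).

```python
import importlib.util
import sys


def check_environment(version_info=sys.version_info,
                      platform=sys.platform,
                      required_modules=("torch",)):
    """Return a list of problems; an empty list means the checks passed.

    Hypothetical helper: it encodes only the constraints stated in the
    installation notes and is not an OpenBioLink function.
    """
    problems = []
    major_minor = (version_info[0], version_info[1])
    # Under Windows only Python 3.6 is reported to work.
    allowed = [(3, 6)] if platform.startswith("win") else [(3, 6), (3, 7)]
    if major_minor not in allowed:
        problems.append("unsupported Python version %d.%d" % major_minor)
    for name in required_modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(name) is None:
            problems.append("module %r is not installed" % name)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

Running the file inside the activated virtual environment prints one warning per violated constraint and nothing if the environment matches the notes above.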
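Benchmark dataset
The OpenBioLink2020 Dataset is a highly challenging benchmark dataset containing over 5 million positive and negative edges. The test set does not contain trivially predictable inverse edges from the training set, and it contains all different edge types, in order to provide a more realistic edge prediction scenario.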
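To illustrate what a trivially predictable inverse edge means, the sketch below flags test edges whose reversed node pair already occurs in the training set. It assumes edges are given as in-memory `(head, relation, tail)` tuples; this representation and the function name `find_inverse_leaks` are illustrative assumptions, not the OpenBioLink file format or API.

```python
def find_inverse_leaks(train_triples, test_triples):
    """Return test triples (h, r, t) for which training contains some (t, r', h).

    Assumption: triples are (head, relation, tail) tuples. This is an
    illustrative check, not the procedure used to build OpenBioLink2020.
    """
    # Collect every (head, tail) pair seen in training, ignoring the relation.
    train_pairs = {(h, t) for h, _, t in train_triples}
    # A test edge leaks if its reversed pair was seen during training.
    return [(h, r, t) for h, r, t in test_triples if (t, h) in train_pairs]


if __name__ == "__main__":
    train = [("A", "binds", "B"), ("C", "treats", "D")]
    test = [("B", "binds", "A"), ("C", "binds", "E")]
    print(find_inverse_leaks(train, test))
```

A model could score such leaked edges correctly just by memorizing the training pair, which is why a test set without them gives a more realistic estimate.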
Leaderboard
model | hits@10 | hits@1 | paper | code |
---|---|---|---|---|
TransE (Baseline) | 0.0749 | 0.0125 | (under review) | Code |
TransR (Baseline) | 0.0639 | 0.0096 | (under review) | Code |
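The hits@k columns report the fraction of test edges whose true entity is ranked within the top k candidates. A minimal sketch of this standard metric follows, assuming 1-based ranks are already available; how OpenBioLink computes and filters the ranks is not shown here, and the function name is illustrative.

```python
def hits_at_k(ranks, k):
    """Fraction of 1-based ranks that are less than or equal to k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


if __name__ == "__main__":
    # Made-up ranks for illustration only, not OpenBioLink results.
    example_ranks = [1, 3, 12, 7, 50]
    print(hits_at_k(example_ranks, 1))   # 1 of 5 ranks is <= 1
    print(hits_at_k(example_ranks, 10))  # 3 of 5 ranks are <= 10
```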
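To also allow analysis of the effect of data quality, evaluation graphs are provided for other settings of OpenBioLink2020: directed and undirected, with and without a quality cutoff.
- OpenBioLink2020: directed, high quality (default dataset)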
- OpenBioLink2020: undirected, high quality
- OpenBioLink2020: directed, no quality cutoff
- OpenBioLink2020: undirected, no quality cutoff
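Manual
The OpenBioLink framework consists of three parts, called actions:
- Graph creation
- Train-test split creation
- Training and evaluation
With the graph creation and train-test split actions, customized datasets can be created to fit individual needs. The last action serves as an interface for training and evaluating link prediction models.
Calling via GUI
When the program is called without any parameters, the GUI starts, providing a convenient interface for defining the required parameters. In the last step, the corresponding command line options are displayed.
Calling via command line
From the folder src: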
python -m openbiolink.openBioLink -p WORKING_DIR_PATH [-action] [--options] ...
Action: Graph creation
-g:
--undir Output-Graph should be undirectional (default = directional)
--qual quality cutoff of the output-graph, options = [hq, mq, lq], (default = None -> all entries are used)
--no_interact Disables interactive mode - existing files will be replaced (default = interactive)
--skip Existing files will be skipped - in combination with --no_interact (default = replace)
--no_dl No download is being performed (e.g. when local data is used)
--no_in No input_files are created (e.g. when local data is used)
--no_create No graph is created (e.g. when only in-files should be created)
--out_format [Format] [Sep] Format of graph output, takes 2 arguments: list of file formats
[s= single file, m=multiple files] and list of separators
(e.g. t=tab, n=newline, or any other character) (default= s t)
--no_qscore The output files will contain no scores
--dbs [Cls] custom source databases selection to be used, full class name, options --> see doc
--mes [Cls] custom meta edges selection to be used, full class name, options --> see doc
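As a sketch of how the graph creation options combine, the helper below assembles an argument list using only flags listed above. The function name and its keyword arguments are hypothetical conveniences, and the working directory `my_workdir` is a placeholder.

```python
import sys


def graph_creation_command(working_dir, undirected=False, quality=None,
                           interactive=True):
    """Build the argument list for the graph creation action (-g).

    Hypothetical helper: only flags documented above are emitted.
    """
    cmd = [sys.executable, "-m", "openbiolink.openBioLink",
           "-p", working_dir, "-g"]
    if undirected:
        cmd.append("--undir")
    if quality is not None:
        if quality not in ("hq", "mq", "lq"):
            raise ValueError("quality must be one of hq, mq, lq")
        cmd += ["--qual", quality]
    if not interactive:
        cmd.append("--no_interact")
    return cmd


if __name__ == "__main__":
    cmd = graph_creation_command("my_workdir", quality="hq", interactive=False)
    print(" ".join(cmd))
    # To actually run it from the folder src, one could use:
    # import subprocess; subprocess.run(cmd, cwd="src", check=True)
```

Passing a list instead of a single string avoids shell quoting problems when the working directory path contains spaces.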
Action: Train-test split generation
Action: Training and evaluation
-e
--model_cls Cls class of the model to be trained/evaluated (required with -e)
--config Path Path to the models config file
--no_train No training is being performed, trained model id provided via --trained_model
--trained_model Path Path to trained model (required with --no_train)
--no_eval No evaluation is being performed, only training
--test Path Path to test set file (required with -e)
--train Path Path to training set file
--eval_nodes Path Path to the nodes file (required for ranked triples if no corrupted triples
file is provided and nodes cannot be taken from graph creation)
--metrics [Metric] list of evaluation metrics
--ks [K] k's for hits@k metric (integer list)
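The "required with" notes above can be expressed as a small validation step before the command is launched. This is a sketch under assumptions: the helper name is hypothetical, the model class `my_package.MyModel` and the file names are placeholders, and passing the `--ks` values as space-separated integers is an assumption about the parser rather than documented behaviour.

```python
import sys


def evaluation_command(working_dir, model_cls, test_path, train_path=None,
                       no_train=False, trained_model=None, ks=()):
    """Build the argument list for the training and evaluation action (-e).

    Hypothetical helper: it enforces the requirements stated in the
    option list and emits only documented flags.
    """
    if no_train and trained_model is None:
        raise ValueError("--no_train requires --trained_model")
    cmd = [sys.executable, "-m", "openbiolink.openBioLink",
           "-p", working_dir, "-e",
           "--model_cls", model_cls, "--test", test_path]
    if train_path is not None:
        cmd += ["--train", train_path]
    if no_train:
        cmd += ["--no_train", "--trained_model", trained_model]
    if ks:
        # Assumption: the integer list is passed as space-separated values.
        cmd += ["--ks"] + [str(k) for k in ks]
    return cmd


if __name__ == "__main__":
    # Placeholder class and file names, for illustration only.
    print(" ".join(evaluation_command("my_workdir", "my_package.MyModel",
                                      "test.csv", train_path="train.csv",
                                      ks=(1, 10))))
```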