mlnods Python package - PyPI

A Python package for splitting machine learning datasets using graph partitioning

mlnods project description

mlNODS

Machine learning dataset splitting using graph partitioning

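Proper evaluation requires a proper split of the training and evaluation datasets, which in turn requires clustering. For many problems, single-linkage clustering is sufficient. Faced with a problem that this standard procedure could not solve, we developed a simple graph-based tool to create unique datasets.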
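For reference, single-linkage clustering at a fixed cutoff amounts to finding the connected components of the similarity graph. The sketch below is a minimal illustration and not part of mlnods; the function name `single_linkage_clusters`, the `(id1, id2, score)` edge tuples, and the convention that a score at or above the cutoff counts as a link are all assumptions made for this example.

```python
from collections import defaultdict


def single_linkage_clusters(nodes, edges, cutoff):
    """Group nodes so that two nodes share a cluster whenever a chain of
    links with score >= cutoff connects them (hypothetical helper)."""
    parent = {n: n for n in nodes}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b, score in edges:
        # Links below the cutoff, or to unknown nodes, do not join clusters.
        if a in parent and b in parent and score >= cutoff:
            parent[find(a)] = find(b)

    clusters = defaultdict(list)
    for n in nodes:
        clusters[find(n)].append(n)
    # Return sorted lists so the result is deterministic.
    return sorted(sorted(members) for members in clusters.values())


nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b", 90), ("b", "c", 85), ("c", "d", 20), ("d", "e", 70)]
print(single_linkage_clusters(nodes, edges, cutoff=50))
# prints [['a', 'b', 'c'], ['d', 'e']]
```

When one such cluster is too large to fit into a single split, clustering alone cannot produce the required number of independent sets, which is the situation the tool addresses.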

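mlNODS is a graph-based method that splits an original dataset into non-overlapping sets in cases where the data cannot be grouped without removing some of it. mlNODS optimizes for the following constraints: (1) retain as many data points as possible, and (2) eliminate any overlap between two split sets. Nodes in the graph are the original data points, and edges are a measure of similarity between nodes (e.g. sequence similarity for a set of proteins). The method first builds the complete graph and then satisfies the constraints on the similarity table by removing nodes. mlNODS is applicable to any problem and has the added benefit of allowing overlap within a set (i.e. training on homologs) while disallowing overlap between sets (i.e. training and test sets do not overlap).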
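The two constraints can be illustrated with a simple greedy heuristic. This is only a sketch under stated assumptions and is not the algorithm mlNODS implements, which the description above does not spell out: nodes are dealt into splits at random, and the node with the most cross-split links is removed until no link at or above the cutoff connects two different splits. The function name `split_without_overlap` and the `(id1, id2, score)` edge tuples are hypothetical.

```python
import random
from collections import defaultdict


def split_without_overlap(nodes, edges, cutoff, n_splits, seed=0):
    """Greedy illustration (not the mlNODS algorithm): assign nodes to
    splits, then drop nodes until no link >= cutoff crosses two splits."""
    rng = random.Random(seed)
    order = list(nodes)
    rng.shuffle(order)
    split_of = {n: i % n_splits for i, n in enumerate(order)}

    # Keep only links that count as "similar" under the cutoff.
    neighbours = defaultdict(set)
    for a, b, score in edges:
        if a in split_of and b in split_of and a != b and score >= cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)

    def conflicts(n):
        # Number of similar nodes that currently sit in a different split.
        return sum(
            1 for m in neighbours[n]
            if m in split_of and split_of[m] != split_of[n]
        )

    removed = []
    while split_of:
        # Removing the most conflicted node first tends to keep more data.
        worst = max(split_of, key=lambda n: (conflicts(n), n))
        if conflicts(worst) == 0:
            break
        del split_of[worst]
        removed.append(worst)

    splits = [
        sorted(n for n, s in split_of.items() if s == i)
        for i in range(n_splits)
    ]
    return splits, removed


nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b", 90), ("b", "c", 80), ("d", "e", 75), ("e", "f", 30)]
splits, removed = split_without_overlap(nodes, edges, cutoff=50, n_splits=2, seed=1)
print(splits, removed)
```

Links inside a split are left untouched, which mirrors the property that overlap within a set is allowed while overlap between sets is not. A more careful approach would place whole connected components into the same split first and remove nodes only when a component is too large.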

usage: mlnods [-h] -s SPLITS -c CUTOFF [-l LIMIT] -e EDGES_FILE
                [-f EDGES_FORMAT] -n NODES_FILE [-a][-r RANDOM][-o OUTFOLDER][-v][-q][--version]

This is a script that will create independent sets of data

Version: 1.0 [03/14/20]

optional arguments:
  -h, --help            show this help message and exit
  -s SPLITS, --splits SPLITS
                        number of splits required
  -c CUTOFF, --cutoff CUTOFF
                        similarity cutoff in the units of link scores
  -l LIMIT, --limit LIMIT
                        limit on the number of links for each node (default=0, infinity)
  -e EDGES_FILE, --edges EDGES_FILE
                        file containing a table of instances with link scores for each pair
  -f EDGES_FORMAT, --format EDGES_FORMAT
                        format of the table file

                        blast     : takes a list of -m 9 formatted blast files and builds a table based on seqID
                        hssp      : takes a list of -m 9 formatted blast files, runs HSSP scoring script and builds an HSSP distance table
                        self<int> : space/tab separated table file, similarity score in column <int>
                                    eg "ID1 ID2 similarity_score" will be addressed as self3 (default=self5)
  -n NODES_FILE, --nodes NODES_FILE
                        instance file containing IDs of all instances being considered

                        IDs are case-independent (e.g. ABC = abc)
                        IDs are always preceded by ">" and followed by a white space.
                        No white spaces are allowed in an ID.
                        If score is provided for an ID, it should be surrounded by spaces and directly follow the ID
                        (eg. >abl1_human 10 gene associated with ....)
                        Everything between two IDs is printed in the junction files, but not considered in evaluation
  -a, --abundance       the option to score

                        false : score retrieved from instance file, range [0-100], default=50 when missing
                        true  : score approximated by actual number of times an ID appears in the instance file
  -r RANDOM, --random RANDOM
                        set a fixed random seed to generate consistent partitions
  -o OUTFOLDER, --outfolder OUTFOLDER
                        path to output folder (default=<current directory>)
  -v, --verbose         set verbosity level
  -q, --quiet           no logging to stdout
  --version             show program's version number and exit

If an ID is present in the instance file, but not in the table file the ID is considered to not be linked to anything else
If an ID is present in the table file but not in the instance file, it is ignored
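The input rules described in the help text can be sketched as a pair of small parsers. This is an illustration rather than the code mlnods uses: the function names `parse_nodes` and `parse_edges` are hypothetical, and the sketch assumes that each ID starts a line, that the two IDs of a table row sit in columns 1 and 2, and that the `self<int>` column number is 1-based, as the `self3` example suggests.

```python
def parse_nodes(text, default_score=50):
    """Read '>ID [score] ...' entries from instance-file text (sketch)."""
    scores = {}
    for line in text.splitlines():
        if not line.startswith(">"):
            continue  # text between two IDs is not considered in evaluation
        fields = line[1:].split()
        if not fields:
            continue
        node_id = fields[0].lower()  # IDs are case-independent
        score = default_score  # used when no score follows the ID
        if len(fields) > 1:
            try:
                score = float(fields[1])
            except ValueError:
                pass  # the second field is description text, not a score
        scores[node_id] = score
    return scores


def parse_edges(text, score_column, known_ids):
    """Read a space/tab separated table with the score in a 1-based column."""
    edges = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < max(score_column, 2):
            continue
        a, b = fields[0].lower(), fields[1].lower()
        if a not in known_ids or b not in known_ids:
            continue  # IDs absent from the instance file are ignored
        try:
            score = float(fields[score_column - 1])
        except ValueError:
            continue  # skip header or malformed rows
        edges.append((a, b, score))
    return edges


nodes_text = ">abl1_human 10 gene associated with ...\n>SRC_HUMAN\n>lonely_id 75\n"
edges_text = "ABL1_HUMAN src_human 62.5\nabl1_human unknown_id 90\n"

scores = parse_nodes(nodes_text)
edges = parse_edges(edges_text, 3, scores)
print(scores)  # prints {'abl1_human': 10.0, 'src_human': 50, 'lonely_id': 75.0}
print(edges)   # prints [('abl1_human', 'src_human', 62.5)]
```

In this example `lonely_id` appears only in the instance text, so it ends up with no links, while `unknown_id` appears only in the table and its row is dropped.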

mlnods was developed by Yana Bromberg and refactored by Maximilian Miller.

Feel free to contact us for support at services@bromberglab.org.
This software is licensed under [NPOSL-3.0](http://opensource.org/licenses/NPOSL-3.0)

mlnods 1.3