BinLorry: a flexible tool for binning and filtering sequencing reads
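BinLorry
BinLorry is a flexible tool for binning and filtering sequencing reads into distinct files. Reads can be binned and filtered by any attribute encoded in their headers, by any attribute recorded in a csv file, or by length.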
Installation
Install with pip:
pip3 install binlorry
Run:
binlorry --help
Installing from the repository
Clone the repository:
git clone https://github.com/rambaut/binlorry.git
Install:
pip3 install ./binlorry
Running without installation
BinLorry can also be run directly from a clone of the repository, without installing it:
git clone https://github.com/rambaut/binlorry.git
python binlorry/binlorry-runner.py -h
However, make sure the pandas package is installed before use.
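One way to confirm that pandas is available before running the script from a clone is a short check like the one below. This is a generic Python sketch and not part of BinLorry; the helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(package_name):
    """Return True if the named top-level package can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if not is_installed("pandas"):
    print("pandas is missing; install it with: pip3 install pandas")
```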
Quick usage examples
binlorry -i reads/ -o barcode --bin-by barcode --filter-by barcode BC01 BC02 -n 550 -x 750
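This will read all fastq or fasta files in the directory reads and bin them by the header field barcode, but only if its value is BC01 or BC02 and the read length is between 550 and 750 nucleotides.
It will use the filename prefix barcode, producing the files barcode_BC01.fastq and barcode_BC02.fastq.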
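The logic of this command can be sketched in a few lines of Python. This is an illustration, not BinLorry's actual implementation. It assumes that header fields appear as space-separated key=value tokens after the read name and that the length limits are inclusive; both are assumptions. The helper names parse_header and bin_and_filter are hypothetical.

```python
from collections import defaultdict


def parse_header(header):
    """Split a FASTQ/FASTA header into (read_name, fields).

    Assumes fields are space-separated key=value tokens (an assumption
    about the header layout).
    """
    tokens = header.lstrip("@>").split()
    name = tokens[0]
    fields = dict(t.split("=", 1) for t in tokens[1:] if "=" in t)
    return name, fields


def bin_and_filter(records, bin_by, accepted, min_len, max_len):
    """Group (header, sequence) records by the value of one header field.

    A record is kept only if its field value is in `accepted` and its
    sequence length lies in [min_len, max_len] (inclusive, by assumption).
    """
    bins = defaultdict(list)
    for header, seq in records:
        name, fields = parse_header(header)
        value = fields.get(bin_by)
        if value in accepted and min_len <= len(seq) <= max_len:
            bins[value].append(name)
    return dict(bins)


records = [
    ("@read1 barcode=BC01", "A" * 600),
    ("@read2 barcode=BC02", "A" * 700),
    ("@read3 barcode=BC03", "A" * 600),  # rejected: barcode not accepted
    ("@read4 barcode=BC01", "A" * 900),  # rejected: longer than 750
]
bins = bin_and_filter(records, "barcode", {"BC01", "BC02"}, 550, 750)

# Each bin corresponds to one output file named <prefix>_<value>.fastq.
output_files = {f"barcode_{value}.fastq": names for value, names in bins.items()}
```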
binlorry -i my_file.fastq -t my_file.csv --out-report -o filtered --filter-by reference Type_1 -n 550 -x 750
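The example above takes reads from my_file.fastq and the csv report my_file.csv. Provided my_file.csv has at least the structure shown below, and the read names in the csv match those in the input read file, BinLorry will filter the reads and output only those with the reference Type_1 and a length between 550 and 750 bases.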
read_name | reference |
---|---|
f66db89e-de96-4fa7-813a-6c5a89586100 | Type_1 |
a39069c5-c493-45f8-9fa8-49eccb5c1807 | Type_1 |
868efa99-f4c1-4a68-87a9-196a44b997e0 | Type_2 |
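A rough sketch of this lookup, using only the Python standard library, is shown below. It is not BinLorry's own code. It assumes a comma-separated file with the read name in the first column, that reads missing from the csv are dropped, and that the length limits are inclusive. The read lengths and the helper names load_index and passes are made up for illustration.

```python
import csv
import io

# Stand-in for my_file.csv, assumed to be comma-separated.
csv_text = """read_name,reference
f66db89e-de96-4fa7-813a-6c5a89586100,Type_1
a39069c5-c493-45f8-9fa8-49eccb5c1807,Type_1
868efa99-f4c1-4a68-87a9-196a44b997e0,Type_2
"""


def load_index(handle):
    """Map each read name (taken from the first column) to its csv row."""
    reader = csv.DictReader(handle)
    name_column = reader.fieldnames[0]
    return {row[name_column]: row for row in reader}


def passes(index, read_name, seq_len, field, accepted, min_len, max_len):
    """Return True if the read's csv value and length both pass the filters."""
    row = index.get(read_name)
    if row is None:
        return False  # assumption: reads without a csv entry are dropped
    return row[field] in accepted and min_len <= seq_len <= max_len


index = load_index(io.StringIO(csv_text))

# Hypothetical read lengths, standing in for the sequences in my_file.fastq.
read_lengths = {
    "f66db89e-de96-4fa7-813a-6c5a89586100": 600,
    "a39069c5-c493-45f8-9fa8-49eccb5c1807": 800,  # Type_1 but too long
    "868efa99-f4c1-4a68-87a9-196a44b997e0": 600,  # right length but Type_2
}
kept = [
    name
    for name, length in read_lengths.items()
    if passes(index, name, length, "reference", {"Type_1"}, 550, 750)
]
```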
binlorry -i path/to/my_fastq_dir -t path/to/my_csv_dir \
--out-report -o path/to/binned/barcode \
--filter-by barcode BC01 --bin-by barcode -n 1000 -x 2000
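Provided the csv directory contains reports corresponding to the read files in the fastq directory, BinLorry will search both directories recursively and match csv and fastq files by filename stem. This command then keeps only reads with barcode BC01 and outputs csv reports corresponding to the reads present in the output fastq files.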
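Matching files by filename stem can be sketched with pathlib as follows. This illustrates the idea only and is not BinLorry's implementation. The function name match_by_stem, the .fastq and .csv extensions, and the choice to skip fastq files without a report are all assumptions.

```python
import tempfile
from pathlib import Path


def match_by_stem(fastq_dir, csv_dir):
    """Pair each FASTQ file with the CSV file that shares its filename stem.

    Both directories are searched recursively. FASTQ files without a
    matching CSV are skipped (an assumption).
    """
    csv_by_stem = {p.stem: p for p in Path(csv_dir).rglob("*.csv")}
    pairs = {}
    for fastq in sorted(Path(fastq_dir).rglob("*.fastq")):
        report = csv_by_stem.get(fastq.stem)
        if report is not None:
            pairs[fastq] = report
    return pairs


# Small demonstration on a temporary directory tree.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "fastq" / "run1").mkdir(parents=True)
    (root / "csv").mkdir()
    (root / "fastq" / "run1" / "a.fastq").touch()
    (root / "fastq" / "b.fastq").touch()  # no matching csv, so it is skipped
    (root / "csv" / "a.csv").touch()
    pairs = match_by_stem(root / "fastq", root / "csv")
    matched = {fastq.name: report.name for fastq, report in pairs.items()}
```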
Command line interface
usage: binlorry -i INPUT [-t CSV_FILE] -o OUTPUT [-v VERBOSITY]
                [--bin-by FIELD [FIELD ...]]
                [--filter-by FILTER [FILTER ...]] [-n MIN] [-x MAX]
                [-h] [--version]

Main options:
  -i INPUT, --input INPUT
                        FASTA/FASTQ of input reads or a directory which will
                        be recursively searched for FASTQ files (required)
  -t INPUT_CSV, --index-table INPUT_CSV
                        A CSV file with metadata fields for reads (otherwise
                        these are assumed to be in the read headers). This
                        can also include a file and line number to improve
                        performance. Assumes read name is first column of
                        the csv.
  -o OUTPUT, --output OUTPUT
                        Output filename (or filename prefix)
  -r REPORT, --out-report REPORT
                        Output a subsetted csv report along with the fastq.
                        (Default: False) Only implemented for use in
                        conjunction with -t option.
  -f FORCE_OUTFILES, --force-output FORCE_OUTFILES
                        Output binned/filtered files even if empty.
                        (default: False) Usage: only a single binning factor
                        with a corresponding filter factor.
  -v VERBOSITY, --verbosity VERBOSITY
                        Level of progress information: 0 = none, 1 = some, 2
                        = lots, 3 = full - output will go to stdout if reads
                        are saved to a file and stderr if reads are printed
                        to stdout (default: 1)

Binning/Filtering options:
  --bin-by FIELD [FIELD ...]
                        Specify header field(s) to bin the reads by. For
                        multiple fields these will be nested in order
                        specified.
  --filter-by FILTER [FILTER ...]
                        Specify header field and accepted values to filter
                        the reads by. Multiple filter-by options can be
                        specified.
  -n MIN, --min-length MIN
                        Filter the reads by their length, specifying the
                        minimum length.
  -x MAX, --max-length MAX
                        Filter the reads by their length, specifying the
                        maximum length.

Help:
  -h, --help            Show this help message and exit
  --version             Show program's version number and exit
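
The nesting described for --bin-by with several fields can be pictured as grouping reads by a tuple of field values, taken in the order the fields were given. The sketch below is an illustration only. How BinLorry actually names output files for multiple fields is not stated here, so joining the values with underscores is an assumption, as are the prefix binned, the "unknown" placeholder for missing fields, and the helper name nested_bin_key.

```python
from collections import defaultdict


def nested_bin_key(fields, bin_by):
    """Build a bin key from the read's field values, in the order given.

    Missing fields fall back to "unknown" (an assumption).
    """
    return tuple(fields.get(name, "unknown") for name in bin_by)


# Hypothetical reads with already-parsed header fields.
reads = [
    ("read1", {"barcode": "BC01", "reference": "Type_1"}),
    ("read2", {"barcode": "BC01", "reference": "Type_2"}),
    ("read3", {"barcode": "BC02", "reference": "Type_1"}),
    ("read4", {"barcode": "BC01", "reference": "Type_1"}),
]

bins = defaultdict(list)
for name, fields in reads:
    bins[nested_bin_key(fields, ["barcode", "reference"])].append(name)

# Assumed naming scheme: prefix and field values joined by underscores.
filenames = sorted("_".join(("binned",) + key) + ".fastq" for key in bins)
```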