FASTX parsing and k-mer methods
Needletail
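Needletail is an MIT-licensed, minimal-copying FASTA/FASTQ parser and k-mer processing library for Rust.
The goal is to write a fast and well-tested set of functions that more specialized bioinformatics programs can use. Needletail aims to be as fast as the readfq C library at parsing FASTX files, and much faster (about 25x) than equivalent Python implementations at k-mer counting.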
Example
extern crate needletail;
use needletail::{parse_fastx_file, Sequence, FastxReader};

fn main() {
    let filename = "tests/data/28S.fasta";

    let mut n_bases = 0;
    let mut n_valid_kmers = 0;
    let mut reader = parse_fastx_file(&filename).expect("valid path/file");
    while let Some(record) = reader.next() {
        let seqrec = record.expect("invalid record");
        // keep track of the total number of bases
        n_bases += seqrec.num_bases();
        // normalize to make sure all the bases are consistently capitalized and
        // that we remove the newlines since this is FASTA
        let norm_seq = seqrec.normalize(false);
        // we make a reverse complemented copy of the sequence first for
        // `canonical_kmers` to draw the complemented sequences from.
        let rc = norm_seq.reverse_complement();
        // now we keep track of the number of AAAAs (or TTTTs via
        // canonicalization) in the file; note we also get the position (i.0;
        // in the event there were `N`-containing kmers that were skipped)
        // and whether the sequence was complemented (i.2) in addition to
        // the canonical kmer (i.1)
        for (_, kmer, _) in norm_seq.canonical_kmers(4, &rc) {
            if kmer == b"AAAA" {
                n_valid_kmers += 1;
            }
        }
    }
    println!("There are {} bases in your file.", n_bases);
    println!("There are {} AAAAs in your file.", n_valid_kmers);
}
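To make the terms in the example concrete, here is a minimal sketch of normalization, reverse complementing, and canonical k-mer counting using only the Rust standard library. It is not needletail's implementation: the helper names (`normalize`, `complement`, `reverse_complement`, `count_canonical_kmers`) are hypothetical, and it assumes that "canonical" means the lexicographically smaller of a k-mer and its reverse complement and that windows containing anything other than A, C, G, or T are skipped.

```rust
use std::collections::HashMap;

/// Uppercase the bases and drop line breaks, roughly what a FASTA
/// normalization step has to do. Hypothetical helper, not needletail's API.
fn normalize(seq: &[u8]) -> Vec<u8> {
    seq.iter()
        .filter(|&&b| b != b'\n' && b != b'\r')
        .map(|b| b.to_ascii_uppercase())
        .collect()
}

/// Complement a single uppercase base; anything unexpected becomes `N`.
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        _ => b'N',
    }
}

/// Reverse the sequence and complement every base.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

/// Count canonical k-mers in an already normalized sequence.
/// Windows containing a base other than A/C/G/T are skipped.
fn count_canonical_kmers(seq: &[u8], k: usize) -> HashMap<Vec<u8>, usize> {
    let mut counts = HashMap::new();
    // `windows(0)` would panic, so handle degenerate inputs up front.
    if k == 0 || seq.len() < k {
        return counts;
    }
    for window in seq.windows(k) {
        if !window.iter().all(|&b| matches!(b, b'A' | b'C' | b'G' | b'T')) {
            continue;
        }
        let rc = reverse_complement(window);
        // Keep whichever strand sorts first so both strands share one key.
        let canonical = if rc.as_slice() < window { rc } else { window.to_vec() };
        *counts.entry(canonical).or_insert(0) += 1;
    }
    counts
}
```

For instance, normalizing `acgtN\nTTTT` and counting with k = 4 yields one `ACGT` and one `AAAA`: the `TTTT` window canonicalizes to `AAAA`, and the four windows that contain `N` are skipped. This version allocates a new vector per window for clarity, which is exactly the kind of copying a minimal-copy library tries to avoid.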
Installation
Needletail requires rust and cargo to be installed. Please use either your local package manager (homebrew, apt-get, pacman, etc.) or install these via rustup.
Once Rust is set up, you can include needletail as a dependency in your Cargo.toml file.
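A minimal sketch of such a dependency entry follows. The version requirement is an assumption, based only on the 0.4 release mentioned in the acknowledgements; check crates.io for the current release before copying it.

```toml
[dependencies]
# Assumed version requirement; replace with the release you actually want.
needletail = "0.4"
```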
To install needletail itself for development:
git clone https://github.com/onecodex/needletail
cd needletail
cargo test  # to run tests
Python
To use the Python library on Mac OS X/Unix systems (requires Python 3):
# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh
rustup default nightly
# finally, install the library in the local virtualenv
maturin develop --cargo-extra-args="--features=python"
Building binary wheels and pushing to PyPI
# The Mac build requires switching through a few different python versions
maturin build --cargo-extra-args="--features=python" --release --strip
# The linux build is automated through cross-compiling in a docker image
docker run --rm -v $(pwd):/io konstin2/maturin:master build --cargo-extra-args="--features=python" --release --strip
twine upload target/wheels/*
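Getting Help
Questions are best directed as GitHub issues. We plan to add more documentation soon, but in the meantime "doc" comments are included in the source code.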
Contributing
Please do! We're happy to discuss possible additions and/or accept pull requests.
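Acknowledgements
Starting from 0.4, the parser algorithm is taken from seq_io. While it has been slightly modified, it mainly comes from that library. Links to the original files are available in src/parser/fast{a,q}.rs.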