FASTX parsing and k-mer methods


Needletail

Needletail is an MIT-licensed, minimal-copying FASTA/FASTQ parser and k-mer processing library for Rust.

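The goal is to write a fast, well-tested set of functions that more specialized bioinformatics programs can build on. Needletail aims to be as fast as the readfq C library at parsing FASTX files, and much faster (about 25 times) than equivalent Python implementations at k-mer counting.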

Example

extern crate needletail;
use needletail::{parse_fastx_file, Sequence, FastxReader};

fn main() {
    let filename = "tests/data/28S.fasta";

    let mut n_bases = 0;
    let mut n_valid_kmers = 0;
    let mut reader = parse_fastx_file(&filename).expect("valid path/file");
    while let Some(record) = reader.next() {
        let seqrec = record.expect("invalid record");
        // keep track of the total number of bases
        n_bases += seqrec.num_bases();
        // normalize to make sure all the bases are consistently capitalized and
        // that we remove the newlines since this is FASTA
        let norm_seq = seqrec.normalize(false);
        // we make a reverse complemented copy of the sequence first for
        // `canonical_kmers` to draw the complemented sequences from.
        let rc = norm_seq.reverse_complement();
        // now we keep track of the number of AAAAs (or TTTTs via
        // canonicalization) in the file; note we also get the position (i.0;
        // in the event there were `N`-containing kmers that were skipped)
        // and whether the sequence was complemented (i.2) in addition to
        // the canonical kmer (i.1)
        for (_, kmer, _) in norm_seq.canonical_kmers(4, &rc) {
            if kmer == b"AAAA" {
                n_valid_kmers += 1;
            }
        }
    }
    println!("There are {} bases in your file.", n_bases);
    println!("There are {} AAAAs in your file.", n_valid_kmers);
}
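The comments in the example mention canonicalization and skipping `N`-containing k-mers. As a rough illustration of that idea only, here is a minimal sketch that uses nothing but the Rust standard library. It is not needletail's implementation: the helper names (`complement`, `reverse_complement`, `canonical`, `count_canonical`) are hypothetical, and it assumes the input is already uppercase with newlines removed. It also assumes that "canonical" means the lexicographically smaller of a k-mer and its reverse complement, which is the common convention.

```rust
/// Complement a single uppercase nucleotide; anything else maps to `N`.
/// (Hypothetical helper, not part of needletail.)
fn complement(base: u8) -> u8 {
    match base {
        b'A' => b'T',
        b'C' => b'G',
        b'G' => b'C',
        b'T' => b'A',
        _ => b'N',
    }
}

/// Reverse complement of an uppercase DNA sequence.
fn reverse_complement(seq: &[u8]) -> Vec<u8> {
    seq.iter().rev().map(|&b| complement(b)).collect()
}

/// Canonical form of a k-mer: the lexicographically smaller of the k-mer
/// and its reverse complement (assumed convention).
fn canonical(kmer: &[u8]) -> Vec<u8> {
    let rc = reverse_complement(kmer);
    if rc.as_slice() < kmer {
        rc
    } else {
        kmer.to_vec()
    }
}

/// Count the windows of length `target.len()` in `seq` whose canonical form
/// equals `target`, skipping any window that contains a non-ACGT base.
fn count_canonical(seq: &[u8], target: &[u8]) -> usize {
    let k = target.len();
    // `windows(0)` would panic, and a sequence shorter than k has no k-mers.
    if k == 0 || seq.len() < k {
        return 0;
    }
    seq.windows(k)
        .filter(|w| w.iter().all(|&b| matches!(b, b'A' | b'C' | b'G' | b'T')))
        .filter(|w| canonical(w) == target)
        .count()
}
```

Calling `count_canonical(b"AAAATTTTNAAAA", b"AAAA")` counts both `AAAA` windows and the `TTTT` window (whose reverse complement is `AAAA`), while the four windows that overlap the `N` are skipped. This sketch allocates a new vector per window for clarity; a minimal-copying library would avoid that.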

Installation

Needletail requires rust and cargo to be installed. Please use your local package manager (homebrew, apt-get, pacman, etc.) or install them via rustup.

Once Rust is set up, you can include needletail in your project by adding it to the [dependencies] section of your Cargo.toml file.

To install needletail for development:

git clone https://github.com/onecodex/needletail
cd needletail
cargo test  # to run tests

Python

To use the Python library on Mac OS X/Unix systems (requires Python 3):

# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh
rustup default nightly

# finally, install the library in the local virtualenv
maturin develop --cargo-extra-args="--features=python"

Building binary wheels and pushing to PyPI

# The Mac build requires switching through a few different python versions
maturin build --cargo-extra-args="--features=python" --release --strip

# The linux build is automated through cross-compiling in a docker image
docker run --rm -v $(pwd):/io konstin2/maturin:master build --cargo-extra-args="--features=python" --release --strip
twine upload target/wheels/*

Getting Help

Questions are best raised as GitHub issues. We plan to add more documentation soon, but in the meantime "doc" comments are included in the source code.

Contributing

Please do! We're happy to discuss possible additions and/or accept pull requests.

Acknowledgements

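Starting from 0.4, the parser algorithms are taken from seq_io. They have been slightly modified, but they mostly come from that library. Links to the original files are available in src/parser/fast{a,q}.rs.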
