Merge sam/bam files with intelligent cell barcode preservation
mergeBams
=========
Version 0.14
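Merge sam/bam files with intelligent cell barcode preservation. mergeBams has been tested on bam files and tsv output from the 10X Genomics Cellranger program. Its implementation was driven by Cellranger output, and it is designed primarily for handling that output.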
Requirements
- Python >3.5 (mergeBams uses the pysam package and will attempt to install it if it is not already installed)
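Installing a missing dependency at import time usually follows a try-import, install, retry pattern. The sketch below shows that pattern in plain Python. It is not taken from the mergeBams source, and `ensure_module` and `pip_install` are hypothetical names. It assumes that pip can be run by the current interpreter.

```python
import importlib
import subprocess
import sys
from types import ModuleType
from typing import Callable, Optional


def pip_install(package: str) -> None:
    """Install `package` into the running interpreter's environment with pip."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])


def ensure_module(name: str,
                  package: Optional[str] = None,
                  installer: Callable[[str], None] = pip_install) -> ModuleType:
    """Import `name`; if that fails, install `package` (default: `name`) and retry once."""
    try:
        return importlib.import_module(name)
    except ImportError:
        installer(package or name)
        # Clear finder caches so a package installed after startup can be found.
        importlib.invalidate_caches()
        return importlib.import_module(name)


# Hypothetical usage at the top of a script that needs pysam:
# pysam = ensure_module("pysam")
```

The installer is a parameter so the retry logic can be exercised without touching the network.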
Installing pipx
module load Python
python3 -m pip install --user pipx
python3 -m pipx ensurepath
To learn more about pipx, visit https://github.com/pipxproject/pipx.
Installing mergeBams
Once pipx is installed, installing mergeBams is straightforward.
Testing the mergeBams installation
You can then test the installation by calling mergeBams. After running the following command, you should see the help screen.
mergeBams -h
Help
usage: mergeBams [-h] -i INPUTS [-l LABELS] [-b BCS] [-o OUT] [--cell_tag CELL_TAG]

merge sam/bam files with intelligent cell barcode preservation

optional arguments:
  -h, --help            show this help message and exit
  -i INPUTS, --inputs INPUTS
                        sam/bam input files, comma-separated
  -l LABELS, --labels LABELS
                        strings for prepending cell barcode (i.e. sample name), comma-separated
  -b BCS, --bcs BCS     barcodes files, comma-separated
  -o OUT, --out OUT     outdir
  --cell_tag CELL_TAG   set if cell barcode tag should not be CB
Usage
The following example merges two bam files and the two barcodes.tsv files derived from them.
mergeBams -i t1.bam,t2.bam \
  -l t1_,t2_ \
  -b barcodes1.tsv,barcodes2.tsv \
  -o /home/user/test
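The help screen shows that `-i`, `-l` and `-b` each take a comma-separated list. The example pairs the entries by position: t1.bam with t1_ and barcodes1.tsv. The argparse sketch below rebuilds that interface from the help text and pairs the entries from the example command.

`pair_inputs` and its length checks are assumptions for illustration, not documented mergeBams behavior. In particular, the help screen marks `-l` as optional, and what the tool does without it is not described here.

```python
import argparse
from typing import List, Optional, Tuple


def build_parser() -> argparse.ArgumentParser:
    # Options copied from the mergeBams help screen; only --cell_tag has a stated default (CB).
    parser = argparse.ArgumentParser(
        prog="mergeBams",
        description="merge sam/bam files with intelligent cell barcode preservation")
    parser.add_argument("-i", "--inputs", required=True,
                        help="sam/bam input files, comma-separated")
    parser.add_argument("-l", "--labels",
                        help="strings for prepending cell barcode (i.e. sample name), comma-separated")
    parser.add_argument("-b", "--bcs", help="barcodes files, comma-separated")
    parser.add_argument("-o", "--out", help="outdir")
    parser.add_argument("--cell_tag", default="CB",
                        help="set if cell barcode tag should not be CB")
    return parser


def split_csv(value: Optional[str]) -> List[str]:
    """Split a comma-separated option value; an absent option gives an empty list."""
    return [item.strip() for item in value.split(",")] if value else []


def pair_inputs(args: argparse.Namespace) -> List[Tuple[str, str, Optional[str]]]:
    """Pair each input file with its label and optional barcodes file, by position.

    Assumption: one label per input is required; barcodes files are optional but,
    if given, must also be one per input.
    """
    inputs, labels, bcs = split_csv(args.inputs), split_csv(args.labels), split_csv(args.bcs)
    if len(labels) != len(inputs):
        raise ValueError("expected one label per input file")
    if bcs and len(bcs) != len(inputs):
        raise ValueError("expected one barcodes file per input file")
    return [(path, label, bcs[n] if bcs else None)
            for n, (path, label) in enumerate(zip(inputs, labels))]


# The example command above, parsed without touching any files:
args = build_parser().parse_args(
    ["-i", "t1.bam,t2.bam", "-l", "t1_,t2_",
     "-b", "barcodes1.tsv,barcodes2.tsv", "-o", "/home/user/test"])
print(pair_inputs(args))
# [('t1.bam', 't1_', 'barcodes1.tsv'), ('t2.bam', 't2_', 'barcodes2.tsv')]
```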
Expected output
In the example above, mergeBams takes the input bams t1.bam and t2.bam, which contain the following data:
samtools view t1.bam | head -n 3 -
A00613:162:HKWCTDRXX:1:1228:5330:21151 2721120480 91M * 0 0 GCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:7 HI:i:4 AS:i:89 nM:i:0 RE:A:I li:i:0 BC:Z:GCTGTCCA QT:Z:FFFFFFFF CR:Z:ACACCAAAGGTTCCTA CY:Z:FFFFFFFFFFFFFFFF CB:Z:ACACCAAAGGTTCCTA-1 UR:Z:ACCAGTCGGT UY:Z:FFFFFFFFFF UB:Z:ACCAGTCGGT RG:Z:B1_GEX:0:1:HKWCTDRXX:1
A00613:162:HKWCTDRXX:1:1166:7455:25708 2561167240 42M92N49M * 0 0 GTGGGGGCGGTGGTGGTGCTGTTAGTACCCCATCTTGTAGGTCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:6 HI:i:4 AS:i:85 nM:i:2 RE:A:I li:i:0 BC:Z:TTGAGATC QT:Z:FFFFFFFF CR:Z:TTTATGCGTCGCCATG CY:Z:FFFFFFFFFFFFFFFF CB:Z:TTTATGCGTCGCCATG-1 UR:Z:CTAGTTGCGC UY:Z:FFFFFFFFFF UB:Z:CTAGTTGCGC RG:Z:B1_GEX:0:1:HKWCTDRXX:1
A00613:162:HKWCTDRXX:1:1272:21866:31062 2561182980 73M18S * 0 0 CTCAATCTTGGCCTGGGCCAAGGAGACCTTCTCTCCAATGGCCTGCACCTGGCTCCGGCTCTGCTCTACCTGCGAAGTTGCTCGGCGCCCT FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:8 HI:i:5 AS:i:71 nM:i:0 RE:A:I li:i:0 BC:Z:TTGAGATC QT:Z::FFFFFFF CR:Z:AACTGGTAGAGTGACC CY:Z:FFFFFFFFF:FFFFFF CB:Z:AACTGGTAGAGTGACC-1 UR:Z:GTTCACCATA UY:Z:FFFFFFFFFF UB:Z:GTTCACCATA RG:Z:B1_GEX:0:1:HKWCTDRXX:1
and
samtools view t2.bam | tail -n 3 -
A00613:162:HKWCTDRXX:2:2107:15519:35790 4 * 0 0 * * 0 0 ATGAGAAGGCACCCAAGCTTTACCAATAACACCATAAGGATAGGTGCGTACACCACACGCCTCAAACGGCCCCAGATAACTGGTGTCGTCC F:F:,,:,:,,FF,F,:F:F:,FF,,FFF,,,,,,,,:F::,,:,,,F,:,FFF,,,F,:,:::,:F,,FF,,,FFF,FF,,FFF,,F,:: NH:i:0 HI:i:0 AS:i:18 nM:i:1 uT:A:1 xf:i:0 li:i:0 BC:Z:TGGAAGGT QT:Z:FF,,F,:F CR:Z:TTTGTCATCCGTTGTC CY:Z:F,FFF:,FF:F:FFFF CB:Z:TTTGTCATCCGTTGTC-1 UR:Z:TCCCGCTCAT UY:Z:FFFFFFFFFF UB:Z:TCCCGCTCAT RG:Z:B2_GEX:0:1:HKWCTDRXX:2
A00613:162:HKWCTDRXX:2:2177:9046:12085 4 * 0 0 * * 0 0 AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATATT FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:F:F:FFFFFFFFFFFFFFFFFFFF:,F,:, NH:i:0 HI:i:0 AS:i:46 nM:i:0 uT:A:1 xf:i:0 li:i:0 BC:Z:GCATCTCC QT:Z:FFFFFFFF CR:Z:TTTGTCATCCTGCAGG CY:Z:F:FFFFFFFF:FF:FF CB:Z:TTTGTCATCCTGCAGG-1 UR:Z:CTGCCTATCA UY:Z:FFFFFFFFFF UB:Z:CTGCCTATCA RG:Z:B2_GEX:0:1:HKWCTDRXX:2
A00613:162:HKWCTDRXX:2:2234:20546:22514 4 * 0 0 * * 0 0 AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGTAAAAAACACCCCCGGTGGGGGGTGGGTAATT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF,,:,F:,F,,:,,FF,::,,,FF,,,,::,,,,F NH:i:0 HI:i:0 AS:i:36 nM:i:0 uT:A:1 xf:i:0 li:i:0 BC:Z:AACGTCAA QT:Z:FFFFFFFF CR:Z:TTTGTCATCGGTTCGG CY:Z:FFFFFFFFFFFFFFFF CB:Z:TTTGTCATCGGTTCGG-1 UR:Z:GCACTGCGAG UY:Z:FF:FFFFF:F UB:Z:GCACTGCGAG RG:Z:B2_GEX:0:1:HKWCTDRXX:2
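These bam files are concatenated, and the label supplied with the -l flag in the program call is prepended to each cell barcode (CB tag). In the merged output, the relabeled CB tag appears at the end of each record's tag list: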
(samtools view out.bam | head -n 3 -; samtools view out.bam | tail -n 3 -) > topandbottom.txt
cat topandbottom.txt
A00613:162:HKWCTDRXX:1:1228:5330:21151 2721120480 91M * 0 0 GCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:7 HI:i:4 AS:i:89 nM:i:0 RE:A:I li:i:0 BC:Z:GCTGTCCA QT:Z:FFFFFFFF CR:Z:ACACCAAAGGTTCCTA CY:Z:FFFFFFFFFFFFFFFF UR:Z:ACCAGTCGGT UY:Z:FFFFFFFFFF UB:Z:ACCAGTCGGT RG:Z:B1_GEX:0:1:HKWCTDRXX:1 CB:Z:t1_ACACCAAAGGTTCCTA-1
A00613:162:HKWCTDRXX:1:1166:7455:25708 2561167240 42M92N49M * 0 0 GTGGGGGCGGTGGTGGTGCTGTTAGTACCCCATCTTGTAGGTCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:6 HI:i:4 AS:i:85 nM:i:2 RE:A:I li:i:0 BC:Z:TTGAGATC QT:Z:FFFFFFFF CR:Z:TTTATGCGTCGCCATG CY:Z:FFFFFFFFFFFFFFFF UR:Z:CTAGTTGCGC UY:Z:FFFFFFFFFF UB:Z:CTAGTTGCGC RG:Z:B1_GEX:0:1:HKWCTDRXX:1 CB:Z:t1_TTTATGCGTCGCCATG-1
A00613:162:HKWCTDRXX:1:1272:21866:31062 2561182980 73M18S * 0 0 CTCAATCTTGGCCTGGGCCAAGGAGACCTTCTCTCCAATGGCCTGCACCTGGCTCCGGCTCTGCTCTACCTGCGAAGTTGCTCGGCGCCCT FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:8 HI:i:5 AS:i:71 nM:i:0 RE:A:I li:i:0 BC:Z:TTGAGATC QT:Z::FFFFFFF CR:Z:AACTGGTAGAGTGACC CY:Z:FFFFFFFFF:FFFFFF UR:Z:GTTCACCATA UY:Z:FFFFFFFFFF UB:Z:GTTCACCATA RG:Z:B1_GEX:0:1:HKWCTDRXX:1 CB:Z:t1_AACTGGTAGAGTGACC-1
A00613:162:HKWCTDRXX:2:2107:15519:35790 4 * 0 0 * * 0 0 ATGAGAAGGCACCCAAGCTTTACCAATAACACCATAAGGATAGGTGCGTACACCACACGCCTCAAACGGCCCCAGATAACTGGTGTCGTCC F:F:,,:,:,,FF,F,:F:F:,FF,,FFF,,,,,,,,:F::,,:,,,F,:,FFF,,,F,:,:::,:F,,FF,,,FFF,FF,,FFF,,F,:: NH:i:0 HI:i:0 AS:i:18 nM:i:1 uT:A:1 xf:i:0 li:i:0 BC:Z:TGGAAGGT QT:Z:FF,,F,:F CR:Z:TTTGTCATCCGTTGTC CY:Z:F,FFF:,FF:F:FFFF UR:Z:TCCCGCTCAT UY:Z:FFFFFFFFFF UB:Z:TCCCGCTCAT RG:Z:B2_GEX:0:1:HKWCTDRXX:2 CB:Z:t2_TTTGTCATCCGTTGTC-1
A00613:162:HKWCTDRXX:2:2177:9046:12085 4 * 0 0 * * 0 0 AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATATT FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:F:F:FFFFFFFFFFFFFFFFFFFF:,F,:, NH:i:0 HI:i:0 AS:i:46 nM:i:0 uT:A:1 xf:i:0 li:i:0 BC:Z:GCATCTCC QT:Z:FFFFFFFF CR:Z:TTTGTCATCCTGCAGG CY:Z:F:FFFFFFFF:FF:FF UR:Z:CTGCCTATCA UY:Z:FFFFFFFFFF UB:Z:CTGCCTATCA RG:Z:B2_GEX:0:1:HKWCTDRXX:2 CB:Z:t2_TTTGTCATCCTGCAGG-1
A00613:162:HKWCTDRXX:2:2234:20546:22514 4 * 0 0 * * 0 0 AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGTAAAAAACACCCCCGGTGGGGGGTGGGTAATT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF,,:,F:,F,,:,,FF,::,,,FF,,,,::,,,,F NH:i:0 HI:i:0 AS:i:36 nM:i:0 uT:A:1 xf:i:0 li:i:0 BC:Z:AACGTCAA QT:Z:FFFFFFFF CR:Z:TTTGTCATCGGTTCGG CY:Z:FFFFFFFFFFFFFFFF UR:Z:GCACTGCGAG UY:Z:FF:FFFFF:F UB:Z:GCACTGCGAG RG:Z:B2_GEX:0:1:HKWCTDRXX:2 CB:Z:t2_TTTGTCATCGGTTCGG-1
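The relabeling itself is a per-record edit of the cell-barcode tag. The sketch below applies it to one tab-separated SAM line, such as `samtools view` prints, using only the standard library.

mergeBams uses pysam, so a BAM-level version would presumably read the tag with `AlignedSegment.get_tag` and write it back with `set_tag`. This text version illustrates the transformation only and is not the tool's code. `relabel_cell_barcode` is a hypothetical helper. Moving the tag to the end simply mirrors the ordering in the output above. The demo record is made up apart from its barcode and UMI, which come from t2.bam.

```python
from typing import List, Optional


def relabel_cell_barcode(sam_line: str, label: str, cell_tag: str = "CB") -> str:
    """Prefix `label` to the cell-barcode tag of one tab-separated SAM record.

    The rewritten tag is appended after the other optional fields, mirroring the
    merged output above. A record without the tag is returned unchanged.
    """
    fields = sam_line.rstrip("\n").split("\t")
    core, tags = fields[:11], fields[11:]  # 11 mandatory SAM columns, then TAG:TYPE:VALUE fields
    prefix = cell_tag + ":Z:"
    barcode: Optional[str] = None
    kept: List[str] = []
    for tag in tags:
        if barcode is None and tag.startswith(prefix):
            barcode = tag[len(prefix):]
        else:
            kept.append(tag)
    if barcode is None:
        return "\t".join(fields)
    return "\t".join(core + kept + [prefix + label + barcode])


record = "\t".join(["read1", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "FFFF",
                    "CB:Z:TTTGTCATCCGTTGTC-1", "UB:Z:TCCCGCTCAT"])
print(relabel_cell_barcode(record, "t2_").split("\t")[11:])
# ['UB:Z:TCCCGCTCAT', 'CB:Z:t2_TTTGTCATCCGTTGTC-1']
```

The `cell_tag` parameter plays the role of the `--cell_tag` option for data whose barcodes are not stored in CB.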
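Similarly, if requested, mergeBams concatenates the barcodes.tsv files and adds the labels to them (compressed barcodes.tsv.gz files are also supported; see below for how barcode file compression is handled). For example, in the case above...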
head -n 3 barcodes1.tsv
AAACCTGAGCCCGAAA-1
AAACCTGAGGTGCTTT-1
AAACCTGAGTACTTGC-1
and
tail -n 3 barcodes2.tsv
TTTGTCATCATTCACT-1
TTTGTCATCCGTTGTC-1
TTTGTCATCCTGCAGG-1
will be concatenated and given their labels:
(head -n 3 outbcs.tsv; tail -n 3 outbcs.tsv) > topandbottombc.txt
cat topandbottombc.txt
t1_AAACCTGAGCCCGAAA-1
t1_AAACCTGAGGTGCTTT-1
t1_AAACCTGAGTACTTGC-1
t2_TTTGTCATCATTCACT-1
t2_TTTGTCATCCGTTGTC-1
t2_TTTGTCATCCTGCAGG-1
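Merging the barcode lists is the same idea applied to plain text: concatenate the lists in input order and prefix each barcode with its sample's label. Below is a minimal sketch using a hypothetical helper, `label_barcodes`. It uses only the six barcodes shown above; the real files contain many more.

```python
from typing import Iterable, List, Sequence


def label_barcodes(barcode_lists: Sequence[Iterable[str]], labels: Sequence[str]) -> List[str]:
    """Concatenate per-sample barcode lists in order, prefixing each barcode with its label."""
    if len(barcode_lists) != len(labels):
        raise ValueError("expected one label per barcodes file")
    merged: List[str] = []
    for barcodes, label in zip(barcode_lists, labels):
        for line in barcodes:
            barcode = line.strip()
            if barcode:  # skip blank lines such as a trailing newline
                merged.append(label + barcode)
    return merged


bcs1 = ["AAACCTGAGCCCGAAA-1", "AAACCTGAGGTGCTTT-1", "AAACCTGAGTACTTGC-1"]
bcs2 = ["TTTGTCATCATTCACT-1", "TTTGTCATCCGTTGTC-1", "TTTGTCATCCTGCAGG-1"]
print("\n".join(label_barcodes([bcs1, bcs2], ["t1_", "t2_"])))
```

Passing open text file handles instead of lists works the same way, because iterating over a text file yields its lines.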
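Note that this program supports compression and compresses the output barcode file to match the input. That is, the following command produces a compressed barcode file as output. The barcode files provided must be either all compressed or all uncompressed.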
mergeBams -i t1.bam,t2.bam \
  -l t1_,t2_ \
  -b barcodes1.tsv.gz,barcodes2.tsv.gz \
  -o /home/user/test
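Matching the output compression to the input can be sketched as follows. This is an illustration under stated assumptions, not the tool's code:

- compression is detected from the `.gz` suffix;
- the merged file is written as `outbcs.tsv`, or `outbcs.tsv.gz` when compressed (the `.gz` name is a guess);
- `write_merged_barcodes`, `output_is_compressed` and `open_text` are hypothetical names.

The mixed-input check enforces the all-or-nothing rule stated above.

```python
import gzip
from pathlib import Path
from typing import IO, Sequence


def is_gzipped(path: str) -> bool:
    # Assumption: compression is inferred from the file name, not from the file's contents.
    return path.endswith(".gz")


def output_is_compressed(bc_paths: Sequence[str]) -> bool:
    """True if every barcodes file is .gz, False if none is; a mix is rejected."""
    flags = {is_gzipped(p) for p in bc_paths}
    if len(flags) > 1:
        raise ValueError("barcode files must be all compressed or all uncompressed")
    return flags == {True}


def open_text(path: str, mode: str) -> IO[str]:
    """Open a plain or gzip-compressed file in text mode; `mode` is 'r' or 'w'."""
    if is_gzipped(path):
        return gzip.open(path, mode + "t")
    return open(path, mode)


def write_merged_barcodes(bc_paths: Sequence[str], labels: Sequence[str], outdir: str) -> str:
    """Write labeled, concatenated barcodes to outdir, compressed only if the inputs are."""
    if len(bc_paths) != len(labels):
        raise ValueError("expected one label per barcodes file")
    compressed = output_is_compressed(bc_paths)
    # Assumption: the .gz output name is hypothetical; the example above only shows outbcs.tsv.
    out_path = str(Path(outdir) / ("outbcs.tsv.gz" if compressed else "outbcs.tsv"))
    with open_text(out_path, "w") as out:
        for path, label in zip(bc_paths, labels):
            with open_text(path, "r") as fh:
                for line in fh:
                    barcode = line.strip()
                    if barcode:
                        out.write(label + barcode + "\n")
    return out_path
```

For example, `write_merged_barcodes(["barcodes1.tsv.gz", "barcodes2.tsv.gz"], ["t1_", "t2_"], "/home/user/test")` would write a gzip-compressed file, matching the command above.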
Acknowledgements
Written by Scott Furlan with help from CFooldood and rcguy.