Merge sam/bam files with intelligent cell barcode preservation

mergeBams
=======

version 0.14

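Merge sam/bam files with intelligent cell barcode preservation. This has been tested on bam files and on the tsv output of the 10X Genomics Cellranger program. The implementation of mergeBams was motivated by Cellranger output and is designed primarily to handle it.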

Requirements

  1. Python > 3.5 (mergeBams uses the pysam package and will attempt to install it if it is not already installed)

Installation of pipx

module load Python
python3 -m pip install --user pipx
python3 -m pipx ensurepath

To learn more about pipx, see https://github.com/pipxproject/pipx.

Installing mergeBams

Once pipx is installed, installing mergeBams is straightforward.

^{pr2}$

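Testing the mergeBams installation

You should then be able to test the installation by calling mergeBams. After running the following, you should see the help screen displayed.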

mergeBams -h

Help

usage: mergeBams [-h] -i INPUTS [-l LABELS] [-b BCS] [-o OUT] [--cell_tag CELL_TAG]

merge sam/bam files with intelligent cell barcode preservation

optional arguments:
  -h, --help            show this help message and exit
  -i INPUTS, --inputs INPUTS
                        sam/bam input files, comma-separated
  -l LABELS, --labels LABELS
                        strings for prepending cell barcode (i.e. sample
                        name), comma-separated
  -b BCS, --bcs BCS     barcodes files, comma-separated
  -o OUT, --out OUT     outdir
  --cell_tag CELL_TAG   set if cell barcode tag should not be CB
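
The options listed in the help text can be mirrored with Python's standard argparse module. The following is only a sketch of how such a command line might be parsed, not the mergeBams source: the helper names build_parser and split_csv are hypothetical, and the default of CB for --cell_tag is an assumption inferred from the help text.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the options shown in the help text."""
    parser = argparse.ArgumentParser(
        prog="mergeBams",
        description="merge sam/bam files with intelligent cell barcode preservation",
    )
    parser.add_argument("-i", "--inputs", required=True,
                        help="sam/bam input files, comma-separated")
    parser.add_argument("-l", "--labels",
                        help="strings for prepending cell barcode, comma-separated")
    parser.add_argument("-b", "--bcs", help="barcodes files, comma-separated")
    parser.add_argument("-o", "--out", help="outdir")
    # Assumption: CB is the default tag, as the help text implies.
    parser.add_argument("--cell_tag", default="CB",
                        help="set if cell barcode tag should not be CB")
    return parser


def split_csv(value):
    """Split a comma-separated option value into a list (empty if unset)."""
    return value.split(",") if value else []


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(
    ["-i", "t1.bam,t2.bam", "-l", "t1_,t2_", "-b", "barcodes1.tsv,barcodes2.tsv"]
)
inputs = split_csv(args.inputs)
labels = split_csv(args.labels)
```

Keeping the comma splitting in one small helper means every list-valued option is handled the same way, including options that were not supplied.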

Usage

The following merges two bam files and the two barcodes.tsv files derived from them.

mergeBams -i t1.bam,t2.bam \
          -l t1_,t2_ \
          -b barcodes1.tsv,barcodes2.tsv \
          -o /home/user/test

Expected output

In the example above, mergeBams takes the input bams t1.bam and t2.bam, which contain the following data...

samtools view t1.bam | head -n 3 -
A00613:162:HKWCTDRXX:1:1228:5330:21151  272 1 12048 0 91M * 0 0 GCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:7  HI:i:4  AS:i:89 nM:i:0  RE:A:I  li:i:0  BC:Z:GCTGTCCA QT:Z:FFFFFFFF CR:Z:ACACCAAAGGTTCCTA CY:Z:FFFFFFFFFFFFFFFF CB:Z:ACACCAAAGGTTCCTA-1 UR:Z:ACCAGTCGGT UY:Z:FFFFFFFFFF UB:Z:ACCAGTCGGT RG:Z:B1_GEX:0:1:HKWCTDRXX:1
A00613:162:HKWCTDRXX:1:1166:7455:25708  256 1 16724 0 42M92N49M * 0 0 GTGGGGGCGGTGGTGGTGCTGTTAGTACCCCATCTTGTAGGTCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:6  HI:i:4  AS:i:85 nM:i:2  RE:A:I  li:i:0  BC:Z:TTGAGATC QT:Z:FFFFFFFF CR:Z:TTTATGCGTCGCCATG CY:Z:FFFFFFFFFFFFFFFF CB:Z:TTTATGCGTCGCCATG-1 UR:Z:CTAGTTGCGC UY:Z:FFFFFFFFFF UB:Z:CTAGTTGCGC RG:Z:B1_GEX:0:1:HKWCTDRXX:1
A00613:162:HKWCTDRXX:1:1272:21866:31062 256 1 18298 0 73M18S * 0 0 CTCAATCTTGGCCTGGGCCAAGGAGACCTTCTCTCCAATGGCCTGCACCTGGCTCCGGCTCTGCTCTACCTGCGAAGTTGCTCGGCGCCCT FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:8  HI:i:5  AS:i:71 nM:i:0  RE:A:I  li:i:0  BC:Z:TTGAGATC QT:Z::FFFFFFF CR:Z:AACTGGTAGAGTGACC CY:Z:FFFFFFFFF:FFFFFF CB:Z:AACTGGTAGAGTGACC-1 UR:Z:GTTCACCATA UY:Z:FFFFFFFFFF UB:Z:GTTCACCATA RG:Z:B1_GEX:0:1:HKWCTDRXX:1

samtools view t2.bam | tail -n 3 -
A00613:162:HKWCTDRXX:2:2107:15519:35790 4 * 0 0 * * 0 0 ATGAGAAGGCACCCAAGCTTTACCAATAACACCATAAGGATAGGTGCGTACACCACACGCCTCAAACGGCCCCAGATAACTGGTGTCGTCC F:F:,,:,:,,FF,F,:F:F:,FF,,FFF,,,,,,,,:F::,,:,,,F,:,FFF,,,F,:,:::,:F,,FF,,,FFF,FF,,FFF,,F,:: NH:i:0  HI:i:0  AS:i:18 nM:i:1  uT:A:1  xf:i:0  li:i:0  BC:Z:TGGAAGGT QT:Z:FF,,F,:F CR:Z:TTTGTCATCCGTTGTC CY:Z:F,FFF:,FF:F:FFFF CB:Z:TTTGTCATCCGTTGTC-1 UR:Z:TCCCGCTCAT UY:Z:FFFFFFFFFF UB:Z:TCCCGCTCAT RG:Z:B2_GEX:0:1:HKWCTDRXX:2
A00613:162:HKWCTDRXX:2:2177:9046:12085  4 * 0 0 * * 0 0 AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATATT FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:F:F:FFFFFFFFFFFFFFFFFFFF:,F,:, NH:i:0  HI:i:0  AS:i:46 nM:i:0  uT:A:1  xf:i:0  li:i:0  BC:Z:GCATCTCC QT:Z:FFFFFFFF CR:Z:TTTGTCATCCTGCAGG CY:Z:F:FFFFFFFF:FF:FF CB:Z:TTTGTCATCCTGCAGG-1 UR:Z:CTGCCTATCA UY:Z:FFFFFFFFFF UB:Z:CTGCCTATCA RG:Z:B2_GEX:0:1:HKWCTDRXX:2
A00613:162:HKWCTDRXX:2:2234:20546:22514 4 * 0 0 * * 0 0 AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGTAAAAAACACCCCCGGTGGGGGGTGGGTAATT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF,,:,F:,F,,:,,FF,::,,,FF,,,,::,,,,F NH:i:0  HI:i:0  AS:i:36 nM:i:0  uT:A:1  xf:i:0  li:i:0  BC:Z:AACGTCAA QT:Z:FFFFFFFF CR:Z:TTTGTCATCGGTTCGG CY:Z:FFFFFFFFFFFFFFFF CB:Z:TTTGTCATCGGTTCGG-1 UR:Z:GCACTGCGAG UY:Z:FF:FFFFF:F UB:Z:GCACTGCGAG RG:Z:B2_GEX:0:1:HKWCTDRXX:2

These bam files will be concatenated, but each cell barcode (CB tag) will be prepended with the label supplied via the -l flag in the program call.

(samtools view out.bam | head -n 3 -; samtools view out.bam | tail -n 3 -) > topandbottom.txt
cat topandbottom.txt
A00613:162:HKWCTDRXX:1:1228:5330:21151  272 1 12048 0 91M * 0 0 GCAAGCTGAGCACTGGAGTGGAGTTTTCCTGTGGAGAGGAGCCATGCCTAGAGTGGGATGGGCCATTGTTCATCTTCTGGCCCCTGTTGTC FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:7  HI:i:4  AS:i:89 nM:i:0  RE:A:I  li:i:0  BC:Z:GCTGTCCA QT:Z:FFFFFFFF CR:Z:ACACCAAAGGTTCCTA CY:Z:FFFFFFFFFFFFFFFF UR:Z:ACCAGTCGGT UY:Z:FFFFFFFFFF UB:Z:ACCAGTCGGT RG:Z:B1_GEX:0:1:HKWCTDRXX:1 CB:Z:t1_ACACCAAAGGTTCCTA-1
A00613:162:HKWCTDRXX:1:1166:7455:25708  256 1 16724 0 42M92N49M * 0 0 GTGGGGGCGGTGGTGGTGCTGTTAGTACCCCATCTTGTAGGTCTTGAGAGGCTCGGCTACCTCAGTGTGGAAGGTGGGCAGTTCTGGAATG FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:6  HI:i:4  AS:i:85 nM:i:2  RE:A:I  li:i:0  BC:Z:TTGAGATC QT:Z:FFFFFFFF CR:Z:TTTATGCGTCGCCATG CY:Z:FFFFFFFFFFFFFFFF UR:Z:CTAGTTGCGC UY:Z:FFFFFFFFFF UB:Z:CTAGTTGCGC RG:Z:B1_GEX:0:1:HKWCTDRXX:1 CB:Z:t1_TTTATGCGTCGCCATG-1
A00613:162:HKWCTDRXX:1:1272:21866:31062 256 1 18298 0 73M18S * 0 0 CTCAATCTTGGCCTGGGCCAAGGAGACCTTCTCTCCAATGGCCTGCACCTGGCTCCGGCTCTGCTCTACCTGCGAAGTTGCTCGGCGCCCT FFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF NH:i:8  HI:i:5  AS:i:71 nM:i:0  RE:A:I  li:i:0  BC:Z:TTGAGATC QT:Z::FFFFFFF CR:Z:AACTGGTAGAGTGACC CY:Z:FFFFFFFFF:FFFFFF UR:Z:GTTCACCATA UY:Z:FFFFFFFFFF UB:Z:GTTCACCATA RG:Z:B1_GEX:0:1:HKWCTDRXX:1 CB:Z:t1_AACTGGTAGAGTGACC-1
A00613:162:HKWCTDRXX:2:2107:15519:35790 4 * 0 0 * * 0 0 ATGAGAAGGCACCCAAGCTTTACCAATAACACCATAAGGATAGGTGCGTACACCACACGCCTCAAACGGCCCCAGATAACTGGTGTCGTCC F:F:,,:,:,,FF,F,:F:F:,FF,,FFF,,,,,,,,:F::,,:,,,F,:,FFF,,,F,:,:::,:F,,FF,,,FFF,FF,,FFF,,F,:: NH:i:0  HI:i:0  AS:i:18 nM:i:1  uT:A:1  xf:i:0  li:i:0  BC:Z:TGGAAGGT QT:Z:FF,,F,:F CR:Z:TTTGTCATCCGTTGTC CY:Z:F,FFF:,FF:F:FFFF UR:Z:TCCCGCTCAT UY:Z:FFFFFFFFFF UB:Z:TCCCGCTCAT RG:Z:B2_GEX:0:1:HKWCTDRXX:2 CB:Z:t2_TTTGTCATCCGTTGTC-1
A00613:162:HKWCTDRXX:2:2177:9046:12085  4 * 0 0 * * 0 0 AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTATATT FFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFF:F:F:FFFFFFFFFFFFFFFFFFFF:,F,:, NH:i:0  HI:i:0  AS:i:46 nM:i:0  uT:A:1  xf:i:0  li:i:0  BC:Z:GCATCTCC QT:Z:FFFFFFFF CR:Z:TTTGTCATCCTGCAGG CY:Z:F:FFFFFFFF:FF:FF UR:Z:CTGCCTATCA UY:Z:FFFFFFFFFF UB:Z:CTGCCTATCA RG:Z:B2_GEX:0:1:HKWCTDRXX:2 CB:Z:t2_TTTGTCATCCTGCAGG-1
A00613:162:HKWCTDRXX:2:2234:20546:22514 4 * 0 0 * * 0 0 AAGCAGTGGTATCAACGCAGAGTACTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTTAGTAAAAAACACCCCCGGTGGGGGGTGGGTAATT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFF,,:,F:,F,,:,,FF,::,,,FF,,,,::,,,,F NH:i:0  HI:i:0  AS:i:36 nM:i:0  uT:A:1  xf:i:0  li:i:0  BC:Z:AACGTCAA QT:Z:FFFFFFFF CR:Z:TTTGTCATCGGTTCGG CY:Z:FFFFFFFFFFFFFFFF UR:Z:GCACTGCGAG UY:Z:FF:FFFFF:F UB:Z:GCACTGCGAG RG:Z:B2_GEX:0:1:HKWCTDRXX:2 CB:Z:t2_TTTGTCATCGGTTCGG-1
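
The relabeling seen in the records above can be sketched on plain SAM text with the Python standard library alone. This is an illustration under stated assumptions rather than the mergeBams implementation, which per the requirements relies on pysam: the function name relabel_cell_barcode is hypothetical, the record is a shortened synthetic one, and real SAM lines are tab-separated. The relabeled tag is appended at the end of the record because that is where it appears in the example output.

```python
def relabel_cell_barcode(sam_line, label, cell_tag="CB"):
    """Prefix the cell barcode tag of one tab-separated SAM record with label.

    The tag is dropped from its old position and appended at the end of the
    record. Records that do not carry the tag are returned unchanged.
    """
    fields = sam_line.rstrip("\n").split("\t")
    # The first 11 SAM fields are mandatory; everything after is optional tags.
    mandatory, optional = fields[:11], fields[11:]
    prefix = cell_tag + ":Z:"
    kept, barcode = [], None
    for field in optional:
        if field.startswith(prefix):
            barcode = field[len(prefix):]
        else:
            kept.append(field)
    if barcode is None:
        return "\t".join(fields)
    return "\t".join(mandatory + kept + [prefix + label + barcode])


# Synthetic, shortened record used only for illustration.
record = "\t".join([
    "read1", "4", "*", "0", "0", "*", "*", "0", "0", "ACGT", "FFFF",
    "CR:Z:ACACCAAAGGTTCCTA", "CB:Z:ACACCAAAGGTTCCTA-1", "UB:Z:ACCAGTCGGT",
])
relabeled = relabel_cell_barcode(record, "t1_")
```

Passing a different cell_tag corresponds to the situation the --cell_tag option describes, where the barcode is stored under a tag other than CB.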

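Similarly, if desired, mergeBams will concatenate the barcodes.tsv files and add the labels to them (for compressed barcodes.tsv.gz files, see the note below on how barcode file compression is handled). For example, in the case above...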

head -n 3 barcodes1.tsv
AAACCTGAGCCCGAAA-1
AAACCTGAGGTGCTTT-1
AAACCTGAGTACTTGC-1

tail -n 3 barcodes2.tsv
TTTGTCATCATTCACT-1
TTTGTCATCCGTTGTC-1
TTTGTCATCCTGCAGG-1

will be concatenated and given labels.

(head -n 3 outbcs.tsv; tail -n 3 outbcs.tsv) > topandbottombc.txt
cat topandbottombc.txt
t1_AAACCTGAGCCCGAAA-1
t1_AAACCTGAGGTGCTTT-1
t1_AAACCTGAGTACTTGC-1
t2_TTTGTCATCATTCACT-1
t2_TTTGTCATCCGTTGTC-1
t2_TTTGTCATCCTGCAGG-1
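
The barcode handling shown above amounts to prefixing every barcode with its sample label and concatenating the results in input order. A minimal sketch, assuming one label per barcodes file (the function name merge_barcodes is hypothetical and this is not taken from the mergeBams source):

```python
def merge_barcodes(barcode_lists, labels):
    """Prefix each barcode with its sample label and concatenate in order."""
    if len(barcode_lists) != len(labels):
        raise ValueError("need exactly one label per barcodes file")
    merged = []
    for label, barcodes in zip(labels, barcode_lists):
        merged.extend(label + barcode for barcode in barcodes)
    return merged


# Barcodes taken from the head and tail excerpts shown above.
barcodes1 = ["AAACCTGAGCCCGAAA-1", "AAACCTGAGGTGCTTT-1", "AAACCTGAGTACTTGC-1"]
barcodes2 = ["TTTGTCATCATTCACT-1", "TTTGTCATCCGTTGTC-1", "TTTGTCATCCTGCAGG-1"]
merged = merge_barcodes([barcodes1, barcodes2], ["t1_", "t2_"])
```

Because the same labels are prepended to the CB tags in the merged bam, the labeled barcodes stay consistent between the two outputs.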

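Note that this program supports compression and will compress the output barcodes file to match the input, i.e. the following will produce a compressed barcodes file as output. The barcodes files supplied must be either all compressed or all uncompressed.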

mergeBams -i t1.bam,t2.bam \
          -l t1_,t2_ \
          -b barcodes1.tsv.gz,barcodes2.tsv.gz \
          -o /home/user/test
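
The all-or-nothing compression rule can be sketched as follows. This assumes compression is recognized by a .gz file suffix, which may differ from how mergeBams actually detects it; the helper names all_compressed and open_text and the file name merged.tsv.gz are hypothetical.

```python
import gzip
import os
import tempfile


def all_compressed(paths):
    """Return True if every path ends in .gz, False if none do.

    A mixture of compressed and uncompressed paths raises ValueError.
    """
    flags = {path.endswith(".gz") for path in paths}
    if len(flags) > 1:
        raise ValueError("barcodes files must be all compressed or all uncompressed")
    return flags.pop() if flags else False


def open_text(path, mode="rt"):
    """Open a barcodes file as text, transparently handling gzip."""
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


# Round trip through a temporary gzip file to show both helpers in use.
compressed = all_compressed(["barcodes1.tsv.gz", "barcodes2.tsv.gz"])
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "merged.tsv.gz" if compressed else "merged.tsv")
    with open_text(out_path, "wt") as handle:
        handle.write("t1_AAACCTGAGCCCGAAA-1\n")
    with open_text(out_path) as handle:
        round_trip = handle.read().splitlines()
```

Deciding the output format once, from the inputs, keeps the output consistent with what was supplied.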

Acknowledgements

Written by Scott Furlan with help from CFooldood and rcguy
