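I have three single-cell bam files from 3 different samples that I need to split into smaller bam files by cluster. Then I need to merge the bam files from different samples that belong to the same cluster. I tried using checkpoints, but got a bit lost. https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html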
This is a continuation of a question I posted earlier: split bam files to (variable) pre-defined number of small bam files depending on the sample
```python
import os

# clusters present in each sample (not every sample has every cluster)
SAMPLE_cluster = { "SampleA" : [ "1", "2", "3" ], "SampleB" : [ "1" ], "SampleC" : [ "1", "2" ] }

CLUSTERS = []
for sample in SAMPLE_cluster:
    CLUSTERS.extend(SAMPLE_cluster[sample])
CLUSTERS = sorted(set(CLUSTERS))

rule all:
    input: expand("01merged_bam/{cluster_id}.bam", cluster_id = CLUSTERS)

checkpoint split_bam:
    input: "{sample}.bam"
    output: directory("01split_bam/{sample}/")
    shell:
        """
        split_bam.sh {input}
        """
## the split_bam.sh will split the bam file to "01split_bam/{sample}/{sample}_{cluster_id}.bam"

def merge_bam_input(wildcards):
    # note: wildcards here only carries cluster_id, while split_bam is keyed by {sample}
    checkpoint_output = checkpoints.split_bam.get(**wildcards).output[0]
    return expand("01split_bam/{sample}/{sample}_{{cluster_id}}.bam",
                  sample = glob_wildcards(os.path.join(checkpoint_output, "{sample}_{cluster_id}.bam")).sample)

rule merge_bam_per_cluster:
    input: merge_bam_input
    output: "01merged_bam/{cluster_id}.bam"
    log: "00log/{cluster_id}.merge_bam.log"
    threads: 2
    shell:
        """
        samtools merge -@ {threads} -r {output} {input} 2> {log}
        """
```
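For comparison, here is a plain-Python sketch of the discovery step a checkpoint-based input function has to perform: once split_bam.sh has run, look inside every sample's directory for `<sample>_<cluster_id>.bam` files and group them by cluster. `discover_split_bams` is a hypothetical helper, and the sketch assumes the file layout described in the Snakefile comment. In the Snakefile this would presumably mean calling `checkpoints.split_bam.get(sample=...)` once per sample before globbing, rather than passing the merge rule's own wildcards, which only contain `cluster_id`.

```python
import os
import re
from collections import defaultdict


def discover_split_bams(split_root, samples):
    """Group split BAM paths by cluster id (hypothetical helper).

    Assumes split_bam.sh writes <split_root>/<sample>/<sample>_<cluster_id>.bam;
    files that do not match that pattern are ignored.
    """
    by_cluster = defaultdict(list)
    for sample in samples:
        sample_dir = os.path.join(split_root, sample)
        # Capture the cluster id from names like "SampleA_3.bam"
        pattern = re.compile(re.escape(sample) + r"_(.+)\.bam")
        for name in sorted(os.listdir(sample_dir)):
            match = pattern.fullmatch(name)
            if match:
                by_cluster[match.group(1)].append(os.path.join(sample_dir, name))
    return dict(by_cluster)
```

Each value is the list of files that `samtools merge` would receive for that cluster. The scan only gives the right answer after every sample's directory has been written, which is the ordering a checkpoint is meant to guarantee.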
The input of rule merge_bam_per_cluster changes depending on the cluster number:
For example, for cluster 1: "01split_bam/SampleA/SampleA_1.bam", "01split_bam/SampleB/SampleB_1.bam", "01split_bam/SampleC/SampleC_1.bam".
For cluster 2: "01split_bam/SampleA/SampleA_2.bam", "01split_bam/SampleC/SampleC_2.bam".
For cluster 3: "01split_bam/SampleA/SampleA_3.bam".
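These per-cluster lists follow directly from inverting the SAMPLE_cluster dictionary: for each cluster, collect the samples that contain it. A minimal sketch (the name `samples_per_cluster` is illustrative, not from the original Snakefile); it also yields the same CLUSTERS list that the Snakefile builds with its loop.

```python
from collections import defaultdict

SAMPLE_cluster = {"SampleA": ["1", "2", "3"], "SampleB": ["1"], "SampleC": ["1", "2"]}

# Invert {sample: [cluster_id, ...]} into {cluster_id: [sample, ...]}
samples_per_cluster = defaultdict(list)
for sample, clusters in SAMPLE_cluster.items():
    for cluster_id in clusters:
        samples_per_cluster[cluster_id].append(sample)

# Same values as the CLUSTERS list built in the Snakefile
CLUSTERS = sorted(samples_per_cluster)

for cluster_id in CLUSTERS:
    print(cluster_id, samples_per_cluster[cluster_id])
# 1 ['SampleA', 'SampleB', 'SampleC']
# 2 ['SampleA', 'SampleC']
# 3 ['SampleA']
```

Because cluster ids are strings, `sorted` orders them lexically ("10" before "2"), just as the original `sorted(set(CLUSTERS))` does.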
I decided not to use a checkpoint and instead used an input function to get the input files for each cluster.
It seems to work.
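A checkpoint-free input function along those lines might look like the sketch below. It is an assumption about the approach, not the author's posted code: it builds the paths for one cluster straight from SAMPLE_cluster, so nothing has to be globbed at runtime. Snakemake calls an input function with a wildcards object; here `SimpleNamespace` stands in for it, since only `.cluster_id` is read.

```python
from types import SimpleNamespace

SAMPLE_cluster = {"SampleA": ["1", "2", "3"], "SampleB": ["1"], "SampleC": ["1", "2"]}


def merge_bam_input(wildcards):
    """Split BAMs for one cluster, derived from SAMPLE_cluster (no checkpoint)."""
    cluster_id = wildcards.cluster_id
    return [
        f"01split_bam/{sample}/{sample}_{cluster_id}.bam"
        for sample, clusters in SAMPLE_cluster.items()
        if cluster_id in clusters
    ]


# Stand-in for the wildcards object Snakemake would pass for 01merged_bam/2.bam
print(merge_bam_input(SimpleNamespace(cluster_id="2")))
```

In the Snakefile it would be referenced the same way as before, `input: merge_bam_input`. The post does not show how split_bam's outputs were declared after dropping the checkpoint, so that part is not covered here.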