Snakemake使用expand（）选项输出

rule bowtie2_build: """ """ input: "reference/"+config["reference_genome"]+".fa" output: # every possible extension in expand expand("reference/"+config["reference_genome"]+"{suffix}", suffix=[".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2", ".1.bt21", ".2.bt21", ".3.bt21", ".4.bt21", ".rev.1.bt21", ".rev.2.bt21"]) params: output_prefix=config["reference_genome"] shell: "bowtie2-build {input} reference/{params.output_prefix}"

3条回答

网友

1楼 · 编辑于 2024-10-02 02:38:33

也许你把事情弄得太复杂了。Bowtie2（对齐器）输入索引文件的前缀，它将自己查找实际的索引文件。因此，我不会将索引文件列为任何规则的输出。只需使用一个标志文件来指示索引已经完成。例如：

rule all:
    input:
        expand('{sample}.sam', sample= ['A', 'B', 'C']),

rule bowtie_build:
    input:
        fa= 'genome.fa',
    output:
        touch('index.done'), # Flag file
    params:
        index_prefix= 'genome',
    shell:
        r"""
        bowtie2-build {input.fa} {params.index_prefix}
        """

rule bowtie_align:
    input:
        idx= 'index.done',
        fq1= '{sample}.R1.fastq.gz',
    output:
        sam= '{sample}.sam',
    params:
        index_prefix= 'genome'
    shell:
        r"""
        bowtie2 -x {params.index_prefix} -U {input.fq1} -S {output.sam}
        """

网友

2楼 · 编辑于 2024-10-02 02:38:33

这种方法是用不同的扩展名重命名文件。如果不存在这样的文件，它将打印错误，但您可能会认为这是一个特性：

rule bowtie2_build:
    input:
        "reference/"+config["reference_genome"]+".fa"
    output:
        expand("reference/"+config["reference_genome"]+"{suffix}", suffix=[".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2"])
    params:
        output_prefix=config["reference_genome"],
        file_name = "reference/"+config["reference_genome"],
    shell:
        """
        bowtie2-build {input} reference/{params.output_prefix}
        mv -v {params.file_name}.1.bt21 {params.file_name}.1.bt2
        mv -v {params.file_name}.2.bt21 {params.file_name}.2.bt2
        mv -v {params.file_name}.3.bt21 {params.file_name}.3.bt2
        # etc... it's also possible to put this in a bash loop
        """

在其他示例中，如果输入文件具有关于输出的信息，例如，如果从一开始就知道基因组序列大小，那么一个选项是生成两个规则bowtie2_build和bowtie2_build_big，这两个规则将指定不同的扩展名

网友

3楼 · 编辑于 2024-10-02 02:38:33

似乎是checkpoints的用法。对于检查点，DAG将在检查点执行后重新评估。你可以这样做：

from glob import glob

def get_ref_index(wildcards):
    """Returns the reference index for wildcards"""
    checkpoints.bowtie2_build.get()  # Checks to see if the checkpoint rule has been run, might be an issue without wildcards
    suffix=[".1.bt2", ".2.bt2", ".3.bt2", ".4.bt2", ".rev.1.bt2", ".rev.2.bt2", ".1.bt21", ".2.bt21", ".3.bt21", ".4.bt21", ".rev.1.bt21", ".rev.2.bt21"]
    ref_files = glob.glob("reference/" + config["reference_genome"] + "*") # List files with reference genome pattern
    
    for ref in ref_files:
        suffix = ref.split(".", 1)[1]  # Split on first period to grab all suffixes
        if suffix in suffixes:
            return ref
    
checkpoint bowtie2_build:
    input:
        "reference/"+config["reference_genome"]+".fa"
    output:
        touch("reference/"+config["reference_genome"]+".done")  # Breadcrumb file to link to downstream rules, since we don't know what indexes we'll get
    params:
        output_prefix=config["reference_genome"]
    shell:
        "bowtie2-build {input} reference/{params.output_prefix}"
        
rule donwstream_rule:
    input:
        ref = "reference/"+config["reference_genome"]+".fa"
        ref_index = get_ref_index
        ref_index_done = "reference/"+config["reference_genome"]+".done"
    output:
        foo = "bar"
    shell:
        "touch bar"

相关问题更多 >

编程相关推荐

热门问题

热门文章