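I am trying to use a subdirectory as a wildcard, but Snakemake expands the wildcard into the contents of the subdirectories. I tried to come up with a minimal example, which was not easy, so I apologize if it is not entirely clear. It should work out of the box, though.
Pipeline description
rule firststep: this rule basically creates the two folders for the runs wildcard.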
runs = ['run1', 'run2']
rule firststep:
    output:
        '{run}/firststep_done.txt'
    shell:
        'touch {output} ;'
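checkpoint secondstep: this rule outputs an arbitrary number of subdirectories, which are later used as wildcards (projectA and projectB). Inside each subdirectory, an arbitrary number of files is generated.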
checkpoint secondstep:
    input:
        '{run}/firststep_done.txt',
    output:
        DIR = directory('{run}/secondstep')
    shell:
        'mkdir -p {output.DIR} ;'
        'mkdir -p {wildcards.run}/secondstep/projectA ;'
        'touch {wildcards.run}/secondstep/projectA/file_arbitrary.1 ;'
        'touch {wildcards.run}/secondstep/projectA/file_arbitrary.2 ;'
        'mkdir -p {wildcards.run}/secondstep/projectB ;'
        'touch {wildcards.run}/secondstep/projectB/file_arbitrary.1 ;'
        'touch {wildcards.run}/secondstep/projectB/file_arbitrary.2 ;'
rule intermediate: this rule uses the new project wildcard to create a file in another directory, where each subdirectory name is a value of the project wildcard.
rule intermediate:
    input:
        directory('{run}/secondstep/{project}')
    output:
        '{run}/report/{project}/arbitrary.all'
    shell:
        'echo "foo" > {output}'
As the next step, I create an input function for the aggregating rule:
def resolve_project(wildcards):
    checkpoint_output = checkpoints.secondstep.get(**wildcards).output[0]
    return expand('{run}/report/{project}/arbitrary.all',
                  run=wildcards.run,
                  project=glob_wildcards(os.path.join(checkpoint_output,
                                                      "{project}")).project)
Finally, the last rule, aggregate, uses the input produced by that function to complete the pipeline:
rule aggregate:
    input:
        resolve_project
    output:
        '{run}/report/{run}_done'
    shell:
        'cat {input} > {output}'
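I have posted the complete, copy-pasteable pipeline below.
I see two problems:
1. The wildcards of rule intermediate look like this: wildcards: run=run1, project=projectB/file_arbitrary.2
   But I want the {project} wildcard to be only projectA or projectB. How can I achieve this?
2. A .snakemake_timestamp file is created in the secondstep folder, so I also get a wildcard value named .snakemake_timestamp. How can I make Snakemake infer wildcards from directories only?
Thanks for your help.
Full pipeline: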
runs = ['run1', 'run2']

rule all:
    input:
        expand('{run}/report/{run}_done', run = runs)

rule firststep:
    output:
        '{run}/firststep_done.txt'
    shell:
        'touch {output} ;'

checkpoint secondstep:
    input:
        '{run}/firststep_done.txt',
    output:
        DIR = directory('{run}/secondstep')
    shell:
        'mkdir -p {output.DIR} ;'
        'mkdir -p {wildcards.run}/secondstep/projectA ;'
        'touch {wildcards.run}/secondstep/projectA/file_arbitrary.1 ;'
        'touch {wildcards.run}/secondstep/projectA/file_arbitrary.2 ;'
        'mkdir -p {wildcards.run}/secondstep/projectB ;'
        'touch {wildcards.run}/secondstep/projectB/file_arbitrary.1 ;'
        'touch {wildcards.run}/secondstep/projectB/file_arbitrary.2 ;'

rule intermediate:
    input:
        directory('{run}/secondstep/{project}')
    output:
        '{run}/report/{project}/arbitrary.all'
    shell:
        'echo "blabla" > {output}'

def resolve_project(wildcards):
    checkpoint_output = checkpoints.secondstep.get(**wildcards).output[0]
    return expand('{run}/report/{project}/arbitrary.all',
                  run=wildcards.run,
                  project=glob_wildcards(os.path.join(checkpoint_output,
                                                      "{project}")).project)

rule aggregate:
    input:
        resolve_project
    output:
        '{run}/report/{run}_done'
    shell:
        'cat {input} > {output}'
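Edit
As the answer below points out, this is a wildcard-constraint problem. However, a global wildcard constraint did not work for me. Because an input function is involved, the constraint has to be defined inside the glob_wildcards statement: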
def resolve_project(wildcards):
    checkpoint_output = checkpoints.secondstep.get(**wildcards).output[0]
    return expand('{run}/report/{project}/arbitrary.all',
                  run=wildcards.run,
                  project=glob_wildcards(os.path.join(checkpoint_output,
                                                      "{project, [^/|^.]+}")).project)
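To see why the constraint makes a difference, here is a minimal plain-Python sketch. It assumes that an unconstrained wildcard behaves like the regex `.+` (the documented default) and imitates the directory listing with a hard-coded list. `PATHS` and `match_projects` are hypothetical names for illustration only, not part of Snakemake, and the third pattern is just one possible alternative, not something taken from the answer.

```python
import re

# Hypothetical listing of what exists below run1/secondstep after the checkpoint.
PATHS = [
    "run1/secondstep/.snakemake_timestamp",
    "run1/secondstep/projectA",
    "run1/secondstep/projectA/file_arbitrary.1",
    "run1/secondstep/projectA/file_arbitrary.2",
    "run1/secondstep/projectB",
    "run1/secondstep/projectB/file_arbitrary.1",
    "run1/secondstep/projectB/file_arbitrary.2",
]


def match_projects(paths, constraint=".+"):
    """Return the values a {project} wildcard would take under `constraint`."""
    # The trailing $ forces the wildcard to cover the rest of the path.
    pattern = re.compile(r"run1/secondstep/(?P<project>" + constraint + r")$")
    found = []
    for path in paths:
        m = pattern.match(path)
        if m:
            found.append(m.group("project"))
    return sorted(found)


unconstrained = match_projects(PATHS)             # default: .+ also spans "/"
constrained = match_projects(PATHS, r"[^/|^.]+")  # constraint used in the edit
no_hidden = match_projects(PATHS, r"[^/.][^/]*")  # assumed alternative

print(unconstrained)
print(constrained)
print(no_hidden)
```

Note that the character class `[^/|^.]` excludes the characters `/`, `|`, `^` and `.`, so it would also reject a project directory whose name contains a dot. The alternative `[^/.][^/]*` only rejects a leading dot. Neither pattern distinguishes directories from files; both filter purely by name.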
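What you need is wildcard_constraints: https://snakemake.readthedocs.io/en/stable/tutorial/additional_features.html#constraining-wildcards
This lets you define a regular expression that restricts what a wildcard is allowed to match.
There are several ways to define constraints: globally, per rule, or inline. Here is an example of an inline constraint: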
output: '{run}/report/{project,[^/]+}/arbitrary.all'
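The effect of such an inline constraint on rule matching can be illustrated with a simplified stand-in for the pattern-to-regex translation. This is a rough sketch, assuming `{name}` becomes `(?P<name>.+)` and `{name,regex}` becomes `(?P<name>regex)`. `pattern_to_regex` is a hypothetical helper written for this example, not Snakemake's own function, and it does not handle a wildcard name that appears twice in one pattern.

```python
import re

# Matches {name} or {name,constraint} in a Snakemake-style pattern.
_WILDCARD = re.compile(r"\{\s*(?P<name>\w+)\s*(?:,\s*(?P<constraint>[^{}]+))?\}")


def pattern_to_regex(pattern):
    """Translate a wildcard pattern into a regex string (simplified)."""
    parts, last = [], 0
    for m in _WILDCARD.finditer(pattern):
        # Literal text between wildcards is escaped.
        parts.append(re.escape(pattern[last:m.start()]))
        # Unconstrained wildcards fall back to ".+".
        parts.append("(?P<{}>{})".format(m.group("name"),
                                         m.group("constraint") or ".+"))
        last = m.end()
    parts.append(re.escape(pattern[last:]))
    return "".join(parts) + "$"


loose = re.compile(pattern_to_regex("{run}/report/{project}/arbitrary.all"))
strict = re.compile(pattern_to_regex("{run}/report/{project,[^/]+}/arbitrary.all"))

# The target that the unconstrained input function asked for.
bad_target = "run1/report/projectB/file_arbitrary.2/arbitrary.all"
good_target = "run1/report/projectB/arbitrary.all"

print(loose.match(bad_target).groupdict())
print(strict.match(bad_target))
print(strict.match(good_target).groupdict())
```

With the constraint in place, the rule no longer matches a target whose project part contains a slash, while the intended target still resolves to project=projectB.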