Edit 2 : I figure it out. I posted my answer as reply.
Edit 1 : I added a beginning of solution at the end of the question following @bli advices and https://stackoverflow.com/a/41185568/1025741
我正在编写一个snakemake文件,在这个文件中我解析了一个示例表文件(在yaml配置文件中定义),以便连接这个示例表中列出的文件。在
示例表如下:
sample unit fq1 fq2
A lane1 A.l1.1.R1.txt A.l1.1.R2.txt
A lane1 A.l1.2.R1.txt A.l1.2.R2.txt
A lane2 A.l2.R1.txt A.l2.R2.txt
B lane1 B.l1.R1.txt B.l1.R2.txt
其思想是连接来自同一样本和样本单元的文件(列在fq1和fq2中)。在这种情况下:
A.l1.1.R1.txt
和A.l2.2.R1.txt
将被连接A.l1.1.R2.txt
和{其他文件将不会串联,但也会在该目录结构中报告:
^{pr2}$所以在这个例子的最后,我应该:
A/
A_lane1_merged_R1.txt
A_lane1_merged_R2.txt
A_lane2_merged_R1.txt
A_lane2_merged_R2.txt
B/
B_lane1_merged_R1.txt
B_lane1_merged_R2.txt
下面是我的snakemake文件来执行这样的任务:
import pandas as pd
shell.executable("bash")
configfile: "config.yaml"
# open samplesheet
units = pd.read_table(config["units"], dtype=str)
units = units.set_index(["sample", "unit"])
rule all:
input:
expand("{sample}/{sample}_{unit}_merge_R1.txt",
sample=units.index.get_level_values('sample').unique(),
unit=units.index.get_level_values('unit').unique()),
expand("{sample}/{sample}_{unit}_merge_R2.txt",
sample=units.index.get_level_values('sample').unique(),
unit=units.index.get_level_values('unit').unique())
def get_fastq_r1(wildcards):
return units.loc[(wildcards.sample, wildcards.unit), ["fq1"]].dropna().values.flatten()
def get_fastq_r2(wildcards):
return units.loc[(wildcards.sample, wildcards.unit), ["fq2"]].dropna().values.flatten()
rule merge:
input:
r1 = get_fastq_r1,
r2 = get_fastq_r2
output:
"{sample}/{sample}_{unit}_merge_R1.txt",
"{sample}/{sample}_{unit}_merge_R2.txt"
shell:
"""
echo {input.r1} > {sample}/{sample}_{unit}_merge_R1.txt
echo {input.r2} > {sample}/{sample}_{unit}_merge_R2.txt
"""
以及配置.yaml公司名称:
units: units.tsv
但是我有一个错误,因为我没有一个单元为lane2
的示例B
:
InputFunctionException in line 29 of /home/nrosewick/Documents/analysis/pilot_data_ADX17009/workflow/test_snakemake/Snakefile:
KeyError: ('B', 'lane2')
Wildcards:
sample=B
unit=lane2
有没有避免这种错误的方法/技巧? 谢谢
Beginning of solution
已使用的@I建议的bli版本itertools.product通过将其包装到高阶生成器中,该生成器检查生成的通配符组合是否在预先建立的列表中:
import pandas as pd
shell.executable("bash")
configfile: "config.yaml"
###
from itertools import product
def filter_combinator(combinator, inlist):
def filtered_combinator(*args, **kwargs):
for wc_comb in combinator(*args, **kwargs):
# Use frozenset instead of tuple
# in order to accomodate
# unpredictable wildcard order
if frozenset(wc_comb) in inlist:
yield wc_comb
return filtered_combinator
# open samplesheet
units = pd.read_table(config["units"], dtype=str)
# list of pair sample-unit included in the samplesheet
inList={
frozenset({("sample", "A"), ("unit", "lane1")}),
frozenset({("sample", "A"), ("unit", "lane2")}),
frozenset({("sample", "B"), ("unit", "lane1")})}
# set df index
units = units.set_index(["sample", "unit"])
# build new iterator
filtered_product = filter_combinator(product, inList)
rule all:
input:
expand("{sample}/{sample}_{unit}_merge_R1.txt",
filtered_product,
sample=units.index.get_level_values('sample').unique().values,
unit=units.index.get_level_values('unit').unique().values),
expand("{sample}/{sample}_{unit}_merge_R2.txt",
filtered_product,
sample=units.index.get_level_values('sample').unique().values,
unit=units.index.get_level_values('unit').unique().values)
def get_fastq_r1(wildcards):
return units.loc[(wildcards.sample, wildcards.unit), ["fq1"]].dropna().values.flatten()
def get_fastq_r2(wildcards):
return units.loc[(wildcards.sample, wildcards.unit), ["fq2"]].dropna().values.flatten()
rule merge:
input:
r1 = get_fastq_r1,
r2 = get_fastq_r2
output:
"{sample}/{sample}_{unit}_merge_R1.txt",
"{sample}/{sample}_{unit}_merge_R2.txt"
message:
"test"
shell:
"""
cat {input.r1} > {sample}/{sample}_{unit}_merge_R1.txt
cat {input.r2} > {sample}/{sample}_{unit}_merge_R2.txt
"""
但在运行snakemake -n
时,它会返回一个错误:
Job 1: test
RuleException in line 53 of /home/nrosewick/Documents/analysis/pilot_data_ADX17009/workflow/test_snakemake/Snakefile:
NameError: The name 'sample' is unknown in this context. Please make sure that you defined that variable. Also note that braces not used for variable access have to be escaped by repeating them, i.e. {{print $1}}
有什么线索吗?在
以下是我基于https://stackoverflow.com/a/41185568/1025741找到的解决方案:
相关问题 更多 >
编程相关推荐