I have a DataFrame (p4p5_merge) that currently looks like this:
SampleID expr Gene Period tag \
1 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996
2 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996
3 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996
4 HSB666 3.663308 ENSG00000147996 5 HSB666|ENSG00000147996
5 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749
6 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749
7 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749
8 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749
9 HSB651 3.207474 ENSG00000174749 4 HSB651|ENSG00000174749
10 HSB195 0.214731 ENSG00000188157 4 HSB195|ENSG00000188157
11 HSB195 0.214731 ENSG00000188157 4 HSB195|ENSG00000188157
12 HSB195 0.214731 ENSG00000188157 4 HSB195|ENSG00000188157
14 HSB152 5.062444 ENSG00000188157 4 HSB152|ENSG00000188157
15 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
16 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
17 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
18 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
19 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
20 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
21 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
22 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
23 HSB627 2.062444 ENSG00000174749 4 HSB627|ENSG00000174749
Consequence
1 upstream_gene_variant
2 upstream_gene_variant
3 upstream_gene_variant
4 upstream_gene_variant
5 upstream_gene_variant
6 upstream_gene_variant
7 upstream_gene_variant
8 upstream_gene_variant
9 upstream_gene_variant
10 upstream_gene_variant
11 upstream_gene_variant
12 upstream_gene_variant
14 upstream_gene_variant
15 upstream_gene_variant
16 upstream_gene_variant
17 upstream_gene_variant
18 upstream_gene_variant
19 upstream_gene_variant
20 upstream_gene_variant
21 upstream_gene_variant
22 upstream_gene_variant
23 intron_variant
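I now want to group by Gene, sort by expr in descending order, and then filter the DataFrame down to the rows whose expr value falls in the bottom 10% (below the 10th percentile) of each Gene group. To do that, I run the following: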
1) Sort by expr in descending order (works)
p4p5_sort= p4p5_merge.sort_values(['expr', 'Gene'],
ascending=[False, True]).reset_index(drop=True)
2) Group by Gene and keep the bottom 10% of expr per gene (fails)
p4p5_bottom10 = (p4p5_sort[p4p5_sort.groupby('Gene')['expr'].
apply(lambda x: x < x.quantile(0.1))])
Step 1 works as expected, but when I run step 2, all I get is the following:
sys:1: DtypeWarning: Columns (15,16,22,36,37,38,39) have mixed types. Specify dtype option on import or set low_memory=False.
Empty DataFrame
Columns: [SampleID, expr, Gene, Period, tag, Consequence]
Index: []
In case it helps, this is what I am trying to do, written in R (dplyr):
p4p5_bottom10 <- p4p5_merge %>% select(Gene, expr, SampleID, Period) %>%
group_by(Gene) %>%
arrange(Gene, desc(expr)) %>%
filter(expr < quantile(expr, 0.1))
You can apply quantile directly to the groupby, like this:
p4p5_bottom10 = pd.DataFrame(p4p5_sort.groupby(['Gene'])['expr'].quantile(0.1))
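The grouped quantile comes back as a Series, so we wrap it in pd.DataFrame() to convert it to a DataFrame. The result has one row per Gene, holding that gene's 10th-percentile expr value.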
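If the goal is to keep the original rows rather than only the per-gene thresholds, one possible approach is to broadcast each gene's 10th percentile back onto its rows with transform and compare row by row. The sketch below is only an illustration under assumptions: the small frame df, its gene labels, and its values are made up for the example and are not the asker's data.

```python
import pandas as pd

# Hypothetical toy data (not the asker's real frame): two genes, ten rows each.
df = pd.DataFrame({
    "Gene": ["geneA"] * 10 + ["geneB"] * 10,
    "expr": [float(v) for v in range(1, 11)]
            + [float(v) for v in range(10, 110, 10)],
})

# Per-row threshold: each row receives the 10th percentile of its own Gene group.
# transform returns a Series aligned with df's index, so it can be compared directly.
threshold = df.groupby("Gene")["expr"].transform(lambda s: s.quantile(0.1))

# Keep only rows strictly below their gene's threshold, then sort for display.
bottom10 = (
    df[df["expr"] < threshold]
    .sort_values(["Gene", "expr"], ascending=[True, False])
    .reset_index(drop=True)
)

print(bottom10)
```

Note that a strict < comparison selects nothing for a gene whose lowest expr value is repeated often enough that the 10th percentile equals the minimum. That is the case for every gene in the sample rows shown in the question, so an empty result is what that sample would give; using <= instead may be closer to what is intended when there are many ties.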