Merging information from one dataframe into another based on a matching column

Posted 2024-05-18 06:33:10


I have a dataframe (con) that looks like this:

tag                         Consequence
HSB670|ENSG00000147996      upstream_gene_variant
HSB666|ENSG00000147996      upstream_gene_variant
HSB651|ENSG00000174749      downstream_gene_variant
HSB195|ENSG00000188157      splice_variant

The second dataframe (period) looks like this:

Sample      expr        Gene                Period  tag
"HSB651"    3.207474    "ENSG00000174749"   4       HSB651|ENSG00000174749
"HSB670"    3.797228    "ENSG00000147996"   4       HSB670|ENSG00000147996
"HSB195"    0.214731    "ENSG00000188157"   4       HSB195|ENSG00000188157 
"HSB666"    3.663308    "ENSG00000147996"   5       HSB666|ENSG00000147996

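I want to merge the Consequence information from con into period. Both dataframes share a tag column, so wherever a tag in period matches a tag in con, I want to look up the corresponding Consequence and add it to period as a new column. The final result should look like this: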

Sample      expr        Gene                Period  tag                     Consequence 
"HSB651"    3.207474    "ENSG00000174749"   4       HSB651|ENSG00000174749  downstream_gene_variant
"HSB670"    3.797228    "ENSG00000147996"   4       HSB670|ENSG00000147996  upstream_gene_variant
"HSB195"    0.214731    "ENSG00000188157"   4       HSB195|ENSG00000188157  splice_variant
"HSB666"    3.663308    "ENSG00000147996"   5       HSB666|ENSG00000147996  upstream_gene_variant

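To make the target concrete, here is a minimal sketch that rebuilds the four sample rows by hand and does a left merge on tag. The values are copied from the tables above; the `validate="many_to_one"` argument is my own addition and assumes each tag occurs only once in con, so pandas raises an error instead of silently multiplying rows if that assumption is wrong.

```python
import pandas as pd

# Sample rows copied from the con table above.
con = pd.DataFrame({
    "tag": [
        "HSB670|ENSG00000147996",
        "HSB666|ENSG00000147996",
        "HSB651|ENSG00000174749",
        "HSB195|ENSG00000188157",
    ],
    "Consequence": [
        "upstream_gene_variant",
        "upstream_gene_variant",
        "downstream_gene_variant",
        "splice_variant",
    ],
})

# Sample rows copied from the period table above.
period = pd.DataFrame({
    "Sample": ["HSB651", "HSB670", "HSB195", "HSB666"],
    "expr": [3.207474, 3.797228, 0.214731, 3.663308],
    "Gene": ["ENSG00000174749", "ENSG00000147996",
             "ENSG00000188157", "ENSG00000147996"],
    "Period": [4, 4, 4, 5],
    "tag": [
        "HSB651|ENSG00000174749",
        "HSB670|ENSG00000147996",
        "HSB195|ENSG00000188157",
        "HSB666|ENSG00000147996",
    ],
})

# Left merge keeps every row of period and looks up Consequence by tag.
# validate="many_to_one" (an assumption about the data) fails loudly
# if a tag is duplicated in con.
expected = period.merge(con, on="tag", how="left", validate="many_to_one")
print(expected)
```

On this hand-built sample the merge keeps exactly four rows, one per row of period, which is the shape I am after.
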
I tried the following, but the result is strange:

merge = pd.merge(period, con, on="tag", how="left")

Result:

   SampleID      expr             Gene  Period                     tag      Consequence  
0    HSB670  3.797228  ENSG00000147996       4  HSB670|ENSG00000147996      NaN  
1    HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996      upstream_gene_variant   
2    HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996      upstream_gene_variant   
3    HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996      upstream_gene_variant   
4    HSB666  3.663308  ENSG00000147996       5  HSB666|ENSG00000147996      upstream_gene_variant   
5    HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749      downstream_gene_variant       
6    HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749      downstream_gene_variant   
7    HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749      downstream_gene_variant   
8    HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749      downstream_gene_variant   
9    HSB651  3.207474  ENSG00000174749       4  HSB651|ENSG00000174749      downstream_gene_variant   
10   HSB195  0.214731  ENSG00000188157       4  HSB195|ENSG00000188157      splice_variant   
11   HSB195  0.214731  ENSG00000188157       4  HSB195|ENSG00000188157      splice_variant   
12   HSB195  0.214731  ENSG00000188157       4  HSB195|ENSG00000188157      splice_variant 

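The repeated rows and the stray NaN above look like what a left merge produces when a tag appears more than once in con and when a key does not match exactly. I have not confirmed this on the full data, so the sketch below uses hypothetical toy frames (one duplicated tag in con, one trailing space in period) just to reproduce both symptoms and show one possible cleanup.

```python
import pandas as pd

# Hypothetical toy data, NOT the real frames:
# con repeats one tag, and one period tag carries a trailing space.
con = pd.DataFrame({
    "tag": [
        "HSB666|ENSG00000147996",
        "HSB666|ENSG00000147996",
        "HSB670|ENSG00000147996",
    ],
    "Consequence": ["upstream_gene_variant"] * 3,
})
period = pd.DataFrame({
    "Sample": ["HSB670", "HSB666"],
    "tag": ["HSB670|ENSG00000147996 ", "HSB666|ENSG00000147996"],
})

# Naive merge: the HSB666 row is duplicated, the HSB670 row gets NaN.
raw = period.merge(con, on="tag", how="left")
print(raw)

# Diagnostics: how many con tags are repeats, and which period tags
# have no exact match in con.
print(con["tag"].duplicated().sum())
print(period.loc[~period["tag"].isin(con["tag"]), "tag"].tolist())

# One possible cleanup: strip whitespace from the keys and keep a single
# row per tag in con before merging.
period_clean = period.assign(tag=period["tag"].str.strip())
con_clean = (
    con.assign(tag=con["tag"].str.strip())
       .drop_duplicates(subset="tag")
)
fixed = period_clean.merge(con_clean, on="tag", how="left",
                           validate="many_to_one")
print(fixed)
```

Note that `drop_duplicates(subset="tag")` keeps only the first Consequence per tag, so it only makes sense if repeated tags really carry the same Consequence; if a tag can have several different consequences, they would need to be combined some other way first.
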