How do I get the count of specific strings across each row?

Posted 2024-10-04 05:33:48


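My DataFrame is shown below. For each row, I want to count how many times D, T and N occur and store the results in new columns named Dcount, Tcount and Ncount.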

import pandas as pd

data = {'CHROM':['chr1', 'chr2', 'chr1', 'chr3', 'chr1','chr1', 'chr2', 'chr1'],
        'POS':[939570,3411794,1043223,22511093,24454031,3411794,22511093,1043223],
        'MI':['T', 'T', 'D', 'D', 'T', 'N', 'D', 'N'],
        'CSK':['D', 'D', 'N', 'T', 'N', 'D', 'T', 'T'],
        'DD':['N', 'D', 'D', 'D', 'T', 'N', 'D', 'N'],
        'RR':['D', 'T', 'N', 'T', 'D', 'D', 'T', 'N'],
        'RCB':['D', 'D', 'D', 'D', 'D', 'D', 'D', 'D'],
        'DC':['D', 'D', 'T', 'D', 'D', 'D', 'N', 'D']
       }
df1 = pd.DataFrame(data)

df1

    CHROM   POS      MI CSK DD  RR  RCB DC
0   chr1    939570   T  D   N   D   D   D
1   chr2    3411794  T  D   D   T   D   D
2   chr1    1043223  D  N   D   N   D   T
3   chr3    22511093 D  T   D   T   D   D
4   chr1    24454031 T  N   T   D   D   D
5   chr1    3411794  N  D   N   D   D   D
6   chr2    22511093 D  T   D   T   D   N
7   chr1    1043223  N  T   N   N   D   D

I want the counts of T, D and N in a new DataFrame.

Expected output:

    CHROM   POS      MI CSK DD  RR  RCB DC  Dcount  Tcount  Ncount
0   chr1    939570   T  D   N   D   D   D   4       1       1
1   chr2    3411794  T  D   D   T   D   D   4       2       0
2   chr1    1043223  D  N   D   N   D   T   3       1       2
3   chr3    22511093 D  T   D   T   D   D   4       2       0
4   chr1    24454031 T  N   T   D   D   D   3       2       1
5   chr1    3411794  N  D   N   D   D   D   4       0       2
6   chr2    22511093 D  T   D   T   D   N   3       2       1
7   chr1    1043223  N  T   N   N   D   D   2       1       3

1 answer
User
#1 · posted 2024-10-04 05:33:48

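Use DataFrame.iloc to select every column from position 2 to the end of the DataFrame, count the values in each row with value_counts, replace the missing values with 0, then rename the result columns with DataFrame.add_suffix and append them to the original with DataFrame.join: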

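df1 = (df1.join(df1.iloc[:, 2:]                            # columns from position 2 onward
                   .apply(pd.Series.value_counts, axis=1)  # count each value per row
                   .fillna(0)                              # a value absent from a row counts as 0
                   .astype(int)
                   .add_suffix('count')))                  # D -> Dcount, and so on
print(df1)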
  CHROM       POS MI CSK DD RR RCB DC  Dcount  Ncount  Tcount
0  chr1    939570  T   D  N  D   D  D       4       1       1
1  chr2   3411794  T   D  D  T   D  D       4       0       2
2  chr1   1043223  D   N  D  N   D  T       3       2       1
3  chr3  22511093  D   T  D  T   D  D       4       0       2
4  chr1  24454031  T   N  T  D   D  D       3       1       2
5  chr1   3411794  N   D  N  D   D  D       4       2       0
6  chr2  22511093  D   T  D  T   D  N       3       1       2
7  chr1   1043223  N   T  N  N   D  D       2       3       1
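
The output above lists the count columns alphabetically (Dcount, Ncount, Tcount), while the question asks for Dcount, Tcount, Ncount. A minimal sketch of one way to pin the order, assuming the values of interest are exactly D, T and N: reindex the counts to an explicit list before adding the suffix. The names letters, counts and result are illustrative and not part of the original answer.

```python
import pandas as pd

data = {'CHROM': ['chr1', 'chr2', 'chr1', 'chr3', 'chr1', 'chr1', 'chr2', 'chr1'],
        'POS': [939570, 3411794, 1043223, 22511093, 24454031, 3411794, 22511093, 1043223],
        'MI': ['T', 'T', 'D', 'D', 'T', 'N', 'D', 'N'],
        'CSK': ['D', 'D', 'N', 'T', 'N', 'D', 'T', 'T'],
        'DD': ['N', 'D', 'D', 'D', 'T', 'N', 'D', 'N'],
        'RR': ['D', 'T', 'N', 'T', 'D', 'D', 'T', 'N'],
        'RCB': ['D', 'D', 'D', 'D', 'D', 'D', 'D', 'D'],
        'DC': ['D', 'D', 'T', 'D', 'D', 'D', 'N', 'D']}
df1 = pd.DataFrame(data)

letters = ['D', 'T', 'N']  # assumed set of values, in the desired column order

counts = (df1.iloc[:, 2:]
             .apply(pd.Series.value_counts, axis=1)
             .reindex(columns=letters)  # fixes the order; adds a column for a letter that never occurs
             .fillna(0)
             .astype(int)
             .add_suffix('count'))

result = df1.join(counts)
print(result)
```

Reindexing also guarantees that all three count columns exist, which the plain value_counts result does not if one of the letters is absent from the whole DataFrame.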

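Or, starting again from the original df1 (before the count columns were added), use DataFrame.stack together with SeriesGroupBy.value_counts and Series.unstack: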

df1 = df1.join(df1.iloc[:, 2:]
                  .stack()
                  .groupby(level=0)
                  .value_counts()
                  .unstack(fill_value=0)
                  .add_suffix('count'))
print(df1)
  CHROM       POS MI CSK DD RR RCB DC  Dcount  Ncount  Tcount
0  chr1    939570  T   D  N  D   D  D       4       1       1
1  chr2   3411794  T   D  D  T   D  D       4       0       2
2  chr1   1043223  D   N  D  N   D  T       3       2       1
3  chr3  22511093  D   T  D  T   D  D       4       0       2
4  chr1  24454031  T   N  T  D   D  D       3       1       2
5  chr1   3411794  N   D  N  D   D  D       4       2       0
6  chr2  22511093  D   T  D  T   D  N       3       1       2
7  chr1   1043223  N   T  N  N   D  D       2       3       1
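
When only a few known values need counting, a different technique from the two shown above may be easier to read: compare the value columns against each letter and sum the matches per row. This is a sketch of that alternative, not the answer's method, and it assumes the values of interest are exactly D, T and N; the names values and result are illustrative.

```python
import pandas as pd

data = {'CHROM': ['chr1', 'chr2', 'chr1', 'chr3', 'chr1', 'chr1', 'chr2', 'chr1'],
        'POS': [939570, 3411794, 1043223, 22511093, 24454031, 3411794, 22511093, 1043223],
        'MI': ['T', 'T', 'D', 'D', 'T', 'N', 'D', 'N'],
        'CSK': ['D', 'D', 'N', 'T', 'N', 'D', 'T', 'T'],
        'DD': ['N', 'D', 'D', 'D', 'T', 'N', 'D', 'N'],
        'RR': ['D', 'T', 'N', 'T', 'D', 'D', 'T', 'N'],
        'RCB': ['D', 'D', 'D', 'D', 'D', 'D', 'D', 'D'],
        'DC': ['D', 'D', 'T', 'D', 'D', 'D', 'N', 'D']}
df1 = pd.DataFrame(data)

values = df1.iloc[:, 2:]  # the six D/T/N columns, selected once up front
result = df1.copy()
for letter in ['D', 'T', 'N']:  # assumed set of values, in the desired column order
    # eq() yields a boolean frame; summing along axis=1 counts the matches in each row
    result[letter + 'count'] = values.eq(letter).sum(axis=1)

print(result)
```

Because the value columns are selected before any count column is added, the new columns never feed back into the counts, and the column order follows the order of the loop.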
