按结果显示分组,但对所有项目用0填充不存在的结果

2024-06-02 12:07:06 发布

您现在位置:Python中文网/ 问答频道 /正文

数据帧:

df = {'Client': ['A', 'A', 'A', 'B', 'B', 'B', 'B','B','B','B','C','D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ','Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}
df = pd.DataFrame.from_dict(df)
print(df)
  
   Client            Result
0       A           Covered
1       A   Customer Reject
2       A  Customer Timeout
3       B     Dealer Reject
4       B    Dealer Timeout
5       B              Done
6       B      Tied Covered
7       B         Tied Done
8       B  Tied Traded Away
9       B       Traded Away
10      C            No RFQ
11      D           Covered
12      D   Customer Reject
13      D  Customer Timeout
14      D     Dealer Reject
15      D    Dealer Timeout
16      D              Done
17      D      Tied Covered
18      D         Tied Done
19      D  Tied Traded Away
20      D       Traded Away
21      D            No RFQ

电流输出:

df = df.groupby(['Client','Result']).agg({'Result': 'size'})
print(df)


                     Result
Client Result                  
A      Covered                1
       Customer Reject        1
       Customer Timeout       1
B      Dealer Reject          1
       Dealer Timeout         1
       Done                   1
       Tied Covered           1
       Tied Done              1
       Tied Traded Away       1
       Traded Away            1
C      No RFQ                 1
D      Covered                1
       Customer Reject        1
       Customer Timeout       1
       Dealer Reject          1
       Dealer Timeout         1
       Done                   1
       No RFQ                 1
       Tied Covered           1
       Tied Done              1
       Tied Traded Away       1
       Traded Away            1

期望输出:

a。分组间距

b。每个分组的总数

c。每个客户端的所有不存在的Result都有一个零

注意Result中可能的总值如下(11个字符串)。这些数据可能存在于当月的当前数据集中,也可能不存在于当前数据集中:

Covered
Customer Reject
Customer Timeout
Dealer Reject
Dealer Timeout
Done
Tied Covered
Tied Done
Tied Traded Away
Traded Away
No RFQ

Client Result              Count
A      Covered             1
A      Customer Reject     1
A      Customer Timeout    1
A      Dealer Reject       0
A      Dealer Timeout      0
A      Done                0
A      Tied Covered        0
A      Tied Done           0
A      Tied Traded Away    0
A      Traded Away         0
A      No RFQ              0
Total                      3

Client Result              Count        
B      Covered             0
B      Customer Reject     0
B      Customer Timeout    0
B      Dealer Reject       1
B      Dealer Timeout      1
B      Done                1
B      Tied Covered        1
B      Tied Done           1
B      Tied Traded Away    1
B      Traded Away         1
A      No RFQ              1
Total                      8            
        
Client Result              Count        
C      Covered             0
C      Customer Reject     0
C      Customer Timeout    0
C      Dealer Reject       0
C      Dealer Timeout      0
C      Done                0
C      Tied Covered        0
C      Tied Done           0
C      Tied Traded Away    0
C      Traded Away         0
C      No RFQ              1
Total                      1                        

Client Result              Count        
D      Covered             1
D      Customer Reject     1
D      Customer Timeout    1
D      Dealer Reject       1
D      Dealer Timeout      1
D      Done                1
D      Tied Covered        1
D      Tied Done           1
D      Tied Traded Away    1
D      Traded Away         1
D      No RFQ              1
Total                      11                       

Tags: 数据noclientdftimeoutcustomerresultdone
3条回答

您可以使用pandas.get_dummies后跟.groupby

df = df.set_index('Client')
df = pd.get_dummies(df, prefix='', prefix_sep='').groupby(df.index).apply(np.sum).T

print(df)

印刷品:

Client            A  B  C  D
Covered           1  0  0  1
Customer Reject   1  0  0  1
Customer Timeout  1  0  0  1
Dealer Reject     0  1  0  1
Dealer Timeout    0  1  0  1
Done              0  1  0  1
No RFQ            0  0  1  1
Tied Covered      0  1  0  1
Tied Done         0  1  0  1
Tied Traded Away  0  1  0  1
Traded Away       0  1  0  1

要打印数据,请执行以下操作:

def print_group(data, client):
    df = pd.DataFrame({'Result': data.index, 'Count': data.values.tolist()})
    df['Client'] = client
    df = df.set_index('Client')
    print(df[['Result', 'Count']])
    print('Total {}'.format(np.sum(df['Count'])))
    print()


df = df.set_index('Client')
df = pd.get_dummies(df, prefix='', prefix_sep='').groupby(df.index).apply(np.sum).T
    
for c in df.columns:
    print_group(df[c], c)

印刷品:

                  Result  Count
Client                         
A                Covered      1
A        Customer Reject      1
A       Customer Timeout      1
A          Dealer Reject      0
A         Dealer Timeout      0
A                   Done      0
A                 No RFQ      0
A           Tied Covered      0
A              Tied Done      0
A       Tied Traded Away      0
A            Traded Away      0
Total 3

                  Result  Count
Client                         
B                Covered      0
B        Customer Reject      0
B       Customer Timeout      0
B          Dealer Reject      1
B         Dealer Timeout      1
B                   Done      1
B                 No RFQ      0
B           Tied Covered      1
B              Tied Done      1
B       Tied Traded Away      1
B            Traded Away      1
Total 7

                  Result  Count
Client                         
C                Covered      0
C        Customer Reject      0
C       Customer Timeout      0
C          Dealer Reject      0
C         Dealer Timeout      0
C                   Done      0
C                 No RFQ      1
C           Tied Covered      0
C              Tied Done      0
C       Tied Traded Away      0
C            Traded Away      0
Total 1

                  Result  Count
Client                         
D                Covered      1
D        Customer Reject      1
D       Customer Timeout      1
D          Dealer Reject      1
D         Dealer Timeout      1
D                   Done      1
D                 No RFQ      1
D           Tied Covered      1
D              Tied Done      1
D       Tied Traded Away      1
D            Traded Away      1
Total 11

要创建小计,我使用与Transform pandas groupby result with subtotals to relative values中相同的方法

可能还有另一种方法可以为原始数据中不存在的条目创建索引,但是pivot_table可以做到这一点

data = {'Client': ['A', 'A', 'A', 'B', 'B', 'B', 'B','B','B','B','C','D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ','Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}

df = pd.DataFrame.from_dict(data)
# create a pivot_table and fill the empty sites with "0", the unstack again
# and group as required.
pvt = df.pivot_table(index='Client', columns = 'Result', aggfunc='size').fillna(0).unstack().groupby(by=['Client', 'Result']).agg(sum).reset_index()
print(pvt)

# calculate the subtotals
iList = ["Client", "Result"]
pvt2 = pd.concat(
    [pvt.assign(**{x: "" for x in iList[i:]}).groupby(iList).sum() for i in range(1, 3)]
).sort_index().reset_index()

# pretty-print the result
for c in pvt['Client'].drop_duplicates():
    print(pvt2.loc[pvt2['Client']==c].iloc[1:])
    print(f"Total: {pvt2.loc[pvt2['Client']==c].iloc[0].values[2]:>10}")
    print()

产生

   Client            Result    0
1       A           Covered  1.0
2       A   Customer Reject  1.0
3       A  Customer Timeout  1.0
4       A     Dealer Reject  0.0
5       A    Dealer Timeout  0.0
6       A              Done  0.0
7       A            No RFQ  0.0
8       A      Tied Covered  0.0
9       A         Tied Done  0.0
10      A  Tied Traded Away  0.0
11      A       Traded Away  0.0
Total:        3.0

   Client            Result    0
13      B           Covered  0.0
14      B   Customer Reject  0.0
15      B  Customer Timeout  0.0
16      B     Dealer Reject  1.0
17      B    Dealer Timeout  1.0
18      B              Done  1.0
19      B            No RFQ  0.0
20      B      Tied Covered  1.0
21      B         Tied Done  1.0
22      B  Tied Traded Away  1.0
23      B       Traded Away  1.0
Total:        7.0

   Client            Result    0
25      C           Covered  0.0
26      C   Customer Reject  0.0
27      C  Customer Timeout  0.0
28      C     Dealer Reject  0.0
29      C    Dealer Timeout  0.0
30      C              Done  0.0
31      C            No RFQ  1.0
32      C      Tied Covered  0.0
33      C         Tied Done  0.0
34      C  Tied Traded Away  0.0
35      C       Traded Away  0.0
Total:        1.0

   Client            Result    0
37      D           Covered  1.0
38      D   Customer Reject  1.0
39      D  Customer Timeout  1.0
40      D     Dealer Reject  1.0
41      D    Dealer Timeout  1.0
42      D              Done  1.0
43      D            No RFQ  1.0
44      D      Tied Covered  1.0
45      D         Tied Done  1.0
46      D  Tied Traded Away  1.0
47      D       Traded Away  1.0
Total:       11.0

我正在通过打印添加空行。在数据帧中添加空记录似乎不是一个好主意,因为它们在重新排序时会消失

类似地,将总计行的记录更改为“总计”也会有问题,因为一旦数据帧被重新排序,您就不知道它们属于哪个组

稍微修改了代码,但没有pivot_表:

import pandas as pd

data = {'Client': ['A', 'A', 'A', 'B', 'B', 'B', 'B','B','B','B','C','D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ','Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}

df = pd.DataFrame.from_dict(data)

cols = ["Client", "Result"]
ind1 =['A', 'B', 'C', 'D']
ind2 = ["Covered",
"Customer Reject",
"Customer Timeout",
"Dealer Reject",
"Dealer Timeout",
"Done",
"Tied Covered",
"Tied Done",
"Tied Traded Away",
"Traded Away",
"No RFQ"]

idx = pd.MultiIndex.from_product([ind1, ind2], names=cols)
pvt = df.groupby(cols).size().reindex(idx).fillna(0).reset_index()

pvt2 = pd.concat(
    [pvt.assign(**{x: "" for x in cols[i:]}).groupby(cols).sum() for i in range(1, 3)]
).sort_index().reset_index()

# print(pvt2)

for c in pvt['Client'].drop_duplicates():
    print(pvt2.loc[pvt2['Client']==c].iloc[1:])
    print(f"Total: {pvt2.loc[pvt2['Client']==c].iloc[0].values[2]:>10}")
    print()

.reindexpd.MultiIndex.from_product一起使用:

设置:

df = {'Client': ['A', 'A', 'A', 'B', 'B', 'B', 'B','B','B','B','C','D','D','D','D','D','D','D','D','D','D','D' ],
'Result': ['Covered', 'Customer Reject', 'Customer Timeout', 'Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ','Covered','Customer Reject','Customer Timeout','Dealer Reject','Dealer Timeout','Done','Tied Covered','Tied Done','Tied Traded Away','Traded Away','No RFQ']}
df = pd.DataFrame.from_dict(df)

代码:

cols = ['Client', 'Result']
i = [df['Client'].unique(), df['Result'].unique()]
df = df.groupby(cols)['Result'].size().rename('Count').reset_index().set_index(cols)
df = df.reindex(index=pd.MultiIndex.from_product(i, names=cols), fill_value=0).reset_index()
df
Out[1]: 
   Client            Result  Count
0       A           Covered      1
1       A   Customer Reject      1
2       A  Customer Timeout      1
3       A     Dealer Reject      0
4       A    Dealer Timeout      0
5       A              Done      0
6       A      Tied Covered      0
7       A         Tied Done      0
8       A  Tied Traded Away      0
9       A       Traded Away      0
10      A            No RFQ      0
11      B           Covered      0
12      B   Customer Reject      0
13      B  Customer Timeout      0
14      B     Dealer Reject      1
15      B    Dealer Timeout      1
16      B              Done      1
17      B      Tied Covered      1
18      B         Tied Done      1
19      B  Tied Traded Away      1
20      B       Traded Away      1
21      B            No RFQ      0
22      C           Covered      0
23      C   Customer Reject      0
24      C  Customer Timeout      0
25      C     Dealer Reject      0
26      C    Dealer Timeout      0
27      C              Done      0
28      C      Tied Covered      0
29      C         Tied Done      0
30      C  Tied Traded Away      0
31      C       Traded Away      0
32      C            No RFQ      1
33      D           Covered      1
34      D   Customer Reject      1
35      D  Customer Timeout      1
36      D     Dealer Reject      1
37      D    Dealer Timeout      1
38      D              Done      1
39      D      Tied Covered      1
40      D         Tied Done      1
41      D  Tied Traded Away      1
42      D       Traded Away      1
43      D            No RFQ      1

要显示结果,请执行以下操作:

for c in df['Client'].unique():
    x = df[df['Client'] == c]['Count'].sum()
    print(df[df['Client'] == c])
    print(f'Total {x}')
    print('')

      Client            Result  Count
0       A           Covered      1
1       A   Customer Reject      1
2       A  Customer Timeout      1
3       A     Dealer Reject      0
4       A    Dealer Timeout      0
5       A              Done      0
6       A      Tied Covered      0
7       A         Tied Done      0
8       A  Tied Traded Away      0
9       A       Traded Away      0
10      A            No RFQ      0
Total 3

   Client            Result  Count
11      B           Covered      0
12      B   Customer Reject      0
13      B  Customer Timeout      0
14      B     Dealer Reject      1
15      B    Dealer Timeout      1
16      B              Done      1
17      B      Tied Covered      1
18      B         Tied Done      1
19      B  Tied Traded Away      1
20      B       Traded Away      1
21      B            No RFQ      0
Total 7

   Client            Result  Count
22      C           Covered      0
23      C   Customer Reject      0
24      C  Customer Timeout      0
25      C     Dealer Reject      0
26      C    Dealer Timeout      0
27      C              Done      0
28      C      Tied Covered      0
29      C         Tied Done      0
30      C  Tied Traded Away      0
31      C       Traded Away      0
32      C            No RFQ      1
Total 1

   Client            Result  Count
33      D           Covered      1
34      D   Customer Reject      1
35      D  Customer Timeout      1
36      D     Dealer Reject      1
37      D    Dealer Timeout      1
38      D              Done      1
39      D      Tied Covered      1
40      D         Tied Done      1
41      D  Tied Traded Away      1
42      D       Traded Away      1
43      D            No RFQ      1
Total 11

相关问题 更多 >