如何在Pandas中使用Groupby在聚合后取消堆栈

df = [{'Store': 's1', 'Date': Timestamp('2020-08-01 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's1ad1', 'Level': 2, 'duties': 'O'},\ {'Store': 's1', 'Date': Timestamp('2020-08-02 00:00:00'), 'Employee': 'a', 'Department': 'd2', 'ID': 's1ad2', 'Level': 2, 'duties': 'C'},\ {'Store': 's1', 'Date': Timestamp('2020-08-03 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's1ad1', 'Level': 2, 'duties': 'O'},\ {'Store': 's1', 'Date': Timestamp('2020-08-04 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's1ad1', 'Level': 2, 'duties': 'O'},\ {'Store': 's1', 'Date': Timestamp('2020-08-05 00:00:00'), 'Employee': 'a', 'Department': 'd2', 'ID': 's1ad2', 'Level': 2, 'duties': 'C'},\ {'Store': 's2', 'Date': Timestamp('2020-08-08 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's2ad1', 'Level': 2, 'duties': 'O'},\ {'Store': 's2', 'Date': Timestamp('2020-08-09 00:00:00'), 'Employee': 'a', 'Department': 'd3', 'ID': 's2ad3', 'Level': 2, 'duties': 'C'},\ {'Store': 's2', 'Date': Timestamp('2020-08-10 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's2ad1', 'Level': 2, 'duties': 'O'},\ {'Store': 's2', 'Date': Timestamp('2020-08-11 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's2ad1', 'Level': 2, 'duties': 'O'},\ {'Store': 's2', 'Date': Timestamp('2020-08-12 00:00:00'), 'Employee': 'a', 'Department': 'd3', 'ID': 's3ad1', 'Level': 2, 'duties': 'C'},\ {'Store': 's1', 'Date': Timestamp('2020-08-01 00:00:00'), 'Employee': 'b', 'Department': 'd1', 'ID': 's1bd1', 'Level': 1, 'duties': 'O'},\ {'Store': 's1', 'Date': Timestamp('2020-08-02 00:00:00'), 'Employee': 'b', 'Department': 'd2', 'ID': 's1bd2', 'Level': 1, 'duties': 'C'},\ {'Store': 's1', 'Date': Timestamp('2020-08-03 00:00:00'), 'Employee': 'b', 'Department': 'd1', 'ID': 's1bd1', 'Level': 1, 'duties': 'O'},\ {'Store': 's1', 'Date': Timestamp('2020-08-04 00:00:00'), 'Employee': 'b', 'Department': 'd1', 'ID': 's1bd1', 'Level': 1, 'duties': 'O'},\ {'Store': 's1', 'Date': Timestamp('2020-08-05 00:00:00'), 'Employee': 'b', 'Department': 'd2', 'ID': 's1bd2', 'Level': 1, 'duties': 'C'},\ {'Store': 's2', 'Date': Timestamp('2020-08-08 00:00:00'), 'Employee': 'c', 'Department': 'd1', 'ID': 's2ac1', 'Level': 3, 'duties': 'O'},\ {'Store': 's2', 'Date': Timestamp('2020-08-09 00:00:00'), 'Employee': 'c', 'Department': 'd3', 'ID': 's2cd3', 'Level': 3, 'duties': 'C'},\ {'Store': 's2', 'Date': Timestamp('2020-08-10 00:00:00'), 'Employee': 'c', 'Department': 'd1', 'ID': 's2cd1', 'Level': 3, 'duties': 'O'},\ {'Store': 's2', 'Date': Timestamp('2020-08-11 00:00:00'), 'Employee': 'c', 'Department': 'd1', 'ID': 's2cd1', 'Level': 3, 'duties': 'O'},\ {'Store': 's2', 'Date': Timestamp('2020-08-12 00:00:00'), 'Employee': 'c', 'Department': 'd3', 'ID': 's3cd1', 'Level': 3, 'duties': 'C'},\ {'Store': 's3', 'Date': Timestamp('2020-08-08 00:00:00'), 'Employee': 'd', 'Department': 'd1', 'ID': 's3cd1', 'Level': 3, 'duties': 'O'},\ {'Store': 's3', 'Date': Timestamp('2020-08-09 00:00:00'), 'Employee': 'd', 'Department': 'd3', 'ID': 's3dd3', 'Level': 3, 'duties': 'C'},\ {'Store': 's3', 'Date': Timestamp('2020-08-10 00:00:00'), 'Employee': 'd', 'Department': 'd1', 'ID': 's3dd1', 'Level': 3, 'duties': 'O'},\ {'Store': 's3', 'Date': Timestamp('2020-08-11 00:00:00'), 'Employee': 'd','Department': 'd1', 'ID': 's3dd1', 'Level': 3, 'duties': 'O'},\ {'Store': 's3', 'Date': Timestamp('2020-08-12 00:00:00'), 'Employee': 'd', 'Department': 'd3', 'ID': 's3dd1', 'Level': 3, 'duties': 'C'}]

Store s1 s2 s3 Department d1 d2 d1 d3 d1 d3 Employee ID Level duties first shift last shift #ofshift first shift last shift #ofshift first shift last shift #ofshift first shift last shift #ofshift first shift last shift #ofshift first shift last shift #ofshift a s1ad1 2 O 2020-08-01 2020-08-04 3 a s1ad2 2 C 2020-08-02 2020-08-05 2 a s2ad1 2 O 2020-08-08 2020-08-11 3 a s2ad3 2 O 2020-08-09 2020-08-12 2 b s1bd1 1 O 2020-08-01 2020-08-04 3 b s1bd2 1 C 2020-08-02 2020-08-05 2 c s2cd1 3 O 2020-08-08 2020-08-11 3 c s3ad3 3 O 2020-08-09 2020-08-12 2 d s3dd1 3 O 2020-08-08 2020-08-11 3 d s3dd3 3 O 2020-08-09 2020-08-12 2

df = df.groupby(['Employee', 'Store', 'Department'])\ .agg({'Date':['first', 'last', 'size'], 'ID': 'first', 'Level': 'first', 'duties': 'first'}) # Join the Each Column with its operation. df.columns = df.columns.map('_'.join) # Reset the Index df = df.reset_index().set_index('Employee') # Renaming Columns of Dataframe. df.rename(columns={'Date_first':'First Shift', 'Date_last':'Last Shift', 'Date_size':'# of shift', 'ID_first':'ID', 'Level_first':'Level', 'duties_first':'duties'}, inplace=True)

Store Department First Shift Last Shift # of shift ID_first Level duties Employee a s1 d1 2020-08-01 2020-08-04 3 s1ad1 2 O a s1 d2 2020-08-02 2020-08-05 2 s1ad2 2 C a s2 d1 2020-08-08 2020-08-11 3 s2ad1 2 O a s2 d3 2020-08-09 2020-08-12 2 s2ad3 2 C b s1 d1 2020-08-01 2020-08-04 3 s1bd1 1 O b s1 d2 2020-08-02 2020-08-05 2 s1bd2 1 C c s2 d1 2020-08-08 2020-08-11 3 s2ac1 3 O c s2 d3 2020-08-09 2020-08-12 2 s2cd3 3 C d s3 d1 2020-08-08 2020-08-11 3 s3cd1 3 O d s3 d3 2020-08-09 2020-08-12 2 s3dd3 3 C

df = df.groupby(['Employee', 'Store', 'Department', 'First Shift', 'Last Shift', '# of shift', 'ID_first',\ 'Level', 'duties'])\ .size()\ .unstack(['Store', 'Department']).fillna(0)

Store s1 s2 s3 Department d1 d2 d1 d3 d1 d3 Employee First Shift Last Shift # of shift ID_first Level duties a 2020-08-01 2020-08-04 3 s1ad1 2 O 1.0 0.0 0.0 0.0 0.0 0.0 2020-08-02 2020-08-05 2 s1ad2 2 C 0.0 1.0 0.0 0.0 0.0 0.0 2020-08-08 2020-08-11 3 s2ad1 2 O 0.0 0.0 1.0 0.0 0.0 0.0 2020-08-09 2020-08-12 2 s2ad3 2 C 0.0 0.0 0.0 1.0 0.0 0.0 b 2020-08-01 2020-08-04 3 s1bd1 1 O 1.0 0.0 0.0 0.0 0.0 0.0 2020-08-02 2020-08-05 2 s1bd2 1 C 0.0 1.0 0.0 0.0 0.0 0.0 c 2020-08-08 2020-08-11 3 s2ac1 3 O 0.0 0.0 1.0 0.0 0.0 0.0 2020-08-09 2020-08-12 2 s2cd3 3 C 0.0 0.0 0.0 1.0 0.0 0.0 d 2020-08-08 2020-08-11 3 s3cd1 3 O 0.0 0.0 0.0 0.0 1.0 0.0 2020-08-09 2020-08-12 2 s3dd3 3 C 0.0 0.0 0.0 0.0 0.0 1.0

1条回答

网友

1楼 · 发布于 2024-05-19 05:52:43

问题的关键在于，在使用.unstack()之前需要重新构造数据，因为所需的格式是一个矩阵，其值是三个重复的列。因此，您需要将数据帧从宽改为长，并在一列Values中使用这三个值创建一个新列，在另一列中对它们进行分类Shift

# Step 1: Named Groupby Agregation naming columns ins specific format required for `pd.wide_to_long`. Must end with integer.
df = (df.groupby(['Employee', 'Store', 'Department'])
        .agg(Shift_1=('Date','first'),
             Shift_2=('Date','last'),
             Shift_3=('Date','size'),
             ID=('ID', 'first'),
             Level=('Level', 'first'),
             duties=('duties', 'first'))
        .reset_index())


# Step 2: In preparation for a matrix The data must be transformed so that the three columns that are values in the matrix must be in long format.
df = pd.wide_to_long(df, stubnames='Shift_', i='ID', j='Shift').reset_index().rename(columns={'Shift_':'Values'})

# Step 3: 1,2,3 were required integer suffixes for wide_to_long but now let's change to what we want the columns to be called.
df['Shift'] = df['Shift'].replace([1,2,3],['First Shift','Last Shift','# of shift'])

# Step 4: Create the matrix be setting index and unstacking to columns
df = (df.sort_values(['Employee', 'Store', 'Department']) #values must be sorted in order for how we want columns to appear in matrix format
        .set_index(['Employee', 'ID', 'Level', 'duties', 'Store', 'Department', 'Shift'])
        .unstack(['Store', 'Department', 'Shift']).fillna(0)) 

# Step 5: Cleanup of Multi-index into desred format
df.columns = df.columns.reorder_levels([1,2,3,0]).droplevel(3)
df = df.reset_index()
df
Out[1]: 
Store      Employee     ID Level duties                   s1  \
Department                                                d1   
Shift                                            First Shift   
0                 a  s1ad1     2      O  2020-08-01 00:00:00   
1                 a  s1ad2     2      C                    0   
2                 a  s2ad1     2      O                    0   
3                 a  s2ad3     2      C                    0   
4                 b  s1bd1     1      O  2020-08-01 00:00:00   
5                 b  s1bd2     1      C                    0   
6                 c  s2ac1     3      O                    0   
7                 c  s2cd3     3      C                    0   
8                 d  s3cd1     3      O                    0   
9                 d  s3dd3     3      C                    0   

Store                                                            \
Department                                                   d2   
Shift                Last Shift # of shift          First Shift   
0           2020-08-04 00:00:00          3                    0   
1                             0          0  2020-08-02 00:00:00   
2                             0          0                    0   
3                             0          0                    0   
4           2020-08-04 00:00:00          3                    0   
5                             0          0  2020-08-02 00:00:00   
6                             0          0                    0   
7                             0          0                    0   
8                             0          0                    0   
9                             0          0                    0   

Store                                       ...         s2  \
Department                                  ...         d1   
Shift                Last Shift # of shift  ... # of shift   
0                             0          0  ...          0   
1           2020-08-05 00:00:00          2  ...          0   
2                             0          0  ...          3   
3                             0          0  ...          0   
4                             0          0  ...          0   
5           2020-08-05 00:00:00          2  ...          0   
6                             0          0  ...          3   
7                             0          0  ...          0   
8                             0          0  ...          0   
9                             0          0  ...          0   

Store                                                            \
Department                   d3                                   
Shift               First Shift           Last Shift # of shift   
0                             0                    0          0   
1                             0                    0          0   
2                             0                    0          0   
3           2020-08-09 00:00:00  2020-08-12 00:00:00          2   
4                             0                    0          0   
5                             0                    0          0   
6                             0                    0          0   
7           2020-08-09 00:00:00  2020-08-12 00:00:00          2   
8                             0                    0          0   
9                             0                    0          0   

Store                        s3                                  \
Department                   d1                                   
Shift               First Shift           Last Shift # of shift   
0                             0                    0          0   
1                             0                    0          0   
2                             0                    0          0   
3                             0                    0          0   
4                             0                    0          0   
5                             0                    0          0   
6                             0                    0          0   
7                             0                    0          0   
8           2020-08-08 00:00:00  2020-08-11 00:00:00          3   
9                             0                    0          0   

Store                                                            
Department                   d3                                  
Shift               First Shift           Last Shift # of shift  
0                             0                    0          0  
1                             0                    0          0  
2                             0                    0          0  
3                             0                    0          0  
4                             0                    0          0  
5                             0                    0          0  
6                             0                    0          0  
7                             0                    0          0  
8                             0                    0          0  
9           2020-08-09 00:00:00  2020-08-12 00:00:00          2  

[10 rows x 22 columns]

相关问题更多 >

编程相关推荐

热门问题

热门文章