你好,数据科学家和熊猫专家
在应用groupby聚合方法后,我需要一些帮助来找出如何更好地组织数据。我已经尝试取消堆叠到新的数据帧,但它没有产生预期的结果
这是我的数据框:
df = [{'Store': 's1', 'Date': Timestamp('2020-08-01 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's1ad1', 'Level': 2, 'duties': 'O'},\
{'Store': 's1', 'Date': Timestamp('2020-08-02 00:00:00'), 'Employee': 'a', 'Department': 'd2', 'ID': 's1ad2', 'Level': 2, 'duties': 'C'},\
{'Store': 's1', 'Date': Timestamp('2020-08-03 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's1ad1', 'Level': 2, 'duties': 'O'},\
{'Store': 's1', 'Date': Timestamp('2020-08-04 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's1ad1', 'Level': 2, 'duties': 'O'},\
{'Store': 's1', 'Date': Timestamp('2020-08-05 00:00:00'), 'Employee': 'a', 'Department': 'd2', 'ID': 's1ad2', 'Level': 2, 'duties': 'C'},\
{'Store': 's2', 'Date': Timestamp('2020-08-08 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's2ad1', 'Level': 2, 'duties': 'O'},\
{'Store': 's2', 'Date': Timestamp('2020-08-09 00:00:00'), 'Employee': 'a', 'Department': 'd3', 'ID': 's2ad3', 'Level': 2, 'duties': 'C'},\
{'Store': 's2', 'Date': Timestamp('2020-08-10 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's2ad1', 'Level': 2, 'duties': 'O'},\
{'Store': 's2', 'Date': Timestamp('2020-08-11 00:00:00'), 'Employee': 'a', 'Department': 'd1', 'ID': 's2ad1', 'Level': 2, 'duties': 'O'},\
{'Store': 's2', 'Date': Timestamp('2020-08-12 00:00:00'), 'Employee': 'a', 'Department': 'd3', 'ID': 's3ad1', 'Level': 2, 'duties': 'C'},\
{'Store': 's1', 'Date': Timestamp('2020-08-01 00:00:00'), 'Employee': 'b', 'Department': 'd1', 'ID': 's1bd1', 'Level': 1, 'duties': 'O'},\
{'Store': 's1', 'Date': Timestamp('2020-08-02 00:00:00'), 'Employee': 'b', 'Department': 'd2', 'ID': 's1bd2', 'Level': 1, 'duties': 'C'},\
{'Store': 's1', 'Date': Timestamp('2020-08-03 00:00:00'), 'Employee': 'b', 'Department': 'd1', 'ID': 's1bd1', 'Level': 1, 'duties': 'O'},\
{'Store': 's1', 'Date': Timestamp('2020-08-04 00:00:00'), 'Employee': 'b', 'Department': 'd1', 'ID': 's1bd1', 'Level': 1, 'duties': 'O'},\
{'Store': 's1', 'Date': Timestamp('2020-08-05 00:00:00'), 'Employee': 'b', 'Department': 'd2', 'ID': 's1bd2', 'Level': 1, 'duties': 'C'},\
{'Store': 's2', 'Date': Timestamp('2020-08-08 00:00:00'), 'Employee': 'c', 'Department': 'd1', 'ID': 's2ac1', 'Level': 3, 'duties': 'O'},\
{'Store': 's2', 'Date': Timestamp('2020-08-09 00:00:00'), 'Employee': 'c', 'Department': 'd3', 'ID': 's2cd3', 'Level': 3, 'duties': 'C'},\
{'Store': 's2', 'Date': Timestamp('2020-08-10 00:00:00'), 'Employee': 'c', 'Department': 'd1', 'ID': 's2cd1', 'Level': 3, 'duties': 'O'},\
{'Store': 's2', 'Date': Timestamp('2020-08-11 00:00:00'), 'Employee': 'c', 'Department': 'd1', 'ID': 's2cd1', 'Level': 3, 'duties': 'O'},\
{'Store': 's2', 'Date': Timestamp('2020-08-12 00:00:00'), 'Employee': 'c', 'Department': 'd3', 'ID': 's3cd1', 'Level': 3, 'duties': 'C'},\
{'Store': 's3', 'Date': Timestamp('2020-08-08 00:00:00'), 'Employee': 'd', 'Department': 'd1', 'ID': 's3cd1', 'Level': 3, 'duties': 'O'},\
{'Store': 's3', 'Date': Timestamp('2020-08-09 00:00:00'), 'Employee': 'd', 'Department': 'd3', 'ID': 's3dd3', 'Level': 3, 'duties': 'C'},\
{'Store': 's3', 'Date': Timestamp('2020-08-10 00:00:00'), 'Employee': 'd', 'Department': 'd1', 'ID': 's3dd1', 'Level': 3, 'duties': 'O'},\
{'Store': 's3', 'Date': Timestamp('2020-08-11 00:00:00'), 'Employee': 'd','Department': 'd1', 'ID': 's3dd1', 'Level': 3, 'duties': 'O'},\
{'Store': 's3', 'Date': Timestamp('2020-08-12 00:00:00'), 'Employee': 'd', 'Department': 'd3', 'ID': 's3dd1', 'Level': 3, 'duties': 'C'}]
我想组织我的输出,使其在存储中能够很好地堆叠-->;部门-->;受雇者如下所示(很抱歉,输出没有很好地排列):
Store s1 s2 s3
Department d1 d2 d1 d3 d1 d3
Employee ID Level duties first shift last shift #ofshift first shift last shift #ofshift first shift last shift #ofshift first shift last shift #ofshift first shift last shift #ofshift first shift last shift #ofshift
a s1ad1 2 O 2020-08-01 2020-08-04 3
a s1ad2 2 C 2020-08-02 2020-08-05 2
a s2ad1 2 O 2020-08-08 2020-08-11 3
a s2ad3 2 O 2020-08-09 2020-08-12 2
b s1bd1 1 O 2020-08-01 2020-08-04 3
b s1bd2 1 C 2020-08-02 2020-08-05 2
c s2cd1 3 O 2020-08-08 2020-08-11 3
c s3ad3 3 O 2020-08-09 2020-08-12 2
d s3dd1 3 O 2020-08-08 2020-08-11 3
d s3dd3 3 O 2020-08-09 2020-08-12 2
因此,我试着按表达式分组如下:
df = df.groupby(['Employee', 'Store', 'Department'])\
.agg({'Date':['first', 'last', 'size'],
'ID': 'first',
'Level': 'first',
'duties': 'first'})
# Join the Each Column with its operation.
df.columns = df.columns.map('_'.join)
# Reset the Index
df = df.reset_index().set_index('Employee')
# Renaming Columns of Dataframe.
df.rename(columns={'Date_first':'First Shift',
'Date_last':'Last Shift',
'Date_size':'# of shift',
'ID_first':'ID',
'Level_first':'Level',
'duties_first':'duties'},
inplace=True)
这将打印以下结果:
Store Department First Shift Last Shift # of shift ID_first Level duties
Employee
a s1 d1 2020-08-01 2020-08-04 3 s1ad1 2 O
a s1 d2 2020-08-02 2020-08-05 2 s1ad2 2 C
a s2 d1 2020-08-08 2020-08-11 3 s2ad1 2 O
a s2 d3 2020-08-09 2020-08-12 2 s2ad3 2 C
b s1 d1 2020-08-01 2020-08-04 3 s1bd1 1 O
b s1 d2 2020-08-02 2020-08-05 2 s1bd2 1 C
c s2 d1 2020-08-08 2020-08-11 3 s2ac1 3 O
c s2 d3 2020-08-09 2020-08-12 2 s2cd3 3 C
d s3 d1 2020-08-08 2020-08-11 3 s3cd1 3 O
d s3 d3 2020-08-09 2020-08-12 2 s3dd3 3 C
然后我应用了unstack表达式,如下所示:
df = df.groupby(['Employee', 'Store', 'Department', 'First Shift', 'Last Shift', '# of shift', 'ID_first',\
'Level', 'duties'])\
.size()\
.unstack(['Store', 'Department']).fillna(0)
它将打印结果,如下所示:
Store s1 s2 s3
Department d1 d2 d1 d3 d1 d3
Employee First Shift Last Shift # of shift ID_first Level duties
a 2020-08-01 2020-08-04 3 s1ad1 2 O 1.0 0.0 0.0 0.0 0.0 0.0
2020-08-02 2020-08-05 2 s1ad2 2 C 0.0 1.0 0.0 0.0 0.0 0.0
2020-08-08 2020-08-11 3 s2ad1 2 O 0.0 0.0 1.0 0.0 0.0 0.0
2020-08-09 2020-08-12 2 s2ad3 2 C 0.0 0.0 0.0 1.0 0.0 0.0
b 2020-08-01 2020-08-04 3 s1bd1 1 O 1.0 0.0 0.0 0.0 0.0 0.0
2020-08-02 2020-08-05 2 s1bd2 1 C 0.0 1.0 0.0 0.0 0.0 0.0
c 2020-08-08 2020-08-11 3 s2ac1 3 O 0.0 0.0 1.0 0.0 0.0 0.0
2020-08-09 2020-08-12 2 s2cd3 3 C 0.0 0.0 0.0 1.0 0.0 0.0
d 2020-08-08 2020-08-11 3 s3cd1 3 O 0.0 0.0 0.0 0.0 1.0 0.0
2020-08-09 2020-08-12 2 s3dd3 3 C 0.0 0.0 0.0 0.0 0.0 1.0
我认为我错误地使用了大小和展开。然而,我似乎不知道如何重新组织数据
我非常感谢专家对如何正确组织我的数据的意见
再次感谢您的帮助和关心
多谢各位
问题的关键在于,在使用
.unstack()
之前需要重新构造数据,因为所需的格式是一个矩阵,其值是三个重复的列。因此,您需要将数据帧从宽改为长,并在一列Values
中使用这三个值创建一个新列,在另一列中对它们进行分类Shift
相关问题 更多 >
编程相关推荐