我一直在努力处理一张表上有多个表的糟糕的excel表格。我在中读到一个,数据框看起来是这样的:
# (see csv file)
import pandas as pd
df = pd.read_csv('SO_csv.csv')
df
Out[]:
Unnamed: 0 Monday Tuesday Wednesday Thursday Friday Saturday Sunday Total
0 0th Week 2019-01-07 00:00:00 2019-01-08 00:00:00 2019-01-09 00:00:00 2019-01-10 00:00:00 2019-01-11 00:00:00 2019-01-12 00:00:00 2019-01-13 00:00:00 NaN
1 item0 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
2 item1 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
3 item2 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
4 item3 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
5 Total 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 NaN
6 NaN NaN NaN NaN NaN NaN NaN NaN NaN
7 NaN Monday Tuesday Wednesday Thursday Friday Saturday Sunday NaN
8 1st Week 2019-01-14 00:00:00 2019-01-15 00:00:00 2019-01-16 00:00:00 2019-01-17 00:00:00 2019-01-18 00:00:00 2019-01-19 00:00:00 2019-01-20 00:00:00 NaN
9 item0 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
10 item1 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
11 item2 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
12 item3 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
13 item4 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
14 item5 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
15 Total 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 NaN
16 NaN NaN NaN NaN NaN NaN NaN NaN NaN
17 NaN Monday Tuesday Wednesday Thursday Friday Saturday Sunday NaN
18 2nd Week 2019-01-21 00:00:00 2019-01-22 00:00:00 2019-01-23 00:00:00 2019-01-24 00:00:00 2019-01-25 00:00:00 2019-01-26 00:00:00 2019-01-27 00:00:00 NaN
19 item0 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
20 item1 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
21 item2 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
22 item6 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
23 item3 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
24 item4 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
25 item5 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
26 Total 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 NaN
27 NaN NaN NaN NaN NaN NaN NaN NaN NaN
28 NaN NaN NaN NaN NaN NaN NaN NaN NaN
29 NaN Monday Tuesday Wednesday Thursday Friday Saturday Sunday NaN
30 3rd Week 28/01/2019 29/01/2019 30/01/2019 31/01/2019 2019-01-02 00:00:00 2019-02-02 00:00:00 2019-03-02 00:00:00 NaN
31 item0 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
32 item1 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
33 item2 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
34 item6 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
35 item3 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
36 item4 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
37 item5 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 0.510017 3.570116
38 Total 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 3.57012 NaN
39 NaN NaN NaN NaN NaN NaN NaN NaN NaN
注意:这里的代码只是演示一个输出,而不是解释我尝试了什么。你知道吗
# THIS IS ALL JUST TO CREATE A DATAFRAME TO SHOW YOU WHAT IT SHOULD LOOK LIKE
items = ['item0', 'item1', 'item2', 'item3', 'item4', 'item5', 'item6', 'item7']
weeks = [1,2,3]
days = list(calendar.day_name)
arrays_len = len(weeks) * len(days) * len(items)
dates = df_.loc[weeks_bool,item_cols].melt()
dates['value'] = pd.to_datetime(dates['value'])
dates = dates.sort_values(by='value')
dates = list(dates['value'].astype(str)) # len 28
# length 168 arrays
items = np.tile(items, int(len_arrays / len(items) ))
weeks = np.repeat(weeks, int(len_arrays / len(weeks)))
days = np.repeat(days, int(len_arrays / len(days)))
vals = np.repeat(0.510017, int(len_arrays))
dates = np.repeat(dates, int(len_arrays / len(dates)))
pd.DataFrame({'weeks': weeks,'days': days, 'dates':dates,'items': items,'value': vals}).head()
Out[]:
weeks days dates items value
0 1 Monday 2019-01-02 item0 0.510017
1 1 Tuesday 2019-01-02 item1 0.510017
2 1 Wednesday 2019-01-02 item2 0.510017
3 1 Thursday 2019-01-02 item3 0.510017
4 1 Friday 2019-01-02 item4 0.510017
有谁能帮我带些熊猫肉卷到那儿去吗?你知道吗
作为第一步,我想将True
值分组到(但不包括)下一个真值。你知道吗
weeks_bool = df['Unnamed: 0'].str.contains('Week').fillna(False)
目前没有回答
相关问题 更多 >
编程相关推荐