在一个sh上有许多表的excel数据框

2024-09-30 18:31:36 发布

您现在位置:Python中文网/ 问答频道 /正文

我一直在努力处理一张表上有多个表的糟糕的excel表格。我在中读到一个,数据框看起来是这样的:

CSV HERE

# (see csv file)

import pandas as pd
df = pd.read_csv('SO_csv.csv')
df

Out[]:
Unnamed: 0               Monday              Tuesday            Wednesday             Thursday               Friday             Saturday               Sunday     Total
0    0th Week  2019-01-07 00:00:00  2019-01-08 00:00:00  2019-01-09 00:00:00  2019-01-10 00:00:00  2019-01-11 00:00:00  2019-01-12 00:00:00  2019-01-13 00:00:00       NaN
1       item0             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
2       item1             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
3       item2             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
4       item3             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
5       Total              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012       NaN
6         NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN       NaN
7         NaN               Monday              Tuesday            Wednesday             Thursday               Friday             Saturday               Sunday       NaN
8    1st Week  2019-01-14 00:00:00  2019-01-15 00:00:00  2019-01-16 00:00:00  2019-01-17 00:00:00  2019-01-18 00:00:00  2019-01-19 00:00:00  2019-01-20 00:00:00       NaN
9       item0             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
10      item1             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
11      item2             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
12      item3             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
13      item4             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
14      item5             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
15      Total              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012       NaN
16        NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN       NaN
17        NaN               Monday              Tuesday            Wednesday             Thursday               Friday             Saturday               Sunday       NaN
18   2nd Week  2019-01-21 00:00:00  2019-01-22 00:00:00  2019-01-23 00:00:00  2019-01-24 00:00:00  2019-01-25 00:00:00  2019-01-26 00:00:00  2019-01-27 00:00:00       NaN
19      item0             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
20      item1             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
21      item2             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
22      item6             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
23      item3             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
24      item4             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
25      item5             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
26      Total              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012       NaN
27        NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN       NaN
28        NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN       NaN
29        NaN               Monday              Tuesday            Wednesday             Thursday               Friday             Saturday               Sunday       NaN
30   3rd Week           28/01/2019           29/01/2019           30/01/2019           31/01/2019  2019-01-02 00:00:00  2019-02-02 00:00:00  2019-03-02 00:00:00       NaN
31      item0             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
32      item1             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
33      item2             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
34      item6             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
35      item3             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
36      item4             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
37      item5             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017             0.510017  3.570116
38      Total              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012              3.57012       NaN
39        NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN                  NaN       NaN

我想创建一个“整洁”表,其中每一行都是该项目的每日支出(可以是NaN)

注意:这里的代码只是演示一个输出,而不是解释我尝试了什么。你知道吗

# THIS IS ALL JUST TO CREATE A DATAFRAME TO SHOW YOU WHAT IT SHOULD LOOK LIKE
items = ['item0', 'item1', 'item2', 'item3', 'item4', 'item5', 'item6', 'item7']
weeks = [1,2,3]
days = list(calendar.day_name)
arrays_len = len(weeks) * len(days) * len(items)

dates = df_.loc[weeks_bool,item_cols].melt()
dates['value'] = pd.to_datetime(dates['value'])
dates = dates.sort_values(by='value')
dates = list(dates['value'].astype(str)) # len 28

# length 168 arrays
items = np.tile(items, int(len_arrays / len(items) ))
weeks = np.repeat(weeks, int(len_arrays / len(weeks)))
days = np.repeat(days, int(len_arrays / len(days)))
vals = np.repeat(0.510017, int(len_arrays))
dates = np.repeat(dates, int(len_arrays / len(dates)))

pd.DataFrame({'weeks': weeks,'days': days, 'dates':dates,'items': items,'value': vals}).head()

Out[]:
    weeks       days       dates  items     value
0      1     Monday  2019-01-02  item0  0.510017
1      1    Tuesday  2019-01-02  item1  0.510017
2      1  Wednesday  2019-01-02  item2  0.510017
3      1   Thursday  2019-01-02  item3  0.510017
4      1     Friday  2019-01-02  item4  0.510017

有谁能帮我带些熊猫肉卷到那儿去吗?你知道吗

我目前的想法/尝试

  • 我找到了每个表的起始索引
  • 但我遇到的一个关键问题是如何使用这个索引来选择新表
  • 因为空格(all NAN)行的数量是可变的(1或2)
  • 而且项目的数量也是可变的

作为第一步,我想将True值分组到(但不包括)下一个真值。你知道吗

weeks_bool = df['Unnamed: 0'].str.contains('Week').fillna(False)

Tags: lenvalueitemsnandaysdatesitem0monday