缺少日期值

2024-06-28 11:30:29 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个条目表,如下所示。表中缺少日期项,我想用月末缺少的日期值填充这些项。对于第1列,缺少日期项的添加行应填充相同的值,对于值列,我希望它们填充0

FirstName   MiddleName  LastName    Date    Value1  Value2  Value3
first1  middle1 last1   1/31/2020   51  80  19
first1  middle1 last1   2/29/2020   14  44  56
first1  middle1 last1   4/30/2020   57  96  40
first1  middle1 last1   6/30/2020   58  65  3
first1  middle1 last1   8/31/2020   1   34  4
first1  middle1 last1   10/31/2020  40  38  53
first1  middle1 last1   12/31/2020  93  65  41
first1  middle1 last1   2/28/2021   3   43  0
first1  middle1 last1   4/30/2021   46  61  52
first2  middle2 last2   1/31/2020   64  19  33
first2  middle2 last2   2/29/2020   28  71  16
first2  middle2 last2   4/30/2020   2   94  78
first2  middle2 last2   5/31/2020   78  99  87
first2  middle2 last2   6/30/2020   10  70  14
first2  middle2 last2   7/31/2020   30  30  59
first2  middle2 last2   8/31/2020   55  96  73
first2  middle2 last2   10/31/2020  22  43  23
first2  middle2 last2   11/30/2020  12  4   84
first2  middle2 last2   1/31/2021   59  93  1
first2  middle2 last2   2/28/2021   19  33  52
first2  middle2 last2   3/31/2021   46  12  97
first2  middle2 last2   4/30/2021   41  44  59
first2  middle2 last2   5/31/2021   67  84  96
first2  middle2 last2   6/30/2021   52  69  78
first3  middle3 last3   4/30/2020   5   63  30
first3  middle3 last3   5/31/2020   45  22  7
first3  middle3 last3   6/30/2020   76  2   33
first3  middle3 last3   8/31/2020   81  25  52
first3  middle3 last3   9/30/2020   55  3   32
first3  middle3 last3   11/30/2020  46  45  80
first3  middle3 last3   12/31/2020  17  81  74
first3  middle3 last3   1/31/2021   98  6   55

'''

预期产量

FirstName   MiddleName  LastName    Date    Value1  Value2  Value3
first1  middle1 last1   1/31/2020   51  80  19
first1  middle1 last1   2/29/2020   14  44  56
first1  middle1 last1   3/31/2020   0   0   0
first1  middle1 last1   4/30/2020   57  96  40
first1  middle1 last1   5/31/2020   0   0   0
first1  middle1 last1   6/30/2020   58  65  3
first1  middle1 last1   7/31/2020   0   0   0
first1  middle1 last1   8/31/2020   1   34  4
first1  middle1 last1   9/30/2020   0   0   0
first1  middle1 last1   10/31/2020  40  38  53
first1  middle1 last1   11/30/2020  0   0   0
first1  middle1 last1   12/31/2020  93  65  41
first1  middle1 last1   1/31/2021   0   0   0
first1  middle1 last1   2/28/2021   3   43  0
first1  middle1 last1   3/31/2021   0   0   0
first1  middle1 last1   4/30/2021   46  61  52
first2  middle2 last2   1/31/2020   64  19  33
first2  middle2 last2   2/29/2020   28  71  16
first2  middle2 last2   3/31/2020   0   0   0
first2  middle2 last2   4/30/2020   2   94  78
first2  middle2 last2   5/31/2020   78  99  87
first2  middle2 last2   6/30/2020   10  70  14
first2  middle2 last2   7/31/2020   30  30  59
first2  middle2 last2   8/31/2020   55  96  73
first2  middle2 last2   9/30/2020   0   0   0
first2  middle2 last2   10/31/2020  22  43  23
first2  middle2 last2   11/30/2020  12  4   84
first2  middle2 last2   12/31/2020  0   0   0
first2  middle2 last2   1/31/2021   59  93  1
first2  middle2 last2   2/28/2021   19  33  52
first2  middle2 last2   3/31/2021   46  12  97
first2  middle2 last2   4/30/2021   41  44  59
first2  middle2 last2   5/31/2021   67  84  96
first2  middle2 last2   6/30/2021   52  69  78
first3  middle3 last3   4/30/2020   5   63  30
first3  middle3 last3   5/31/2020   45  22  7
first3  middle3 last3   6/30/2020   76  2   33
first3  middle3 last3   7/31/2020   0   0   0
first3  middle3 last3   8/31/2020   81  25  52
first3  middle3 last3   9/30/2020   55  3   32
first3  middle3 last3   10/31/2020  0   0   0
first3  middle3 last3   11/30/2020  46  45  80
first3  middle3 last3   12/31/2020  17  81  74
first3  middle3 last3   1/31/2021   98  6   55

我曾尝试对数据帧使用重采样,但没有得到所需的输出。 df=df.set_index('Date')。重采样('M')。ffill()。reset_index()


Tags: datefirstnamevalue1lastnamevalue2middlenamefirst2first1
1条回答
网友
1楼 · 发布于 2024-06-28 11:30:29

编辑:有一个比下面第二个更好的解决方案:

你的逻辑很接近。您可以只在.resample之前包含.groupby(),在.asfreq()之后包含.groupby()

df['Date'] = pd.to_datetime(df['Date'])
cols = ['FirstName','MiddleName','LastName']
df = (df.set_index('Date')
        .groupby(cols)
        .resample('M')
        .asfreq(fill_value=0)
        .drop(cols, axis=1)
        .reset_index())
df
Out[128]: 
   FirstName MiddleName LastName       Date  Value1  Value2  Value3
0     first1    middle1    last1 2020-01-31      51      80      19
1     first1    middle1    last1 2020-02-29      14      44      56
2     first1    middle1    last1 2020-03-31       0       0       0
3     first1    middle1    last1 2020-04-30      57      96      40
4     first1    middle1    last1 2020-05-31       0       0       0
5     first1    middle1    last1 2020-06-30      58      65       3
6     first1    middle1    last1 2020-07-31       0       0       0
7     first1    middle1    last1 2020-08-31       1      34       4
8     first1    middle1    last1 2020-09-30       0       0       0
9     first1    middle1    last1 2020-10-31      40      38      53
10    first1    middle1    last1 2020-11-30       0       0       0
11    first1    middle1    last1 2020-12-31      93      65      41
12    first1    middle1    last1 2021-01-31       0       0       0
13    first1    middle1    last1 2021-02-28       3      43       0
14    first1    middle1    last1 2021-03-31       0       0       0
15    first1    middle1    last1 2021-04-30      46      61      52
16    first2    middle2    last2 2020-01-31      64      19      33
17    first2    middle2    last2 2020-02-29      28      71      16
18    first2    middle2    last2 2020-03-31       0       0       0
19    first2    middle2    last2 2020-04-30       2      94      78
20    first2    middle2    last2 2020-05-31      78      99      87
21    first2    middle2    last2 2020-06-30      10      70      14
22    first2    middle2    last2 2020-07-31      30      30      59
23    first2    middle2    last2 2020-08-31      55      96      73
24    first2    middle2    last2 2020-09-30       0       0       0
25    first2    middle2    last2 2020-10-31      22      43      23
26    first2    middle2    last2 2020-11-30      12       4      84
27    first2    middle2    last2 2020-12-31       0       0       0
28    first2    middle2    last2 2021-01-31      59      93       1
29    first2    middle2    last2 2021-02-28      19      33      52
30    first2    middle2    last2 2021-03-31      46      12      97
31    first2    middle2    last2 2021-04-30      41      44      59
32    first2    middle2    last2 2021-05-31      67      84      96
33    first2    middle2    last2 2021-06-30      52      69      78
34    first3    middle3    last3 2020-04-30       5      63      30
35    first3    middle3    last3 2020-05-31      45      22       7
36    first3    middle3    last3 2020-06-30      76       2      33
37    first3    middle3    last3 2020-07-31       0       0       0
38    first3    middle3    last3 2020-08-31      81      25      52
39    first3    middle3    last3 2020-09-30      55       3      32
40    first3    middle3    last3 2020-10-31       0       0       0
41    first3    middle3    last3 2020-11-30      46      45      80
42    first3    middle3    last3 2020-12-31      17      81      74
43    first3    middle3    last3 2021-01-31      98       6      55

方法2和reindex(更复杂)

它变得有点复杂,因为您需要在一个组中填充值。我不确定是否有比以下解决方案更简单的方法:

  1. 您可以使用freq='M'创建一个date_range(),其中将包含缺少的月末日期值。稍后,您将使用它重新编制索引
  2. 您正在组内重新编制索引,因此可以为名为FullName的组创建唯一标识符。重新编制索引时还将传递此新列,以便在组内重新编制索引
  3. 利用pd.MultiIndex.from_product()创建由上面的#1和#2组成的MultiIndex
  4. 使用set_index()并将FirstNameDate设置为索引。接下来,您将.reindex 传递我们在#3中创建的mmultiindex的数据帧
  5. 清理数据帧,以便使用.groupby.transform(max)用正确的值填充FirstNameMiddleNameLastName
  6. 最后,清理列的顺序,不包括我先前创建的临时列FullName

您需要使用pd.MultiIndex.from_product()FirstName重新索引,并使用名为s的日期的date_range。这是它的关键,它将允许您set_indexreindex来填充缺少的日期

import pandas as pd
df = pd.read_clipboard()
df['Date'] = pd.to_datetime(df['Date'])
s = pd.date_range(df['Date'].min(), df['Date'].max(), freq='M')
df['FullName'] = df['FirstName'] + df['MiddleName'] + df['LastName']
idx = pd.MultiIndex.from_product([df['FullName'].unique(), s], names=['FullName', 'Date'])
df = df.set_index(['FullName', 'Date']).reindex(idx, fill_value='0').reset_index()
df['FirstName'] = df.groupby('FullName')['FirstName'].transform('max')
df['MiddleName'] = df.groupby('FullName')['MiddleName'].transform('max')
df['LastName'] = df.groupby('FullName')['LastName'].transform('max')
df = df[['FirstName', 'MiddleName', 'LastName', 'Date', 'Value1', 'Value2',
       'Value3']].replace('0',0)
df
Out[98]: 
   FirstName MiddleName LastName       Date  Value1  Value2  Value3
0     first1    middle1    last1 2020-01-31      51      80      19
1     first1    middle1    last1 2020-02-29      14      44      56
2     first1    middle1    last1 2020-03-31       0       0       0
3     first1    middle1    last1 2020-04-30      57      96      40
4     first1    middle1    last1 2020-05-31       0       0       0
5     first1    middle1    last1 2020-06-30      58      65       3
6     first1    middle1    last1 2020-07-31       0       0       0
7     first1    middle1    last1 2020-08-31       1      34       4
8     first1    middle1    last1 2020-09-30       0       0       0
9     first1    middle1    last1 2020-10-31      40      38      53
10    first1    middle1    last1 2020-11-30       0       0       0
11    first1    middle1    last1 2020-12-31      93      65      41
12    first1    middle1    last1 2021-01-31       0       0       0
13    first1    middle1    last1 2021-02-28       3      43       0
14    first1    middle1    last1 2021-03-31       0       0       0
15    first1    middle1    last1 2021-04-30      46      61      52
16    first1    middle1    last1 2021-05-31       0       0       0
17    first1    middle1    last1 2021-06-30       0       0       0
18    first2    middle2    last2 2020-01-31      64      19      33
19    first2    middle2    last2 2020-02-29      28      71      16
20    first2    middle2    last2 2020-03-31       0       0       0
21    first2    middle2    last2 2020-04-30       2      94      78
22    first2    middle2    last2 2020-05-31      78      99      87
23    first2    middle2    last2 2020-06-30      10      70      14
24    first2    middle2    last2 2020-07-31      30      30      59
25    first2    middle2    last2 2020-08-31      55      96      73
26    first2    middle2    last2 2020-09-30       0       0       0
27    first2    middle2    last2 2020-10-31      22      43      23
28    first2    middle2    last2 2020-11-30      12       4      84
29    first2    middle2    last2 2020-12-31       0       0       0
30    first2    middle2    last2 2021-01-31      59      93       1
31    first2    middle2    last2 2021-02-28      19      33      52
32    first2    middle2    last2 2021-03-31      46      12      97
33    first2    middle2    last2 2021-04-30      41      44      59
34    first2    middle2    last2 2021-05-31      67      84      96
35    first2    middle2    last2 2021-06-30      52      69      78
36    first3    middle3    last3 2020-01-31       0       0       0
37    first3    middle3    last3 2020-02-29       0       0       0
38    first3    middle3    last3 2020-03-31       0       0       0
39    first3    middle3    last3 2020-04-30       5      63      30
40    first3    middle3    last3 2020-05-31      45      22       7
41    first3    middle3    last3 2020-06-30      76       2      33
42    first3    middle3    last3 2020-07-31       0       0       0
43    first3    middle3    last3 2020-08-31      81      25      52
44    first3    middle3    last3 2020-09-30      55       3      32
45    first3    middle3    last3 2020-10-31       0       0       0
46    first3    middle3    last3 2020-11-30      46      45      80
47    first3    middle3    last3 2020-12-31      17      81      74
48    first3    middle3    last3 2021-01-31      98       6      55
49    first3    middle3    last3 2021-02-28       0       0       0
50    first3    middle3    last3 2021-03-31       0       0       0
51    first3    middle3    last3 2021-04-30       0       0       0
52    first3    middle3    last3 2021-05-31       0       0       0
53    first3    middle3    last3 2021-06-30       0       0       0

相关问题 更多 >