How to reindex a pandas DataFrame by a date range without dropping values

Published 2024-06-25 07:25:39

Background:

I downloaded the following DataFrame with pyodbc; the dates run from 1999 to 2015:

CEISales.head(10)
Out[194]: 
   Order_DateC   RegionC     SalesC
0  2014-01-30  Domestic    3530.00
1  2011-10-11  Domestic     136.00
2  1999-01-13  Domestic      30.00
3  1999-01-13  Domestic   55615.00
4  1999-01-13  Domestic     440.00
5  1999-01-13  Domestic      94.00
6  1999-01-05  Domestic     612.00
7  1999-01-14  Domestic    1067.00
8  1999-01-14  Domestic   26345.05
9  1999-01-15  Domestic  161858.72

Then I kept only the rows dated after 2010-01-01 and sorted them by date in ascending order:

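CEIFilter = CEISales[CEISales['Order_DateC'] > '2010-01-01']
CEITest = CEIFilter.sort_values('Order_DateC')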

Next, I used the pandas date_range function to create a date index running from 2010-01-01 to today:

date_index = pd.date_range(start='2010-01-01', end='2015-12-23', freq='D')

and reindexed the DataFrame:

CEIFinal = CEITest.reindex(date_index)

My problem is that when I reindex the DataFrame, all of the data is dropped:

CEIFinal.head(5)
Out[206]: 
            Order_DateC RegionC  SalesC
2010-01-01         NaT     NaN     NaN
2010-01-02         NaT     NaN     NaN
2010-01-03         NaT     NaN     NaN
2010-01-04         NaT     NaN     NaN
2010-01-05         NaT     NaN     NaN

As the original filtered DataFrame shows, there are transactions on 2010-01-04:

CEITest[CEITest['Order_DateC'] == '2010-01-04']
Out[210]: 
      Order_DateC   RegionC   SalesC
18156  2010-01-04   Foreign    450.0
18155  2010-01-04  Domestic   1990.4
18154  2010-01-04  Domestic  37477.0
18152  2010-01-04  Domestic      0.0
18153  2010-01-04  Domestic    783.0

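Question: How do I reindex this DataFrame with this date range and keep all of the original values? I am trying to create a common index across several DataFrames from different databases so that I can add them to one aggregated DataFrame. Your help would be greatly appreciated. Thanks!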


2 answers

I think you need to set the index from the Order_DateC column before reindexing:

CEITest = CEITest.set_index('Order_DateC')
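
A minimal sketch of that step, using a small made-up frame (the name `sales` and its values are illustrative, not the asker's data). It assumes the dates are unique; converting the column with `pd.to_datetime` first is an added precaution in case the database returned strings.

```python
import pandas as pd

# Hypothetical stand-in for the filtered data; the dates are unique here.
sales = pd.DataFrame({
    "Order_DateC": ["2010-01-04", "2010-01-02"],
    "RegionC": ["Foreign", "Domestic"],
    "SalesC": [450.0, 1990.4],
})

# Make sure the column holds real timestamps, then use it as the index.
sales["Order_DateC"] = pd.to_datetime(sales["Order_DateC"])
sales = sales.set_index("Order_DateC").sort_index()

# reindex aligns on index labels, so the dates now match the new daily index.
date_index = pd.date_range(start="2010-01-01", end="2010-01-05", freq="D")
reindexed = sales.reindex(date_index)
print(reindexed)
```

Days without a matching row are still filled with NaN, but the two existing rows keep their values.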

Finally, you can check for non-null values with notnull and any:

print(CEIFinal[CEIFinal.notnull().any(axis=1)])

All together:

print(CEISales)
  Order_DateC   RegionC     SalesC
0  2014-01-30  Domestic    3530.00
1  2011-10-11  Domestic     136.00
2  1999-01-13  Domestic      30.00
3  1999-01-13  Domestic   55615.00
4  1999-01-13  Domestic     440.00
5  1999-01-13  Domestic      94.00
6  1999-01-05  Domestic     612.00
7  1999-01-14  Domestic    1067.00
8  1999-01-14  Domestic   26345.05
9  1999-01-15  Domestic  161858.72

CEIFilter = CEISales[CEISales['Order_DateC'] > '2010-01-01']
CEITest = CEIFilter.sort_values('Order_DateC')
print(CEITest)
  Order_DateC   RegionC  SalesC
1  2011-10-11  Domestic     136
0  2014-01-30  Domestic    3530

# set the index to a DatetimeIndex
CEITest = CEITest.set_index('Order_DateC')
print(CEITest)
              RegionC  SalesC
Order_DateC                  
2011-10-11   Domestic     136
2014-01-30   Domestic    3530

date_index = pd.date_range(start='2010-01-01', end='2015-12-23', freq='D')
CEIFinal = CEITest.reindex(date_index)

print(CEIFinal.head())
           RegionC  SalesC
2010-01-01     NaN     NaN
2010-01-02     NaN     NaN
2010-01-03     NaN     NaN
2010-01-04     NaN     NaN
2010-01-05     NaN     NaN

There can be many NaN rows; check the data:

print(CEIFinal[CEIFinal.notnull().any(axis=1)])
             RegionC  SalesC
2011-10-11  Domestic     136
2014-01-30  Domestic    3530

Finally, you can set the index name and call reset_index; the resulting column takes the index name:

CEIFinal.index.name = 'CEIFinal'
CEIFinal = CEIFinal.reset_index()
print(CEIFinal.head())
   CEIFinal RegionC  SalesC
0 2010-01-01     NaN     NaN
1 2010-01-02     NaN     NaN
2 2010-01-03     NaN     NaN
3 2010-01-04     NaN     NaN
4 2010-01-05     NaN     NaN
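
The same two steps can also be written as one chain. Below is a small sketch with a made-up frame (`frame` and `flat` are illustrative names): `rename_axis` names the index and `reset_index` turns it into an ordinary column.

```python
import pandas as pd

# Hypothetical reindexed frame with an unnamed DatetimeIndex.
frame = pd.DataFrame(
    {"SalesC": [float("nan"), 136.0]},
    index=pd.date_range("2011-10-10", periods=2, freq="D"),
)

# Name the index, then move it into a regular column of that name.
flat = frame.rename_axis("Order_DateC").reset_index()
print(flat.columns.tolist())
```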

You are reindexing with a DatetimeIndex while the DataFrame's index is not a DatetimeIndex:

      Order_DateC   RegionC   SalesC
18156  2010-01-04   Foreign    450.0
18155  2010-01-04  Domestic   1990.4
18154  2010-01-04  Domestic  37477.0
18152  2010-01-04  Domestic      0.0
18153  2010-01-04  Domestic    783.0

Hence the NaNs and NaTs.
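
To see the mismatch concretely, here is a small sketch with made-up data (`df` and `mismatched` are illustrative names). The frame keeps its default integer index, so none of the requested timestamps exist as labels and every row comes back empty.

```python
import pandas as pd

# Hypothetical frame that still has its default integer index (0, 1).
df = pd.DataFrame({
    "Order_DateC": pd.to_datetime(["2010-01-04", "2010-01-05"]),
    "SalesC": [450.0, 783.0],
})

date_index = pd.date_range("2010-01-04", "2010-01-05", freq="D")

# The labels being looked up are Timestamps, but the index holds integers,
# so nothing matches and every cell is filled with NaN / NaT.
mismatched = df.reindex(date_index)
print(mismatched)

# True when every cell in the reindexed frame is missing.
all_missing = bool(mismatched.isna().all().all())
print(all_missing)
```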

Perhaps you want to set Order_DateC as the index:

CEITest = CEITest.set_index('Order_DateC')

and then resample.

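Reindexing alone cannot preserve rows that share a date: the target index has a single slot per date, and pandas raises a ValueError when the existing index contains duplicate labels.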
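
One way to combine these suggestions is sketched below with made-up rows: sum the sales per day so that repeated dates collapse into one row, then reindex onto the full range. Summing is an assumption about the aggregate you want, and the non-numeric RegionC column is left out for simplicity.

```python
import pandas as pd

# Hypothetical rows with a repeated date, similar in shape to the question's data.
df = pd.DataFrame({
    "Order_DateC": pd.to_datetime(["2010-01-04", "2010-01-04", "2010-01-06"]),
    "SalesC": [450.0, 1990.4, 100.0],
})

# Collapse duplicate dates into one row per day by summing the sales.
daily = df.set_index("Order_DateC")["SalesC"].resample("D").sum()

# Extend to the full range; days outside the data become 0 instead of NaN.
date_index = pd.date_range("2010-01-01", "2010-01-07", freq="D")
daily_full = daily.reindex(date_index, fill_value=0.0)
print(daily_full)
```

Once each source is reduced to one row per day on the same `date_index`, the results share a common index and can be placed side by side, for example with `pd.concat(..., axis=1)`.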
