Return data between two dates

import glob
import os

import pandas as pd


def getTimeseriesData(path, column_num, startDate, endDate):
    colNames = ['date']
    dfs = []
    allfiles = glob.glob(os.path.join(path, "*.csv"))
    for fname in allfiles:
        name = os.path.splitext(fname)[0]
        name = os.path.split(name)[1]
        colNames.append(name)
        df = pd.read_csv(fname, header=None, usecols=[0, column_num, 4, 5],
                         parse_dates=[0], dayfirst=True, index_col=[0],
                         names=['date', name + '_LAST', name + '_VOLUME', name + '_MKTCAP'])
        df = df.groupby(level=0).agg('mean')
        dfs.append(df)
    dfs = pd.concat(dfs, axis=1)
    return dfs[(dfs['date'] >= startDate) & (dfs['date'] <= endDate)]  # <<-- I think this is the problem

            BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME  BBG.XLON.BTA.S_MKTCAP  \
date
2001-01-02                  572               26605510               37494.60
2001-01-03                  560               24715470               36708.00
2001-01-04                  613               52781855               40182.15
2001-01-05                  630               56600152               41296.50
2001-01-08                  633               41014402               41493.15

            BBG.XLON.VOD.S_LAST  BBG.XLON.VOD.S_VOLUME  BBG.XLON.VOD.S_MKTCAP
date
2001-01-02                  NaN                    NaN                    NaN
2001-01-03               225.00              444328736            145216.0020
2001-01-04               239.00              488568000            154251.6643
2001-01-05               242.25              237936704            156349.2288
2001-01-08               227.75              658059776            146990.8642
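To see why the last line of the function fails, here is a minimal sketch that runs a few CSV rows through the same read_csv arguments. The file layout is an assumption (day-first date in column 0, price in column 1, two unused columns, then volume and market cap in columns 4 and 5), and the in-memory CSV text read via io.StringIO is a hypothetical stand-in for the real files. The price values are taken from the sample output.

```python
import io

import pandas as pd

# Hypothetical file contents (assumed layout): day-first date in column 0,
# price in column 1, two unused columns, volume in column 4, market cap in column 5.
csv_text = """02/01/2001,572,0,0,26605510,37494.60
03/01/2001,560,0,0,24715470,36708.00
04/01/2001,613,0,0,52781855,40182.15
"""

name = "BBG.XLON.BTA.S"
column_num = 1  # assumed position of the price column

df = pd.read_csv(
    io.StringIO(csv_text),
    header=None,
    usecols=[0, column_num, 4, 5],
    parse_dates=[0],
    dayfirst=True,
    index_col=[0],
    names=["date", name + "_LAST", name + "_VOLUME", name + "_MKTCAP"],
)

# index_col=[0] moves the parsed dates into the index, which is named "date"
print(df.index.name)         # date
print("date" in df.columns)  # False

# Therefore df["date"] looks up a column that does not exist
raised_key_error = False
try:
    df["date"]
except KeyError:
    raised_key_error = True
print(raised_key_error)      # True
```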

3 answers

User

Answer #1 · edited 2024-07-08 14:39:04

If your index is a monotonically increasing sequence of dates, this can be done much more simply:

Show all rows, but only the first two columns:

In [98]: df.iloc[:, [0,1]]
Out[98]:
            BBG.XLON.BTA.S_LAST  BBG.XLON.BTA.S_VOLUME
date
2001-01-02                  572               26605510
2001-01-03                  560               24715470
2001-01-04                  613               52781855
2001-01-05                  630               56600152
2001-01-08                  633               41014402

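Filter rows by a date range (the dates here are only an example) and show the first two columns:

df.loc['2001-01-03':'2001-01-05'].iloc[:, [0, 1]]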

或者在你的情况下：

return dfs.loc[startDate:endDate]
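A minimal sketch of the .loc approach, assuming the concatenated frame has a DatetimeIndex named date, as in the question's output (values copied from the sample). The helper name get_between is hypothetical. It sorts the index first as a defensive step for the case where the index is not already monotonically increasing, which is the condition this answer relies on.

```python
import pandas as pd

# Sample of the question's data with a DatetimeIndex named "date",
# which is what read_csv(..., parse_dates=[0], index_col=[0]) produces.
dfs = pd.DataFrame(
    {
        "BBG.XLON.BTA.S_LAST": [572, 560, 613, 630, 633],
        "BBG.XLON.BTA.S_VOLUME": [26605510, 24715470, 52781855, 56600152, 41014402],
    },
    index=pd.DatetimeIndex(
        ["2001-01-02", "2001-01-03", "2001-01-04", "2001-01-05", "2001-01-08"],
        name="date",
    ),
)


def get_between(df, start_date, end_date):
    """Hypothetical helper: rows whose date index is in [start_date, end_date], both inclusive."""
    # Label slicing assumes sorted dates; on an unsorted DatetimeIndex,
    # slicing with dates that are not present can raise a KeyError.
    if not df.index.is_monotonic_increasing:
        df = df.sort_index()
    return df.loc[start_date:end_date]


print(get_between(dfs, "2001-01-03", "2001-01-05"))

# Also works on shuffled rows, because the helper sorts first
shuffled = dfs.iloc[[4, 0, 2, 1, 3]]
print(get_between(shuffled, "2001-01-06", "2001-01-08"))
```

Both bounds are inclusive with .loc label slicing, and neither bound has to be an exact index value: 2001-01-06 is not in the index, yet the second call still returns the 2001-01-08 row.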

网友

Answer #2 · edited 2024-07-08 14:39:04

Here "date" is the name of the index, not the name of a column:

Change:

return dfs[(dfs['date'] >= startDate) & (dfs['date'] <= endDate)]

to:

return dfs[(dfs.index >= startDate) & (dfs.index <= endDate)]
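A minimal sketch of filtering on the index with a boolean mask, using a few rows of the question's sample (the VOD price column only). The Timestamp bounds are illustrative values. Unlike label slicing with .loc, a boolean mask does not require the index to be sorted.

```python
import pandas as pd

# Sample from the question's output; the index (not a column) is named "date".
dfs = pd.DataFrame(
    {"BBG.XLON.VOD.S_LAST": [float("nan"), 225.00, 239.00, 242.25, 227.75]},
    index=pd.DatetimeIndex(
        ["2001-01-02", "2001-01-03", "2001-01-04", "2001-01-05", "2001-01-08"],
        name="date",
    ),
)

startDate = pd.Timestamp("2001-01-03")  # illustrative bounds
endDate = pd.Timestamp("2001-01-05")

# Each comparison on the index yields a boolean array;
# & combines the two arrays element by element.
mask = (dfs.index >= startDate) & (dfs.index <= endDate)
result = dfs[mask]
print(result)
```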

网友

Answer #3 · edited 2024-07-08 14:39:04

In Python, "&" is the bitwise "and", while "and" is the logical "and".

It is better to use a list comprehension here.

return [df for df in dfs if df['date'] >= startDate and df['date'] <= endDate]

This iterates over the dfs list, checks the if condition for each element, and returns a new list containing every element that satisfies it.
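In this question the comparisons produce pandas boolean Series rather than single True/False values. The following minimal sketch shows how the two operators behave on such a Series; the values are the BBG.XLON.BTA.S_LAST column from the sample output.

```python
import pandas as pd

# BBG.XLON.BTA.S_LAST values from the question's sample output
s = pd.Series([572, 560, 613, 630, 633])

# & combines two boolean Series element by element
mask = (s >= 560) & (s <= 613)
print(mask.tolist())  # [True, True, True, False, False]

# `and` needs a single True/False from each operand; pandas refuses to
# reduce a whole Series to one bool and raises ValueError instead.
raised_value_error = False
try:
    (s >= 560) and (s <= 613)
except ValueError:
    raised_value_error = True
print(raised_value_error)  # True
```

Also note that after pd.concat in the question, dfs is a single DataFrame, and iterating over a DataFrame yields its column labels, not rows.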
