How to store a list of pandas DataFrames for easy access

df1 =
  Stock  Year  Profit  CountPercent
   AAPL  2012       1         38.77
   AAPL  2013       1         33.33

df2 =
  Stock  Year  Profit  CountPercent
   GOOG  2012       1         43.47
   GOOG  2013       1         32.35

df3 =
  Stock  Year  Profit  CountPercent
    ABC  2012       1         40.00
    ABC  2013       1         32.35

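2 Answers

User

#1 · edited 2024-10-01 11:40:52

I think that if all the DataFrames have the same shape, it is better to store the data as a pandas.Panel rather than as a list of DataFrames. That is how pandas_datareader works.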

import io
import pandas as pd

df1 = pd.read_csv(io.StringIO("""
Stock,Year,Profit,CountPercent
AAPL,2012,1,38.77
AAPL,2013,1,33.33
"""
))

df2 = pd.read_csv(io.StringIO("""
Stock,Year,Profit,CountPercent
GOOG,2012,1,43.47
GOOG,2013,1,32.35
"""
))

df3 = pd.read_csv(io.StringIO("""
Stock,Year,Profit,CountPercent
ABC,2012,1,40.0
ABC,2013,1,32.35
"""
))


# I had to drop the `Stock` column and use it as the Panel item axis, because
# saving the Panel to HDFStore otherwise failed with:
# TypeError: Cannot serialize the column [%s] because its data contents are [mixed-integer] object dtype
p = pd.Panel({df.iat[0, 0]: df.drop('Stock', axis=1) for df in [df1, df2, df3]})

# write the panel to HDFStore
store = pd.HDFStore('c:/temp/stocks.h5')
store.append('stocks', p, data_columns=True, mode='w')
store.close()

# read panel from HDFStore
store = pd.HDFStore('c:/temp/stocks.h5')
p = store.select('stocks')

Panel dimensions:

In [19]: p['AAPL']
Out[19]:
     Year  Profit  CountPercent
0  2012.0     1.0         38.77
1  2013.0     1.0         33.33

In [20]: p[:, :, 'Profit']
Out[20]:
   AAPL  ABC  GOOG
0   1.0  1.0   1.0
1   1.0  1.0   1.0

In [21]: p[:, 0]
Out[21]:
                 AAPL     ABC     GOOG
Year          2012.00  2012.0  2012.00
Profit           1.00     1.0     1.00
CountPercent    38.77    40.0    43.47
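Note that pandas.Panel was deprecated in pandas 0.20 and removed in 0.25, so the Panel code above only runs on older versions. On current pandas, a roughly equivalent layout can be sketched with a MultiIndex DataFrame built by pd.concat. This is an illustration rather than part of the original answer, and the level names "Stock" and "row" are assumptions chosen for the example.

```python
import pandas as pd

# Same sample data as above, keyed by stock symbol (Stock column already dropped).
frames = {
    "AAPL": pd.DataFrame({"Year": [2012, 2013], "Profit": [1, 1], "CountPercent": [38.77, 33.33]}),
    "GOOG": pd.DataFrame({"Year": [2012, 2013], "Profit": [1, 1], "CountPercent": [43.47, 32.35]}),
    "ABC": pd.DataFrame({"Year": [2012, 2013], "Profit": [1, 1], "CountPercent": [40.00, 32.35]}),
}

# Passing a dict to concat uses its keys as the outer index level.
stacked = pd.concat(frames, names=["Stock", "row"])

# Counterpart of p['AAPL']: all rows for one stock.
aapl = stacked.loc["AAPL"]

# Counterpart of p[:, :, 'Profit']: one column, stocks side by side.
profit = stacked["Profit"].unstack("Stock")

# Counterpart of p[:, 0]: the first row of every stock, stocks as columns.
first_rows = stacked.xs(0, level="row").T
```

A single DataFrame like this can be written to HDF5 or any other pandas-supported format without the Panel-specific workaround.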

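User

#2 · edited 2024-10-01 11:40:52

If the Stock value is the same in every row of a DataFrame, you can drop that column with iloc and build a dict with a dict comprehension (the key is the first value of the Stock column in each df):

dfs = {df.loc[0, 'Stock']: df.iloc[:, 1:] for df in [df1, df2, df3]}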

print (dfs['AAPL'])
   Year  Profit  CountPercent
0  2012       1         38.77
1  2013       1         33.33

print (dfs['ABC'])
   Year  Profit  CountPercent
0  2012       1         40.00
1  2013       1         32.35

print (dfs['GOOG'])
   Year  Profit  CountPercent
0  2012       1         43.47
1  2013       1         32.35
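The dict comprehension above silently assumes each frame holds exactly one Stock value. A minimal sketch that checks this assumption first might look like the following; the helper name frames_by_stock is hypothetical, and it drops the column by label rather than by position.

```python
import pandas as pd


def frames_by_stock(frames):
    """Map each frame's single Stock value to the frame without that column."""
    result = {}
    for df in frames:
        symbols = df["Stock"].unique()
        if len(symbols) != 1:
            raise ValueError(f"expected one Stock value per frame, got {list(symbols)}")
        result[symbols[0]] = df.drop(columns="Stock")
    return result


df1 = pd.DataFrame({"Stock": ["AAPL", "AAPL"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [38.77, 33.33]})
df2 = pd.DataFrame({"Stock": ["GOOG", "GOOG"], "Year": [2012, 2013],
                    "Profit": [1, 1], "CountPercent": [43.47, 32.35]})

dfs = frames_by_stock([df1, df2])
```

Raising on mixed symbols keeps a mislabeled frame from being filed under the wrong key.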

For storing the data on disk, I think it is best to use HDF5 via PyTables.

If the values in each Stock column are the same, you can concat all the DataFrames into one and store that.
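The original code for this step did not survive on this page. As a rough sketch of the idea, assuming pd.concat plus DataFrame.to_hdf is what was meant, the frames can be stacked and written as one queryable table. The file path and the key "stocks" are placeholders, and because PyTables is an optional dependency the sketch falls back to in-memory filtering when it is missing.

```python
import io
import os
import tempfile

import pandas as pd

df1 = pd.read_csv(io.StringIO("Stock,Year,Profit,CountPercent\nAAPL,2012,1,38.77\nAAPL,2013,1,33.33"))
df2 = pd.read_csv(io.StringIO("Stock,Year,Profit,CountPercent\nGOOG,2012,1,43.47\nGOOG,2013,1,32.35"))
df3 = pd.read_csv(io.StringIO("Stock,Year,Profit,CountPercent\nABC,2012,1,40.0\nABC,2013,1,32.35"))

# Stack the frames into one table; the Stock column tells the rows apart.
combined = pd.concat([df1, df2, df3], ignore_index=True)

# Placeholder location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "stocks.h5")

try:
    # format="table" with data_columns=True makes the columns queryable on disk.
    combined.to_hdf(path, key="stocks", mode="w", format="table", data_columns=True)
    # Read back only the rows for one stock.
    goog = pd.read_hdf(path, key="stocks", where="Stock == 'GOOG'")
except ImportError:
    # HDF5 support needs the optional PyTables package ("tables").
    goog = combined[combined["Stock"] == "GOOG"]
```

Keeping Stock as a data column means a single stock can be selected at read time without loading the whole file.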
