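We use pandas DataFrames as the primary container for our time series data. We pack each DataFrame into a binary blob and store it in a MongoDB document, alongside keys holding metadata about the time series blob.
We ran into an error when upgrading from pandas 0.14.1 to 0.15.2.
Creating the binary blob from a pandas DataFrame (0.14.1):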
import lz4
import cPickle
bd = lz4.compress(cPickle.dumps(df,cPickle.HIGHEST_PROTOCOL))
Error case: reading back from MongoDB with pandas 0.15.2
^{pr2}$
Success case: reading back from MongoDB with pandas 0.14.1 works without error.
This looks similar to an older Stack Overflow thread, "Pandas compiled from source: default pickle behavior changed", which has a helpful comment from https://stackoverflow.com/users/644898/jeff:
"The error message you are seeing `TypeError: _reconstruct: First argument must be a sub-type of ndarray` is that the python default unpickler makes sure that the class hierarchy that was pickled is exactly the same what it is recreating. Since Series has changed between versions this is no longer possible with the default unpickler, (this IMHO is a bug in the way pickle works). In any event, pandas will unpickle pre-0.13 pickles that have Series objects."
Is there a workaround or a fix?
To reproduce the error:
Setup in a pandas 0.14.1 environment:
import cPickle
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randn(10,10))
cPickle.dump(df,open("cp0141.p","wb"))
cPickle.load(open('cp0141.p','rb')) # no error
Triggering the error in a pandas 0.15.2 environment:
cPickle.load(open('cp0141.p','r'))
TypeError: ('_reconstruct: First argument must be a sub-type of ndarray', <built-in function _reconstruct>, (<class 'pandas.core.index.Int64Index'>, (0,), 'b'))
This change was called out explicitly: the Index class no longer subclasses ndarray and is now a pandas object; see here. You just need to use pd.read_pickle to read the pickle.
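Since the question keeps its pickles as blobs in MongoDB rather than as files, here is a minimal sketch of how the bytes could be routed through pd.read_pickle. The helper name read_pickled_frame is hypothetical, the lz4 decompression step is assumed to have already happened, and the sketch uses pickle (Python 3) rather than cPickle. It only shows the mechanism of passing in-memory bytes to pd.read_pickle via a temporary file; it does not itself demonstrate a cross-version load.

```python
import os
import pickle
import tempfile

import pandas as pd


def read_pickled_frame(raw_bytes):
    """Hypothetical helper: unpickle a DataFrame from already-decompressed bytes."""
    # pd.read_pickle is given a file path here, so write the bytes
    # to a temporary file first and remove it afterwards.
    fd, path = tempfile.mkstemp(suffix=".p")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(raw_bytes)
        return pd.read_pickle(path)
    finally:
        os.remove(path)


# Stand-in for the blob fetched from MongoDB and decompressed (assumption).
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
raw = pickle.dumps(df, pickle.HIGHEST_PROTOCOL)

restored = read_pickled_frame(raw)
```

Going through a temporary file keeps the sketch to the documented path-based form of pd.read_pickle; whether a given pandas version also accepts a file-like object directly should be checked against its documentation.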