pandas: how to save only selected columns of a DataFrame to HDF5

Posted 2024-05-10 14:38:41


I am reading a sample CSV file and storing it in an .h5 database. The .csv is structured as follows:

User_ID;Longitude;Latitude;Year;Month;String
267261661;-3.86580025;40.32170825;2013;12;hello world
171255468;-3.83879575;40.05035005;2013;12;hello world
343588169;-3.70759531;40.4055946;2014;2;hello world
908779052;-3.8356385;40.1249459;2013;8;hello world
289540518;-3.6723114;40.3801642;2013;11;hello world
635876313;-3.8323166;40.3379393;2012;10;hello world
175160914;-3.53687933;40.35101274;2013;12;hello world 
155029860;-3.68555076;40.47688417;2013;11;hello world

I write it to an .h5 store with pandas' to_hdf, trying to pass only a few selected columns to the .h5 file:

import pandas as pd

df = pd.read_csv(filename + '.csv', sep=';')

df.to_hdf('test.h5', key='key1', format='table', data_columns=['User_ID', 'Year'])

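I get different answers about which columns are stored in the .h5 file depending on whether I inspect it with HDFStore or with read_hdf. Specifically: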

>>> store = pd.HDFStore('test.h5')
>>> store
<class 'pandas.io.pytables.HDFStore'>
File path: /test.h5
/key1            frame_table  (typ->appendable,nrows->8,ncols->6,indexers->[index],dc->[User_ID,Year])

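This looks like what I expected (only the User_ID and Year columns listed under dc), although ncols->6 suggests that all six columns are actually stored in the .h5 file.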

If I read the file back with pd.read_hdf:

hdf = pd.read_hdf('test.h5','key1')

and ask for its keys:

>>> hdf.keys()
Index([u'User_ID', u'Longitude', u'Latitude', u'Year', u'Month', u'String'], dtype='object')

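This is not what I expected: all the columns of the original .csv file are still in the .h5 database. How can I store only the selected columns in the .h5 file to reduce its size?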

Thanks for your help.


1 Answer

Anonymous user

#1 · Posted 2024-05-10 14:38:41

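Just select the columns when you write the file. The data_columns argument only marks which columns are indexed and queryable on disk; it does not limit which columns are written.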

cols_to_keep = ['User_ID', 'Year']
df[cols_to_keep].to_hdf(...)
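
For anyone who wants to check this end to end, here is a minimal sketch of the same idea. It assumes the optional PyTables package (`tables`) is installed, uses an inline copy of the first three rows of the sample CSV, and writes to a temporary path made up for this example. The key name `key1` is taken from the question, and `data_columns=["Year"]` is included only to show that it affects querying rather than which columns get stored.

```python
import io
import os
import tempfile

import pandas as pd

# Inline copy of the first rows of the sample CSV from the question.
CSV_TEXT = """User_ID;Longitude;Latitude;Year;Month;String
267261661;-3.86580025;40.32170825;2013;12;hello world
171255468;-3.83879575;40.05035005;2013;12;hello world
343588169;-3.70759531;40.4055946;2014;2;hello world
"""

df = pd.read_csv(io.StringIO(CSV_TEXT), sep=";")

# Select the columns first; only this subset is handed to to_hdf.
cols_to_keep = ["User_ID", "Year"]
subset = df[cols_to_keep]

# to_hdf relies on the optional PyTables package; skip the round trip
# if it cannot be imported.
try:
    import tables  # noqa: F401
    HAVE_PYTABLES = True
except ImportError:
    HAVE_PYTABLES = False

stored_columns = None
rows_2013 = None
if HAVE_PYTABLES:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "test.h5")  # hypothetical temp file
        # data_columns makes Year queryable on disk; it does not change
        # which columns are written.
        subset.to_hdf(path, key="key1", format="table", data_columns=["Year"])
        # Read everything back to see which columns were stored.
        stored_columns = list(pd.read_hdf(path, key="key1").columns)
        # Year is a data column, so rows can be filtered at read time.
        rows_2013 = len(pd.read_hdf(path, key="key1", where="Year == 2013"))

print(stored_columns, rows_2013)
```

Reading the file back should then list only User_ID and Year, since the other columns were never passed to to_hdf.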
