h5py: "Unable to register datatype atom (Can't insert duplicate key)" when creating a large number of datasets

dt = [
    ('scale', 'f4'), ('haloid', 'i8'), ('scale_desc', 'f4'), ('haloid_desc', 'i8'),
    ('num_prog', 'i4'), ('pid', 'i8'), ('upid', 'i8'), ('pid_desc', 'i8'),
    ('phantom', 'i4'), ('mvir_sam', 'f4'), ('mvir', 'f4'), ('rvir', 'f4'),
    ('rs', 'f4'), ('vrms', 'f4'), ('mmp', 'i4'), ('scale_lastmm', 'f4'),
    ('vmax', 'f4'), ('x', 'f4'), ('y', 'f4'), ('z', 'f4'),
    ('vx', 'f4'), ('vy', 'f4'), ('vz', 'f4'),
    ('jx', 'f4'), ('jy', 'f4'), ('jz', 'f4'), ('spin', 'f4'),
    ('haloid_breadth_first', 'i8'), ('haloid_depth_first', 'i8'),
    ('haloid_tree_root', 'i8'), ('haloid_orig', 'i8'), ('snap_num', 'i4'),
    ('haloid_next_coprog_depthfirst', 'i8'), ('haloid_last_prog_depthfirst', 'i8'),
    ('haloid_last_mainleaf_depthfirst', 'i8'), ('rs_klypin', 'f4'),
    ('mvir_all', 'f4'), ('m200b', 'f4'), ('m200c', 'f4'), ('m500c', 'f4'),
    ('m2500c', 'f4'), ('xoff', 'f4'), ('voff', 'f4'), ('spin_bullock', 'f4'),
    ('b_to_a', 'f4'), ('c_to_a', 'f4'),
    ('axisA_x', 'f4'), ('axisA_y', 'f4'), ('axisA_z', 'f4'),
    ('b_to_a_500c', 'f4'), ('c_to_a_500c', 'f4'),
    ('axisA_x_500c', 'f4'), ('axisA_y_500c', 'f4'), ('axisA_z_500c', 'f4'),
    ('t_by_u', 'f4'), ('mass_pe_behroozi', 'f4'), ('mass_pe_diemer', 'f4'),
]

def read_in_trees(self):
    """Store each tree as an hdf5 dataset.
    """
    with open(self.fname) as ascii_file:
        with h5py.File(self.hdf5_name, "r+") as f:
            tree_id = ""
            current_tree = []
            for line in ascii_file:
                if line[0] == '#':  # new tree
                    arr = np.array(current_tree, dtype=dt)
                    f[tree_id] = arr
                    current_tree = []
                    tree_id = line[6:].strip('\n')
                else:  # read in next tree element
                    current_tree.append(tuple(line.split()))
    return

1 answer

Answer #1 · posted 2024-09-24 22:20:26

Do you have an error stack trace, something that shows where in your code the error is raised?

You report: error RuntimeError: Unable to register datatype atom (Can't insert duplicate key)

In /usr/lib/python3/dist-packages/h5py/_hl/datatype.py:

class Datatype(HLObject):
    # Represents an HDF5 named datatype stored in a file.
    # >>> MyGroup["name"] = numpy.dtype("f")
    def __init__(self, bind):
        """ Create a new Datatype object by binding to a low-level TypeID.
        """

I'm making a guess here. Your dt has 57 fields. I suspect that each time you add a tree to the file, it registers each field as a new datatype.


70% of 57*10e7 is close to 2**32. If Python/numpy uses a 32-bit integer as the datatype id, you could be hitting that limit.
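The estimate can be checked with a few lines of arithmetic. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and the figures (57 fields, 10e7 trees, 70%) are the ones quoted in the estimate, not measured values.

```python
# Back-of-the-envelope check of the estimate: one registration per field per tree.
fields_per_tree = 57      # number of fields in dt
n_trees = 10e7            # the tree count used in the estimate (equal to 1e8)
fraction_done = 0.7       # the 70% figure used in the estimate

registrations = fields_per_tree * n_trees * fraction_done
ratio = registrations / 2**32

print(f"{registrations:.3g} registrations, {ratio:.3f} of 2**32")
```

The ratio comes out a little below 1, which is what makes a 32-bit id limit a plausible (but unconfirmed) suspect.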

We'd have to dig further into the h5py or numpy code to find out what actually emits this error message.

By adding an array to the file with:

f[tree_id] = arr

you are putting each array in a dataset in a new Group. If each dataset gets its own datatype, or each field of the array gets one, you could easily reach 2**32 datatypes.

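If, on the other hand, you could store multiple arr in one group or dataset, you might avoid registering that many datatypes. I'm not familiar enough with h5py to suggest how to do that.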
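One way to store many trees in a single dataset is to append every tree's rows to one resizable compound dataset and keep a small index of where each tree starts. The following is a minimal sketch under several assumptions: `dt_small` is a shortened stand-in for the 57-field dt, `append_tree`, `halos` and `tree_index` are hypothetical names, tree ids are taken to be integers, and the file is kept in memory only. Whether this actually avoids the error above has not been tested here.

```python
import h5py
import numpy as np

# Shortened stand-in for the 57-field dt (assumption, to keep the sketch small).
dt_small = np.dtype([('scale', 'f4'), ('haloid', 'i8'), ('mvir', 'f4')])
# Hypothetical index layout: which slice of `halos` belongs to which tree.
index_dt = np.dtype([('tree_id', 'i8'), ('start', 'i8'), ('count', 'i8')])

def append_tree(halos, index, tree_id, rows):
    """Append one tree's rows to `halos` and record its slice in `index`."""
    arr = np.array(rows, dtype=dt_small)
    start = halos.shape[0]
    halos.resize((start + len(arr),))
    halos[start:start + len(arr)] = arr
    n = index.shape[0]
    index.resize((n + 1,))
    index[n:n + 1] = np.array([(tree_id, start, len(arr))], dtype=index_dt)

# In-memory file, so the sketch leaves nothing on disk.
with h5py.File('trees_sketch.h5', 'w', driver='core', backing_store=False) as f:
    halos = f.create_dataset('halos', shape=(0,), maxshape=(None,),
                             dtype=dt_small, chunks=True)
    index = f.create_dataset('tree_index', shape=(0,), maxshape=(None,),
                             dtype=index_dt, chunks=True)

    append_tree(halos, index, 101, [(0.5, 1, 1e12), (0.6, 2, 2e12)])
    append_tree(halos, index, 102, [(0.7, 3, 3e12)])

    # Read the second tree back through the index.
    rec = index[1]
    start, count = int(rec['start']), int(rec['count'])
    second_tree = halos[start:start + count]
    n_halos = halos.shape[0]
```

With this layout the file holds two datasets regardless of the number of trees, at the cost of an extra lookup when reading a single tree. Resizing on every append is simple but slow; batching several trees per resize would be the obvious refinement.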

I wonder if this sequence would reuse the datatype across multiple datasets:

dt1 = np.dtype(dt)
gg = f.create_group('testgroup')
gg['xdtype'] = dt1
# see h5py.Datatype doc
xdtype = gg['xdtype']
x = np.zeros((10,), dtype=xdtype)
gg['tree1'] = x
x = np.ones((10,), dtype=xdtype)
gg['tree2'] = x

Following the Datatype doc, I am trying to register a named datatype and use it for each of the datasets added to the group.

In [117]: isinstance(xdtype, h5py.Datatype)
Out[117]: True
In [118]: xdtype.id
Out[118]: <h5py.h5t.TypeCompoundID at 0xb46e0d4c>

So if I am reading def make_new_dset correctly, this bypasses the py_create call.
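It may be that plain assignment of a numpy array derives a fresh type from the array's numpy dtype, in which case the named type would not be reused. A more explicit variant is to hand the `h5py.Datatype` object itself to `create_dataset` and then check that the dataset's type is the committed one. This is a minimal sketch: `dt_small` is a shortened stand-in for dt (an assumption), the file lives in memory only, and it does not show whether the original error goes away.

```python
import h5py
import numpy as np

# Shortened stand-in for the 57-field dt (assumption).
dt_small = np.dtype([('scale', 'f4'), ('haloid', 'i8')])

# In-memory file, so the sketch leaves nothing on disk.
with h5py.File('named_type_sketch.h5', 'w', driver='core', backing_store=False) as f:
    gg = f.create_group('testgroup')
    gg['xdtype'] = dt_small      # commit a named datatype to the file
    xdtype = gg['xdtype']        # h5py.Datatype bound to the committed type

    x = np.ones((10,), dtype=dt_small)

    # Pass the Datatype object explicitly, then write the data.
    dset = gg.create_dataset('tree1', shape=x.shape, dtype=xdtype)
    dset[...] = x

    # True if the dataset's type is a committed (named) type.
    uses_committed_type = dset.id.get_type().committed()
    field_names = dset.dtype.names
    first_haloid = int(dset[0]['haloid'])
```

Checking `committed()` on the dataset's type is a quick way to see whether a given way of creating the dataset really shares the named type.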
