How do I incrementally save multiple variables with Python netCDF4?

Posted 2024-05-19 01:13:45

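I am trying to use the Python netCDF4 module to write multiple variables (say A and B) to a single netCDF file.

My function outputs a new time slice for A and B in each loop iteration, and I want to save these slices to the file as they are produced, rather than accumulating them in RAM and saving everything at once.

Here is my current attempt: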

import numpy as np
from netCDF4 import Dataset, date2num, num2date

fout=Dataset('test.nc', 'w')

x=np.arange(10)
y=np.arange(20)
xx,yy=np.meshgrid(x,y)

# create dimensions
fout.createDimension('x', len(x))
fout.createDimension('y', len(y))
fout.createDimension('t', None)

# np.float was removed in NumPy 1.24, so use np.float64
x_ax=fout.createVariable('x', np.float64, ['x',])
x_ax[:]=x
y_ax=fout.createVariable('y', np.float64, ['y',])
y_ax[:]=y
t_ax=fout.createVariable('t', np.float64, ['t',])

# a big loop
for ii in range(10):  

    # some function that outputs a slice of A and a slice of B
    var_aii=(xx*ii + yy)[None, ...]
    var_bii=(xx + yy*ii)[None, ...]

    if 'var_A' not in fout.variables.keys():
        # if 1st time writing "var_A", create new variable
        var_A=fout.createVariable('var_A', np.float64, ['t', 'y', 'x'])
        var_A[:]=var_aii
    else:
        # if variable already created, append to the end of 1st dimension
        var_A=fout.variables['var_A']
        var_A[:]=np.concatenate([var_A[:], var_aii])

    if 'var_B' not in fout.variables.keys():
        var_B=fout.createVariable('var_B', np.float64, ['t', 'y', 'x'])
        var_B[:]=var_bii
    else:
        var_B=fout.variables['var_B']
        var_B[:]=np.concatenate([var_B[:], var_bii])

    print('ii=', ii, 'var_A.shape=', var_A.shape, 'var_B.shape=', var_B.shape)

fout.close()

Here is the output:

ii= 0 var_A.shape= (1, 20, 10) var_B.shape= (1, 20, 10)
ii= 1 var_A.shape= (3, 20, 10) var_B.shape= (3, 20, 10)
ii= 2 var_A.shape= (5, 20, 10) var_B.shape= (5, 20, 10)
ii= 3 var_A.shape= (7, 20, 10) var_B.shape= (7, 20, 10)
ii= 4 var_A.shape= (9, 20, 10) var_B.shape= (9, 20, 10)
ii= 5 var_A.shape= (11, 20, 10) var_B.shape= (11, 20, 10)
ii= 6 var_A.shape= (13, 20, 10) var_B.shape= (13, 20, 10)
ii= 7 var_A.shape= (15, 20, 10) var_B.shape= (15, 20, 10)
ii= 8 var_A.shape= (17, 20, 10) var_B.shape= (17, 20, 10)
ii= 9 var_A.shape= (19, 20, 10) var_B.shape= (19, 20, 10)

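The problem is that the time dimension t grows in steps of 2 rather than 1. I think this is because the unlimited t dimension expands automatically as more data is appended: in the ii==1 iteration, once var_A has been written the time dimension has already grown to length 2, so var_B already has length 2 before anything is appended to it.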

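I did not use ii as an index to assign values, as in var_A[ii]=var_aii, because that seems error-prone: if a conditional continue in the loop skips some values of ii, gaps would be left in the file.

So what is a more robust way to grow multiple variables along the time dimension?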


1 answer
User
#1 · Posted 2024-05-19 01:13:45

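Simply querying the current length of the time dimension to get the insertion index does not seem to be enough.

I worked out a rough solution that appends data to existing variables in a netCDF file: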

def appendTime(fout, newslice, newt, varid):
    '''Append data along time dimension

    Args:
        fout (netCDF4.Dataset): opened Dataset file obj to write into.
        newslice (ndarray): new time slice data to save.
        newt (1darray): new time values of <newslice>.
        varid (str): variable id.
    '''

    newt=np.atleast_1d(newt)

    if varid not in fout.variables.keys():
        #        -Create variable        -
        # np.float was removed in NumPy 1.24, so use np.float64
        varout=fout.createVariable(varid, np.float64, ('t','y','x'), zlib=True)
    else:
        #        -Append variable        -
        varout=fout.variables[varid]

    timeax=fout.variables['t']
    tlen=len(timeax)
    t0=newt[0]
    # compare against the values read with [:], not the Variable object itself
    tidx=np.where(timeax[:]==t0)[0]
    if len(tidx)>0:
        # time point already exists
        tidx=tidx[0]
    else:
        # new time point added
        tidx=tlen
    # the time values are written on the first call too, so 't' is never left unset
    timeax[tidx:tidx+len(newt)]=newt
    varout[tidx:tidx+len(newt)]=newslice

    return

The code inside the loop can now be:

for ii in range(10):
    # some function that outputs a slice of A and a slice of B
    var_aii=(xx*ii + yy)[None, ...]
    var_bii=(xx + yy*ii)[None, ...]
    appendTime(fout, var_aii, ii, 'var_A')
    appendTime(fout, var_bii, ii, 'var_B')
    var_A=fout.variables['var_A']
    var_B=fout.variables['var_B']
    print('ii=', ii, 'var_A.shape=', var_A.shape, 'var_B.shape=', var_B.shape)
