netCDF to *.csv without loops (!)

Posted 2024-09-30 10:28:06


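I have run into some performance and "ugly code" problems, and maybe some of you can help. I have to export data from netCDF files to csv, and I wrote some Python code to do it. Take a netCDF file with 3 dimensions: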

# Relies on names defined elsewhere in the script: ncf (open netCDF file),
# crit, foo, writer (a csv.writer), ofile (its file object) and varn.
def to3dim_csv():
  var = ncf.variables['H2O'] #e.g. data for 'H2O' values
  one,two,three = var.shape #variable dimension shape e.g. (551,42,94)
  dim1,dim2,dim3 = var.dimensions #dimensions e.g. (time,lat,lon)

  if crit is not None:
    bool1 = foo(dim1,crit,ncf) #boolean table: ("value important?",TRUE,FALSE)
    bool2 = foo(dim2,crit,ncf)
    bool3 = foo(dim3,crit,ncf)
  else:
    # no criterion: keep every index (bool1..bool3 would otherwise be undefined)
    bool1, bool2, bool3 = [True]*one, [True]*two, [True]*three

  writer.writerow([dim1,dim2,dim3,varn])
  for i in range(one):
    for k in range(two):
      for l in range(three):
        if bool1[i] and bool2[k] and bool3[l]:
          writer.writerow([
                        ncf.variables[dim1][i],
                        ncf.variables[dim2][k],
                        ncf.variables[dim3][l],
                        var[i,k,l],
                        ])
  ofile.close()

  # Sample csv output is like:
  # time,lat,lon,H2O
  # 1,90,10,100
  # 1,90,11,90
  # 1,91,10,101

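I would like to get rid of the nested "for val in range(d):" blocks, perhaps by using a recursive function. (The example code that followed here is not preserved in this copy.)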
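One way to flatten the nested loops for any number of dimensions, without writing a recursive function, is itertools.product from the standard library. The sketch below is only a guess at what the question is after, not the code that originally stood here. The names iter_rows, coords, masks and values are hypothetical, and plain nested lists stand in for the netCDF variable.

```python
import itertools

def iter_rows(coords, masks, values):
    """Yield [coord_1, ..., coord_n, value] for each index combination the masks keep.

    coords: one sequence of coordinate values per dimension
    masks:  one sequence of booleans per dimension (same lengths as coords)
    values: nested lists indexed as values[i][k][l]...
    """
    # Per dimension, keep only the indices whose mask entry is true.
    kept = [[i for i, keep in enumerate(mask) if keep] for mask in masks]
    # product() visits the kept indices in the same order as nested for loops.
    for index in itertools.product(*kept):
        value = values
        for i in index:
            value = value[i]
        yield [c[i] for c, i in zip(coords, index)] + [value]

coords = [[1, 2], [90, 91], [10, 11]]                    # time, lat, lon
masks = [[True, False], [True, True], [True, True]]      # drop the second time step
values = [[[100, 90], [101, 95]], [[0, 0], [0, 0]]]
rows = list(iter_rows(coords, masks, values))
```

Each row can be passed to writer.writerow, or the whole generator to writer.writerows. This removes the nesting but still iterates in Python, so it tidies the code rather than making it faster.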

Update: for anyone who is interested, this version is practically instantaneous:

import csv

import numpy as np
import xarray as xr

def data_to_table(dataset, var):
    assert isinstance(dataset,xr.Dataset), 'Dataset must be xarray.Dataset'
    obj = getattr(dataset, var)
    # one row per data value: column 0 holds the value, columns 1.. the coordinates
    table = np.zeros((obj.data.size, obj.data.ndim+1), dtype=np.object_)
    table[:,0] = obj.data.flat
    for i,d in enumerate(obj.dims):
        # int(): np.prod of an empty shape is the float 1.0, but the counts must be integers
        repeat = int(np.prod(obj.data.shape[i+1:]))
        tile = int(np.prod(obj.data.shape[:i]))
        dim = getattr(dataset, d)
        dimdata = dim.data
        dimdata = np.repeat(dimdata, repeat)
        dimdata = np.tile(dimdata, tile)
        table[:,i+1] = dimdata.flat
    return table

def export_to_csv(dataset, var, filename, size=None):
    obj = getattr(dataset, var)
    header = [var] + [x for x in obj.dims]
    tabular = data_to_table(dataset, var)
    # optional row limit: size=None writes every row
    size = slice(None,size,None) if size else slice(None,None,None)
    # newline='' is what the csv module expects for files it writes to
    with open(filename, 'w', newline='') as f:
        writer = csv.writer(f,dialect=csv.excel)
        writer.writerow(header)
        writer.writerows(tabular[size])
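The repeat/tile arithmetic in data_to_table can be checked on a small array. The sketch below uses a made-up (2, 3, 4) array and made-up coordinate values, not real netCDF data. It builds the coordinate columns the same way and compares them with np.meshgrid using indexing='ij', which yields the same layout once flattened in C order.

```python
import numpy as np

data = np.arange(24).reshape(2, 3, 4)        # stand-in for a (time, lat, lon) variable
coords = [np.array([1, 2]),                  # time
          np.array([90, 91, 92]),            # lat
          np.array([10, 11, 12, 13])]        # lon

columns = []
for i, c in enumerate(coords):
    repeat = int(np.prod(data.shape[i + 1:]))   # how many rows each value is held for
    tile = int(np.prod(data.shape[:i]))         # how many times that pattern recurs
    columns.append(np.tile(np.repeat(c, repeat), tile))

# The same columns from meshgrid with matrix ('ij') indexing, flattened in C order.
grids = np.meshgrid(*coords, indexing='ij')
same = all(np.array_equal(col, g.ravel()) for col, g in zip(columns, grids))
```

A coordinate on axis i changes once every prod(shape[i+1:]) rows, and that pattern recurs prod(shape[:i]) times, so each column lines up row by row with obj.data.flat.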

2 Answers

Something like this: get the indices where bol1, bol2 and bol3 are true, and combine them while fetching the corresponding values.

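    import csv
    import numpy as np

    # a is the variable's data array; dim1..dim3 are the coordinate arrays
    with open('numpy.csv', 'w', newline='') as f:  # Python 3: text mode instead of 'wb'
        out_csv = csv.writer(f)
        header = ['dim1','dim2','dim3','varn']
        out_csv.writerow(header)
        bol1_indices = np.nonzero(bol1)[0]
        bol2_indices = np.nonzero(bol2)[0]
        bol3_indices = np.nonzero(bol3)[0]
        # row order matches the header: coordinates first, value last
        out_csv.writerows([dim1[i], dim2[k], dim3[l], a[i, k, l]]
                          for i in bol1_indices for k in bol2_indices for l in bol3_indices)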
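The same selection can also be done without any Python-level loop over the indices. The sketch below is one possible variation, not part of the answer above. It assumes a, dim1..dim3 and bol1..bol3 are NumPy arrays (small made-up ones are used here) and writes to an in-memory buffer instead of numpy.csv.

```python
import csv
import io
import numpy as np

a = np.arange(24).reshape(2, 3, 4)               # stand-in for the variable's data
dim1 = np.array([1, 2])                          # time
dim2 = np.array([90, 91, 92])                    # lat
dim3 = np.array([10, 11, 12, 13])                # lon
bol1 = np.array([True, False])
bol2 = np.array([True, False, True])
bol3 = np.array([False, True, True, False])

# np.ix_ turns the three 1-D masks into an open mesh that selects the kept sub-cube.
sub = a[np.ix_(bol1, bol2, bol3)]
# Coordinate grids for the kept indices, laid out in the same C order as sub.
g1, g2, g3 = np.meshgrid(dim1[bol1], dim2[bol2], dim3[bol3], indexing='ij')
table = np.column_stack([g1.ravel(), g2.ravel(), g3.ravel(), sub.ravel()])

buf = io.StringIO()                              # in-memory file for the demo
out_csv = csv.writer(buf)
out_csv.writerow(['dim1', 'dim2', 'dim3', 'varn'])
out_csv.writerows(table.tolist())
```

Note that np.column_stack casts every column to a common dtype, so integer coordinates combined with float data come out as floats. If that matters, an object array like the one in the question's update avoids the cast.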

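Doing this in Python will always be slow, because the layout of the original data differs from the layout you want to save. Python has to create the indices and write one value per row. What do you need the csv for? I suggest using ncdump, which converts to a simple text file very quickly. If you must have csv, you can use the nc2text utility from the FAN language utilities (see this page).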
