从dask块中按名称选择维度

2024-10-03 15:35:56 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一些grib格式的集成文件,我想使用dask和xarray在Python中延迟加载它们。在https://climate-cms.org/2018/09/14/dask-era-interim.html中,我成功地按预期延迟加载了文件,但现在我想切片并选择维度,以便在一段时间和一段时间内绘制数据

我的程序看起来像:

import dask
import dask.array as da
import xarray as xr
import pandas as pd
import numpy as np
from glob import glob
from datetime import date, datetime, timedelta

import matplotlib.pyplot as plt

bpath = '/some/path/to/my/data'

# pressure levels
levels =['1000', '925', '850', '700', '500', '300', '250', '200', '50']

# ensemble member names
ensm = ['M01', 'M02', 'M03', 'M04', 'M05'] 

@dask.delayed
def open_file_delayed(file, vname):
    ds = xr.open_dataset(file, decode_cf='False', engine='pynio')
    return ds

def open_file(file, vname, nlevs, nlats, nlons, ftype):
    file_data = open_file_delayed(file, vname)[vname].data
    return da.from_delayed(file_data, (nlevs, nlats, nlons), ftype)

    # list of files to open (sorted by date)
    # filename mask: PREFIXMEMYYYYiMMiDDiHHiYYYYfMMfDDfHHf.grb
    # MEM: member name (see the levels list)
    # YYYYiMMiDDiHHi: analysis date (passed as an argument to the open_file function)
    # YYYYfMMfDDfHHf: forecast date
    files = sorted(glob(bpath + '/%(dateanl)s/%(mem)s/PREFIX%(mem)s%(dateanl)s*.grb'%
                        {'dateanl': date, 'mem': member}))
   
    # open the first file in the list to get dimensions and coordinates
    ds0 = xr.open_dataset(files[0], decode_cf='False', engine='pynio')
   
    var0 = ds0[vname]
   
    levs = ds0.lv_ISBL2.data
    lats = ds0.g4_lat_0.data
    lons = ds0.g4_lon_1.data
    
    nlevs = ds0.lv_ISBL2.size
    nlats = ds0.g4_lat_0.size
    nlons = ds0.g4_lon_1.size
    ftype = var0.dtype
   
    ds0.close()
    
    # calculate the date range of the forecasts, in my case len(date_fmt) = 61 (every grib file has 61 times and 9 levels)
    date_fmt = pd.date_range(start=datetime.strptime(date, "%Y%m%d%H"), freq="6H", periods=61)
        
    # call the function 'open_file' for all files contained in the list 'files' and concatenate
    dask_var = da.concatenate([open_file(file, vname, nlevs, nlats, nlons, ftype) for file in files], axis=0)
        
    # xda is the data array with all files
    xda = xr.DataArray(dask_var, dims=['tlev', 'lat', 'lon']) 
            
    # set coordinates values
    # note: tlev is a chunk of 9 levels * 61 forecast dates
    xda.coords['tlev'] = ('tlev', np.tile(levels, len(date_fmt)))
    xda.coords['lat'] = ('lat', lats)
    xda.coords['lon'] = ('lon', lons)
    
    return xda

要使用这段代码,我需要(对于单个分析日期-202005300-2020年5月30日,以及一个名为ZGEO的变量):

lens_zgeo = [open_ensemble('2020053000', ens, 'ZGEO') for ens in ensm]
dens_zgeo = xr.concat(lens_zgeo, dim='ens')
dens_zgeo.coords['ens'] = ('ens', ensm)

dens_zgeo是一个数据数组,其中tlev的大小为549,即9个级别*61个预测时间

我可以通过执行以下操作按ens维度对dens_zgeo进行分组:

tmp = dict(list(dens_zgeo.groupby('ens')))

然后我可以选择一个集合成员和tlev中的第一个坐标:

tmp['NMC'].isel(tlev=0).plot()

这给了我一个很好的横向/纵向图。在这一点上,我不清楚我选择了什么级别或时间,这就提出了一个问题:

我如何使用“sel”而不是“isel”按名称选择时间和级别

就我而言,这样做是很自然的:

tmp['NMC'].sel(lev='925', time='2020-05-30').plot()

但我似乎不能这样做,因为tlev是根据levelsdate_fmttlev = np.tile(levels, len(date_fmt)))来定义的


Tags: theimportdatadateasfilesopendask