Using xr.open_dataset changes variable data (i.e. the data point "43768" comes back as "b'0'" when read into xarray)

Posted 2024-09-30 14:23:26


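I am trying to read in some model data to compare against observations. The data has station names that correspond to actual buoy station names. However, when I read it into Python with xr.open_dataset(file), each 5-digit station name is split into single characters, each holding one part of the name. I noticed that the station_name variable has dtype='|S1', which means it is read one character at a time. In other words, station 41004 is split into b'4', b'1', b'0', b'0', b'4'. I think I need a function that joins the individual strings, removes the b and the quotes from the name, and drops the empty-character entries. Is there a function that merges the 5 characters back into the original station name and removes all the b's and quotes? The netCDF function I was looking for that does this correctly is stationID = netCDF4.chartostring(station[:]). Here is how the data variable reads in Python: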

Data variables:
station_name
(station, string40)
|S1
b'3' b'2' b'0' b'1' ... b'' b'' b''
array([[b'3', b'2', b'0', ..., b'', b'', b''],
    [b'3', b'4', b'0', ..., b'', b'', b''],
    [b'4', b'1', b'0', ..., b'', b'', b''],
    ...,
    [b'6', b'3', b'1', ..., b'', b'', b''],
    [b'6', b'4', b'0', ..., b'', b'', b''],
    [b'6', b'4', b'0', ..., b'', b'', b'']], dtype='|S1')

Output of ncdump -v station_name filename.nc on Linux:

{  
dimensions:  
        time = UNLIMITED ; // (8 currently)  
        station = 240 ;  
        string40 = 40 ;  
variables:  
        double time(time) ;  
                time:long_name = "julian day (UT)" ;  
                time:standard_name = "time" ;  
                time:units = "days since 1990-01-01 00:00:00" ;  
                time:conventions = "Relative julian days with decimal part (as parts of the day)"  
 ;
                time:axis = "T" ;  
                time:calendar = "standard" ;  
        int station(station) ;  
                station:long_name = "station id" ;  
                station:_FillValue = -2147483647 ;  
                station:axis = "X" ;  
        int string40(string40) ;  
                string40:long_name = "station_name number of characters" ;  
                string40:_FillValue = -2147483647 ;  
                string40:axis = "W" ;  
        char station_name(station, string40) ;  
                station_name:long_name = "station name" ;  
                station_name:content = "XW" ;  
                station_name:associates = "station string40" ;  
        float longitude(time, station) ;  
                longitude:long_name = "longitude" ;  
                longitude:standard_name = "longitude" ;  
                longitude:globwave_name = "longitude" ;  
                longitude:units = "degree_east" ;  
                longitude:scale_factor = 1.f ;  
                longitude:add_offset = 0.f ;  
                longitude:valid_min = -180.f ;  
                longitude:valid_max = 360.f ;  
                longitude:_FillValue = 9.96921e+36f ;  
                longitude:content = "TX" ;  
                longitude:associates = "time station" ;  
        float latitude(time, station) ;  
                latitude:long_name = "latitude" ;  
                latitude:standard_name = "latitude" ;  
                latitude:globwave_name = "latitude" ;  
                latitude:units = "degree_north" ;  
                latitude:scale_factor = 1.f ;  
                latitude:add_offset = 0.f ;  
                latitude:valid_min = -90.f ;  
                latitude:valid_max = 180.f ;  
                latitude:_FillValue = 9.96921e+36f ;  
                latitude:content = "TX" ;  
                latitude:associates = "time station" ;  
        float hs(time, station) ;  
                hs:long_name = "spectral estimate of significant wave height" ;  
                hs:standard_name = "sea_surface_wave_significant_height" ;  
                hs:globwave_name = "significant_wave_height" ;  
                hs:units = "m" ;  
                hs:scale_factor = 1.f ;  
                hs:add_offset = 0.f ;  
                hs:valid_min = 0.f ;  
                hs:valid_max = 100.f ;  
                hs:_FillValue = 9.96921e+36f ;  
                hs:content = "TX" ;  
                hs:associates = "time station" ;  

// global attributes:  
                :product_name = "ww3.202104_tab.nc" ;  
                :area = "GLOBAL 1 deg grid lat 85" ;  
                :data_type = "OCO spectra 2D" ;  
                :format_version = "1.1" ;  
                :southernmost_latitude = "n/a" ;  
                :northernmost_latitude = "n/a" ;  
                :latitude_resolution = "n/a" ;  
                :westernmost_longitude = "n/a" ;  
                :easternmost_longitude = "n/a" ;  
                :longitude_resolution = "n/a" ;  
                :minimum_altitude = "n/a" ;  
                :maximum_altitude = "n/a" ;  
                :altitude_resolution = "n/a" ;  
                :start_date = "2021-04-01 03:00:00" ;  
                :stop_date = "2021-04-02 00:00:00" ;  
                :field_type = "3-hourly" ;  
data:

station_name =  
  "32012",  
  "34002",  
  "41049",  
  "41051",  
  "41052",  
  "41060",  
...  
  "64045",  
  "64046" ;  
} 

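I had to mark all of it as code, but to avoid confusion: what is shown above is the output of xr.open_dataset(file) followed by the ncdump output.

I tried to trim it so it would not be too long, but I thought seeing the full ncdump would help.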
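
To make the goal concrete, here is a minimal sketch in plain Python of the joining step described above. The sample rows are made up to mirror the printed array (shortened from 40 to 8 entries per row), and the helper name join_chars is hypothetical, not part of xarray or netCDF4.

```python
def join_chars(row):
    """Join single-byte entries into one str; empty padding bytes add nothing."""
    return b"".join(row).decode("ascii")


# Made-up rows shaped like the station_name output above (8 entries instead of 40).
rows = [
    [b"4", b"1", b"0", b"0", b"4", b"", b"", b""],
    [b"3", b"2", b"0", b"1", b"2", b"", b"", b""],
]

station_ids = [join_chars(row) for row in rows]
print(station_ids)  # ['41004', '32012']
```

Decoding the joined bytes is what gets rid of the b prefix and the quotes: they are only part of how Python prints bytes objects, not part of the data.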


2 answers

This looks like a decoding issue.

Maybe try this: https://www.tutorialspoint.com/python/string_decode.htm

Also have a look at all the decoding options of xr.open_dataset(): http://xarray.pydata.org/en/stable/generated/xarray.open_dataset.html
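
As a rough illustration of the decoding idea, the sketch below joins and decodes a character array with NumPy alone. The sample array is made up (width 8 instead of 40) and stands in for the values of station_name. Among the xr.open_dataset() options, concat_characters is the one meant to join character arrays like this; whether it takes effect can depend on the file, and in the output shown in the question it evidently did not.

```python
import numpy as np

# Made-up sample shaped like station_name: (station, string40), dtype S1.
# A width of 8 is used instead of 40 to keep the example short.
chars = np.array(
    [
        [b"4", b"1", b"0", b"0", b"4", b"", b"", b""],
        [b"3", b"2", b"0", b"1", b"2", b"", b"", b""],
    ],
    dtype="S1",
)

width = chars.shape[-1]
# Reinterpret each row of `width` single bytes as one fixed-width byte string.
joined = np.ascontiguousarray(chars).view(f"S{width}").ravel()
# Decode bytes to str; NumPy drops the trailing NUL padding of each element.
names = np.char.decode(joined, "ascii")
print(names.tolist())  # ['41004', '32012']
```

The view only reinterprets the existing bytes, so no per-character Python loop is needed; it does require the array to be contiguous in memory, which is why np.ascontiguousarray is called first.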

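The station names are stored as individual characters, which need to be converted back into readable strings with chartostring. The following reads the station names correctly (albeit using netCDF4 rather than xarray to read the netCDF file):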

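import netCDF4 as nc

# filepath and file hold the directory and file name of the model output
modelfile = filepath + file
model = nc.Dataset(modelfile)
# station_name is stored as a (station, string40) array of single characters
strings = model.variables['station_name'][:]
# join each row of characters into one string per station
stations = nc.chartostring(strings)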

Output: array(['32012', '34002', '41001', ... , '63115', '63117',
       '64045', '64046'], dtype='<U40')

I currently don't know how to do this with xarray, because adapting this code to read the variable with xarray raises an error:

AttributeError                            Traceback (most recent call last)
<ipython-input-11-87a0f3653b9d> in <module>
      1 # model.variables['station'][:]
      2 strings = model['station_name'][:][:]
  > 3 stations = nc.chartostring(strings[:])
      4 stations
AttributeError: 'DataArray' object has no attribute 'tobytes'

However, my solution above, reading the file with netCDF4, works well for me.
