<p>With speed in mind, I benchmarked several ways of reading the data:</p>
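<pre><code>from pathlib import Path
from typing import List

import numpy as np
import xarray
from netCDF4 import Dataset

# netCDF4: open each file and index the variable directly on read.
def method_1(file_paths: List[Path], indices) -> List[np.ndarray]:
    data = []
    for file in file_paths:
        d = Dataset(file, 'r')
        data.append(d.variables['hrv'][indices])
        d.close()
    return data

# xarray: load the whole 'hrv' array via .values, then index in memory.
def method_2(file_paths: List[Path], indices) -> List[np.ndarray]:
    data = []
    for file in file_paths:
        data.append(xarray.open_dataset(file, engine='h5netcdf').hrv.values[indices])
    return data

# xarray + dask: open lazily, index the dask array, then compute.
def method_3(file_paths: List[Path], indices) -> List[np.ndarray]:
    data = []
    for file in file_paths:
        ds = xarray.open_mfdataset([file], engine='h5netcdf')
        data.append(ds.hrv.data.vindex[indices].compute())
    return data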
</code></pre>
<pre><code>In [1]: len(file_paths)
Out[1]: 4813
</code></pre>
<p>The results:</p>
<ul>
<li><code>method_1</code> (netCDF4 library): 101.9 s</li>
<li><code>method_2</code> (xarray with the <code>.values</code> API): 591.4 s</li>
<li><code>method_3</code> (xarray + dask): 688.7 s</li>
</ul>
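<p>The answer does not show how these timings were taken. A minimal sketch of one way to collect such numbers, assuming a plain wall-clock measurement with <code>time.perf_counter</code>; the helper name <code>time_call</code> is hypothetical and not part of the original:</p>

```python
import time


def time_call(func, *args, **kwargs):
    """Call func once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Stand-in workload so the sketch runs without any data files.
total, seconds = time_call(sum, range(1000))
print(f"sum took {seconds:.6f} s")
```

<p>With the real data this would be called as, for example, <code>time_call(method_1, file_paths, indices)</code>. A single run is sensitive to disk caching, so the order in which the methods are run can affect the numbers.</p>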
<p>My guess is that xarray + dask spends much of its time in the <code>.compute</code> step.</p>
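<p>A guess like this can be checked with a profiler rather than taken on faith. The following is a rough sketch using the standard-library <code>cProfile</code> and <code>pstats</code> modules; <code>profile_call</code>, <code>slow_step</code>, and <code>fake_method</code> are hypothetical names used only so the example runs without the NetCDF files:</p>

```python
import cProfile
import io
import pstats
import time


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile; return (result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buffer.getvalue()


# Hypothetical stand-ins for one of the reading methods.
def slow_step():
    time.sleep(0.01)
    return 1


def fake_method(n):
    return [slow_step() for _ in range(n)]


result, report = profile_call(fake_method, 3)
print(report)
```

<p>Running <code>profile_call(method_3, file_paths[:50], indices)</code> on a small subset of the files should show whether the cumulative time is concentrated in the compute call or in opening the files.</p>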