numpy阈值分析中若干数据的矢量化处理

2024-10-04 03:17:50 发布

您现在位置:Python中文网/ 问答频道 /正文

基本上,我有一些网格气象数据的维度(时间,纬度,经度)。你知道吗

  1. 我需要检查每个网格的时间序列,确定连续的几天 (“事件”),当变量高于阈值时,将其存储到 新变量(THdays)
  2. 然后我查看新变量并找到比某个持续时间长的事件(THevents)

目前,我有一个超级好斗(非矢量化)迭代,我很感激你的建议,如何加快这一点。谢谢!你知道吗

import numpy as np
import itertools as it
##### Parameters
lg = 2000  # length to initialise array (must be long to store large number of events)
rl = 180  # e.g latitude
cl = 360  # longitude
pcts = [95, 97, 99] # percentiles which are the thresholds that will be compared
dt = [1,2,3] #duration thresholds, i.e. consecutive values (days) above threshold

##### Data
data   # this is the gridded data that is (time,lat,lon) , e.g. data = np.random.rand(1000,rl,cl)
# From this data calculate the percentiles at each gridsquare (lat/lon combination) which will act as our thresholds
histpcts = np.percentile(data, q=pcts, axis = 0)


##### Initialize arrays to store the results
THdays = np.ndarray((rl, cl, lg, len(pcts)), dtype='int16') #Array to store consecutive threshold timesteps
THevents = np.ndarray((rl,cl,lg,len(pcts),len(dt)),dtype='int16')

##### Start iteration to identify events
for p in range(len(pcts)):  # for each threshold value
    br = data>histpcts[p,:,:]  # Make boolean array where data is bigger than threshold

    # for every lat/lon combination
    for r,c in it.product(range(rl),range(cl)): 
        if br[:,r,c].any()==True: # This is to skip timeseries with nans only and so the iteration is skipped. Important to keep this or something that ignores an array of nans
            a = [ sum( 1 for _ in group ) for key, group in it.groupby( br[:,r,c] ) if key ] # Find the consecutive instances
            tm = np.full(lg-len(a), np.nan)   # create an array of nans to fill in the rest


            # Assign to new array
            THdays[r,c,0:len(a),p] = a  # Consecutive Thresholds days
            THdays[r,c,len(a):,p] = tm  # Fill the rest of array

            # Now cycle through and identify events 
            # (consecutive values) longer than a certain duration (dt)
            for d in range(len(dt)):
                b = THdays[r,c,THdays[r,c,:,p]>=dt[d],p]
                THevents[r,c,0:len(b),p,d] = b

Tags: thetoinfordataleniscl
1条回答
网友
1楼 · 发布于 2024-10-04 03:17:50

你试过numba吗? 当您在numpy中使用简单循环时,它将使您获得极大的加速。 你所需要做的就是把你的代码放在一个函数中,然后应用装饰器来装饰这个函数。这就是全部!!!你知道吗

@jit
def myfun(inputs):
    ## crazy nested loops here

当然,您提供的信息越多,您将获得更快的速度: 您可以在这里找到更多信息:http://numba.pydata.org/numba-doc/0.34.0/user/overview.html

相关问题 更多 >