Rolling count of consecutive values within a tolerance range

Posted 2024-09-30 18:32:58


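Suppose I have a list of request counts for a website over consecutive days. For each day, I want to count the number of days on which the request count was within a certain tolerance range (a percentage of the current day's count).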

Synthetic example:

>>> import pandas as pd
>>> df = pd.DataFrame({'req': {0: 15, 1: 16, 2: 14, 3: 15, 4: 16, 5: 16, 6: 17, 7: 30, 8: 31, 9: 35, 10: 32, 11: 35, 12: 34, 13: 33, 14: 37}, 'lo': {0: 13.5, 1: 14.4, 2: 12.6, 3: 13.5, 4: 14.4, 5: 14.4, 6: 15.3, 7: 27.0, 8: 27.9, 9: 31.5, 10: 28.8, 11: 31.5, 12: 30.6, 13: 29.7, 14: 33.3}, 'hi': {0: 16.5, 1: 17.6, 2: 15.4, 3: 16.5, 4: 17.6, 5: 17.6, 6: 18.7, 7: 33.0, 8: 34.1, 9: 38.5, 10: 35.2, 11: 38.5, 12: 37.4, 13: 36.3, 14: 40.7}, 'con10': {0: 0, 1: 1, 2: 0, 3: 3, 4: 1, 5: 2, 6: 2, 7: 0, 8: 1, 9: 0, 10: 3, 11: 2, 12: 4, 13: 6, 14: 0}})
>>> df
    req    lo    hi  con10
0    15  13.5  16.5      0
1    16  14.4  17.6      1
2    14  12.6  15.4      0
3    15  13.5  16.5      3
4    16  14.4  17.6      1
5    16  14.4  17.6      2
6    17  15.3  18.7      2
7    30  27.0  33.0      0
8    31  27.9  34.1      1
9    35  31.5  38.5      0
10   32  28.8  35.2      3
11   35  31.5  38.5      2
12   34  30.6  37.4      4
13   33  29.7  36.3      6
14   37  33.3  40.7      0

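Where:

  • req is the number of requests on that day
  • lo and hi are the lower and upper bounds of that day's tolerance band
  • con10 is the number of consecutive days immediately before this date on which the request count is within the given tolerance (10% in this example)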
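To make the definition of con10 concrete, here is a minimal pure-Python sketch. It assumes, as the answer's code does, that an earlier day counts when its tolerance band contains the current day's req. The function name consecutive_within_band is hypothetical. The comparison is scaled by 100, so band edges such as 33 against 30 + 10% are exact when the counts and the tolerance are integers.

```python
def consecutive_within_band(req, tolerance=10):
    """Hypothetical helper: for each day i, walk back over days i-1, i-2, ...
    and count them until reaching the first day whose band does not contain req[i]."""
    counts = []
    for i, current in enumerate(req):
        n = 0
        j = i - 1
        # Day j's band is [req[j] * (1 - tol/100), req[j] * (1 + tol/100)];
        # both sides are multiplied by 100 so integer inputs compare exactly.
        while j >= 0 and req[j] * (100 - tolerance) <= current * 100 <= req[j] * (100 + tolerance):
            n += 1
            j -= 1
        counts.append(n)
    return counts


req = [15, 16, 14, 15, 16, 16, 17, 30, 31, 35, 32, 35, 34, 33, 37]
print(consecutive_within_band(req, tolerance=10))
# [0, 1, 0, 3, 1, 2, 2, 0, 1, 0, 3, 2, 4, 6, 0]  (the con10 column above)
```

The cost of each day is proportional to the length of its run. That is fine as a reference check, but it is not meant to be fast.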

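Any pointers on how to compute con for a given tolerance? More generally, how could I compute it for a list of tolerances, i.e. con05, con07 and con10 for 5%, 7% and 10% respectively?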


1 answer

Update:

Here is a way to do it, as a modified version of the previous function:

import pandas as pd

def last_within_range(df, target_col='req', tolerance=10):

    df = df.copy()
    # Start every count at 0 (the first row has no earlier days), keeping the result integer
    s = pd.Series(0, index=df.index, dtype=int)

    # Get low and high tolerance
    df['lo'] = df[target_col] - df[target_col] * tolerance/100
    df['hi'] = df[target_col] + df[target_col] * tolerance/100

    # For each row, count the immediately preceding rows whose band contains the current value
    for idx in df.index[1:]:
        past_idx = df.index[:df.index.get_loc(idx)]
        req = df.loc[idx, target_col]
        # Get bool values and identify groups (runs) to get the last one
        values = (req >= df.loc[past_idx, 'lo']) & (req <= df.loc[past_idx, 'hi'])
        grps = (values != values.shift()).cumsum()
        # If the last group is True, then its size is the count
        s[idx] = grps.eq(grps.iloc[-1]).sum() if values.iloc[-1] else 0

    return df.assign(**{f'con{tolerance}': s})

last_within_range(df, tolerance=10)

Output:

    req    lo    hi  con10
0    15  13.5  16.5      0
1    16  14.4  17.6      1
2    14  12.6  15.4      0
3    15  13.5  16.5      3
4    16  14.4  17.6      1
5    16  14.4  17.6      2
6    17  15.3  18.7      2
7    30  27.0  33.0      0
8    31  27.9  34.1      1
9    35  31.5  38.5      0
10   32  28.8  35.2      3
11   35  31.5  38.5      2
12   34  30.6  37.4      4
13   33  29.7  36.3      6
14   37  33.3  40.7      0
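The key step inside the loop is the run-labelling idiom (values != values.shift()).cumsum(). It starts a new group id every time the boolean flag changes. The size of the last group is the count whenever the last flag is True. The following small standalone illustration uses a made-up boolean Series, so the flag values are hypothetical and not taken from the example data:

```python
import pandas as pd

# Hypothetical flags "is today's value inside that past day's band?",
# ordered from the oldest past day to the most recent one.
values = pd.Series([True, False, True, True, True])

# A new group id starts every time the flag changes value.
grps = (values != values.shift()).cumsum()
print(grps.tolist())  # [1, 2, 3, 3, 3]

# Size of the last run; it only counts if that run consists of True flags.
last_run = int(grps.eq(grps.iloc[-1]).sum()) if values.iloc[-1] else 0
print(last_run)  # 3
```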

It does use a loop, though :(


Original answer:

You can compute lo and hi inside a function and then look them up by index in a loop. See the following function:

import pandas as pd

def last_within_range(df, target_col='req', tolerance=10):

    df = df.copy()
    # Start every count at 0 so the result stays integer-valued
    s = pd.Series(0, index=df.index, dtype=int)

    # Get low and high tolerance
    df['lo'] = df[target_col] - df[target_col] * tolerance/100
    df['hi'] = df[target_col] + df[target_col] * tolerance/100

    # Count how many of ALL previous rows have a band containing the current value
    for idx in df.index:
        past_idx = df.index[:df.index.get_loc(idx)]
        req = df.loc[idx, target_col]
        s[idx] = (
            ((req >= df.loc[past_idx, 'lo']) & (req <= df.loc[past_idx, 'hi'])).sum()
        )

    return s # df.assign(**{f'con{tolerance}': s})

This returns a Series. For example:

df['con5'] = last_within_range(df, tolerance=5)
df['con7'] = last_within_range(df, tolerance=7)
df['con10'] = last_within_range(df, tolerance=10)
    req  con5  con7  con10
0    15     0     0      0
1    16     0     1      1
2    14     0     1      1
3    15     1     2      3
4    16     1     3      3
5    16     2     4      4
6    17     0     3      3
7    30     0     0      0
8    31     1     1      1
9    35     0     0      0
10   32     1     2      3
11   35     1     1      2
12   34     2     3      4
13   33     2     5      6
14   37     0     2      3

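Note that this does not return how many of the immediately preceding (consecutive) rows are within the desired range. It returns how many of all the previous rows meet your condition.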

Also, if you want to see the output next to the computed lo and hi, you can return a DataFrame by replacing return s with return df.assign(**{f'con{tolerance}': s}).
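The original function's count also has a loop-free form. That count covers all earlier rows whose band contains the current value, not only the consecutive ones. The following minimal NumPy sketch (the name all_past_matches is hypothetical) masks the same "inside" comparison to earlier rows (j < i) and sums each row:

```python
import numpy as np


def all_past_matches(req, tolerance):
    """Hypothetical helper: count ALL earlier days whose tolerance band
    contains the current day's value (no consecutiveness requirement)."""
    r = np.asarray(req, dtype=float)
    lo = r - r * tolerance / 100
    hi = r + r * tolerance / 100
    # inside[i, j]: is day i's value within day j's band?
    inside = (r[:, None] >= lo[None, :]) & (r[:, None] <= hi[None, :])
    past = np.tri(len(r), k=-1, dtype=bool)   # True where j < i
    return (inside & past).sum(axis=1)


req = [15, 16, 14, 15, 16, 16, 17, 30, 31, 35, 32, 35, 34, 33, 37]
print(all_past_matches(req, 10).tolist())
# [0, 1, 1, 3, 3, 4, 3, 0, 1, 0, 3, 2, 4, 6, 3]  (the con10 column of the table above)
```

The output differs from the consecutive count at rows 2, 4, 5, 6 and 14. At those rows an earlier day breaks the run, but some days before it still match.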
