Conditional rolling count

import pandas as pd

l1 = ["1", "1", "1", "2", "2", "2", "2", "2"]
l2 = [1, 2, 2, 2, 2, 2, 2, 3]
l3 = [45, 25, 28, 70, 95, 98, 120, 80]
cowmast = pd.DataFrame(list(zip(l1, l2, l3)))
cowmast.columns = ['Cow', 'Lact', 'DIM']

def rolling_count(val):
    if val == rolling_count.previous:
        rolling_count.count += 1
    else:
        rolling_count.previous = val
        rolling_count.count = 1
    return rolling_count.count

rolling_count.count = 0  # static variable
rolling_count.previous = None  # static variable

cowmast['xmast'] = cowmast['Cow'].apply(rolling_count)  # new column in dataframe
cowmast

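def count_consecutive_items_n_cols(df, col_name_list, output_col):
    # One run id per column: increases each time the value changes from the previous row.
    cum_sum_list = [
        (df[col_name] != df[col_name].shift(1)).cumsum().tolist()
        for col_name in col_name_list
    ]
    # Combine the run ids into one key per row and count rows within each key.
    df[output_col] = df.groupby(
        ["_".join(map(str, x)) for x in zip(*cum_sum_list)]
    ).cumcount() + 1
    return df

# Pass the output column as a plain string rather than a one-element list.
count_consecutive_items_n_cols(cowmast, ['Cow', 'Lact'], 'Lxmast')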

  Cow  Lact  DIM  xmast  Lxmast
0   1     1   45      1       1
1   1     2   25      2       1
2   1     2   28      3       2
3   2     2   70      1       1
4   2     2   95      2       2
5   2     2   98      3       3
6   2     2  120      4       4
7   2     3   80      5       1

  Cow  Lact  DIM  xmast  Lxmast  Adjusted
0   1     1   45      1       1         1
1   1     2   25      2       1         1
2   1     2   28      3       2         1
3   2     2   70      1       1         1
4   2     2   95      2       2         2
5   2     2   98      3       3         2
6   2     2  120      4       4         3
7   2     3   80      5       1         1

1 answer

User

#1 · Posted 2024-10-02 04:19:34

In fact, if you use the highest-voted solution from the referenced question, the code that sets xmast and Lxmast can be simplified considerably.

After renaming the DataFrame cowmast to df, xmast can be set as follows:

df['xmast'] = df.groupby((df['Cow'] != df['Cow'].shift(1)).cumsum()).cumcount()+1
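The one-liner above packs three steps into a single expression. The sketch below unpacks them on the Cow column from the question; the intermediate names is_new_run and run_id are introduced here only for illustration and do not appear in the original answer.

```python
import pandas as pd

# 'Cow' column from the question's sample data.
df = pd.DataFrame({"Cow": ["1", "1", "1", "2", "2", "2", "2", "2"]})

# True wherever the value differs from the previous row
# (the first row compares against NaN, so it is always True).
is_new_run = df["Cow"] != df["Cow"].shift(1)

# The cumulative sum of those flags gives each run of equal values its own id.
run_id = is_new_run.cumsum()

# cumcount() numbers the rows of each run starting at 0, hence the +1.
df["xmast"] = df.groupby(run_id).cumcount() + 1

print(run_id.tolist())       # [1, 1, 1, 2, 2, 2, 2, 2]
print(df["xmast"].tolist())  # [1, 2, 3, 1, 2, 3, 4, 5]
```

Grouping by run_id rather than by the Cow column itself is what makes the counter restart whenever the value changes, even if the same value shows up again later.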

Similarly, Lxmast can be set with:

df['Lxmast'] = (df.groupby([(df['Cow'] != df['Cow'].shift(1)).cumsum(), 
                            (df['Lact'] != df['Lact'].shift()).cumsum()])
                  .cumcount()+1
               )

Input data

import pandas as pd

l1 = ["1", "1", "1", "2", "2", "2", "2", "2"]
l2 =[1, 2, 2, 2, 2, 2, 2, 3]
l3 =[45, 25, 28, 70, 95, 98, 120, 80]
cowmast = pd.DataFrame(list(zip(l1, l2, l3))) 

cowmast.columns =['Cow', 'Lact', 'DIM']

df = cowmast

Output

print(df)

  Cow  Lact  DIM  xmast  Lxmast
0   1     1   45      1       1
1   1     2   25      2       1
2   1     2   28      3       2
3   2     2   70      1       1
4   2     2   95      2       2
5   2     2   98      3       3
6   2     2  120      4       4
7   2     3   80      5       1

Now for the last part of your requirement, quoted below:

What I would like to do is restart the count for each cow (cow) lactation (Lact) and only increment the count when the number of days (DIM) between rows is more than 7.

We can do it this way:

To make the code more readable, let's define 2 grouping series for the code so far:

m_Cow = (df['Cow'] != df['Cow'].shift()).cumsum()
m_Lact = (df['Lact'] != df['Lact'].shift()).cumsum()

Then the code that sets Lxmast can be rewritten in a more readable form:

df['Lxmast'] = df.groupby([m_Cow, m_Lact]).cumcount()+1

Now for the main work. Suppose we create another new column, Adjusted, for it:

df['Adjusted'] = (df.groupby([m_Cow, m_Lact])
                   ['DIM'].diff().abs().gt(7)
                   .groupby([m_Cow, m_Lact])
                   .cumsum()+1
                )

Result:

print(df)

  Cow  Lact  DIM  xmast  Lxmast  Adjusted
0   1     1   45      1       1         1
1   1     2   25      2       1         1
2   1     2   28      3       2         1
3   2     2   70      1       1         1
4   2     2   95      2       2         2
5   2     2   98      3       3         2
6   2     2  120      4       4         3
7   2     3   80      5       1         1

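Here, after df.groupby([m_Cow, m_Lact]), we take the column DIM, compute each row's difference from the previous row with .diff(), take the absolute value with .abs(), and then check whether it is > 7 with .gt(7); together this is the fragment ['DIM'].diff().abs().gt(7). We then group again by the same keys with .groupby([m_Cow, m_Lact]), because the third condition applies within the groups defined by the first two. As the last step, we apply .cumsum() to the third condition, so the count is incremented only when that condition is true.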
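To make the chain easier to follow, the sketch below materialises each intermediate step as its own column on the question's sample data. The helper frame steps and its column names diff and gap_gt_7 are assumptions made for illustration; the explicit astype(int) is only there to make the boolean-to-integer conversion visible.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Cow": ["1", "1", "1", "2", "2", "2", "2", "2"],
    "Lact": [1, 2, 2, 2, 2, 2, 2, 3],
    "DIM": [45, 25, 28, 70, 95, 98, 120, 80],
})

# Run ids for the two grouping keys, as defined in the answer.
m_Cow = (df["Cow"] != df["Cow"].shift()).cumsum()
m_Lact = (df["Lact"] != df["Lact"].shift()).cumsum()

steps = pd.DataFrame({"DIM": df["DIM"]})

# Difference to the previous row within each (cow, lactation) group;
# the first row of every group has no predecessor and becomes NaN.
steps["diff"] = df.groupby([m_Cow, m_Lact])["DIM"].diff()

# Third condition: the gap exceeds 7 days (a comparison with NaN gives False).
steps["gap_gt_7"] = steps["diff"].abs().gt(7)

# Running total of the condition per group, shifted so each group starts at 1.
steps["Adjusted"] = (
    steps["gap_gt_7"].astype(int).groupby([m_Cow, m_Lact]).cumsum() + 1
)

print(steps)
# steps["Adjusted"] is [1, 1, 1, 1, 2, 2, 3, 1], matching the result table.
```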

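In case you want to increment the count only when DIM increases by > 7 (e.g. from 70 to 78), and not when it decreases by > 7 (e.g. from 78 to 70), you can remove the .abs() part of the code above: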

df['Adjusted'] = (df.groupby([m_Cow, m_Lact])
                   ['DIM'].diff().gt(7)
                   .groupby([m_Cow, m_Lact])
                   .cumsum()+1
                )
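The sample data never drops by more than 7 within a group, so both variants give the same result there. The sketch below uses a small hypothetical DIM series for a single cow and lactation (not taken from the question) to show where they diverge.

```python
import pandas as pd

# Hypothetical DIM values for one cow and one lactation (illustration only).
dim = pd.Series([70, 80, 60, 65])

diff = dim.diff()  # NaN, 10, -20, 5

# With .abs(): both the rise of 10 and the drop of 20 count as a gap.
with_abs = diff.abs().gt(7).astype(int).cumsum() + 1

# Without .abs(): only the rise of 10 counts.
without_abs = diff.gt(7).astype(int).cumsum() + 1

print(with_abs.tolist())     # [1, 2, 3, 3]
print(without_abs.tolist())  # [1, 2, 2, 2]
```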

Edit (possible simplification, depending on your data sequence)

Since the main grouping keys Cow and Lact in the sample data are already in sorted order, there is room to simplify the code further.

This differs from the sample data in the referenced question, where:

   col count
0  B   1
1  B   2
2  A   1 # Value does not match previous row => reset counter to 1
3  A   2
4  A   3
5  B   1 # Value does not match previous row => reset counter to 1

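Here, the B in the last row is separated from the other B's, and its count must be reset to 1 rather than continuing from the count of 2 reached by the preceding B's (which would give 3). The grouping therefore has to compare each row with the previous row to get the correct groups. Otherwise, if we use .groupby() on the column directly and all B values are grouped together, the count for the last entry is not reset to 1.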
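A minimal sketch of that difference, using the col values from the table above; the variable names by_value and by_run are made up for this example.

```python
import pandas as pd

# The 'col' values from the referenced question's sample data.
s = pd.DataFrame({"col": ["B", "B", "A", "A", "A", "B"]})

# Grouping by the value itself lumps all B rows together,
# so the last B continues the earlier count.
by_value = s.groupby("col").cumcount() + 1

# Grouping by a run id (a new id whenever the value changes) resets the count.
run_id = (s["col"] != s["col"].shift()).cumsum()
by_run = s.groupby(run_id).cumcount() + 1

print(by_value.tolist())  # [1, 2, 1, 2, 3, 3]
print(by_run.tolist())    # [1, 2, 1, 2, 3, 1]
```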

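If the data is already naturally sorted by the main grouping keys Cow and Lact when it is constructed, or has been sorted explicitly, for example with: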

df = df.sort_values(['Cow', 'Lact'])

then the code can be simplified as follows (when the data is already sorted by [Cow, Lact]):

df['xmast'] = df.groupby('Cow').cumcount()+1
df['Lxmast'] = df.groupby(['Cow', 'Lact']).cumcount()+1
               
df['Adjusted'] = (df.groupby(['Cow', 'Lact'])
                    ['DIM'].diff().abs().gt(7)
                    .groupby([df['Cow'], df['Lact']])
                    .cumsum()+1
                 )

The values in the 3 columns xmast, Lxmast and Adjusted are the same as before.
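As a quick sanity check, the sketch below computes Lxmast and Adjusted both ways on the question's sample data and compares them. The names ending in _run and _key are introduced here for illustration; the agreement relies on the data being sorted by Cow and Lact, as stated above.

```python
import pandas as pd

# Sample data from the question, already sorted by Cow and Lact.
df = pd.DataFrame({
    "Cow": ["1", "1", "1", "2", "2", "2", "2", "2"],
    "Lact": [1, 2, 2, 2, 2, 2, 2, 3],
    "DIM": [45, 25, 28, 70, 95, 98, 120, 80],
})

# Run-based grouping keys (compare each row with the previous one).
m_Cow = (df["Cow"] != df["Cow"].shift()).cumsum()
m_Lact = (df["Lact"] != df["Lact"].shift()).cumsum()

lxmast_run = df.groupby([m_Cow, m_Lact]).cumcount() + 1
lxmast_key = df.groupby(["Cow", "Lact"]).cumcount() + 1

adjusted_run = (
    df.groupby([m_Cow, m_Lact])["DIM"].diff().abs().gt(7)
    .astype(int).groupby([m_Cow, m_Lact]).cumsum() + 1
)
adjusted_key = (
    df.groupby(["Cow", "Lact"])["DIM"].diff().abs().gt(7)
    .astype(int).groupby([df["Cow"], df["Lact"]]).cumsum() + 1
)

print(lxmast_run.tolist() == lxmast_key.tolist())      # True
print(adjusted_run.tolist() == adjusted_key.tolist())  # True
```

If the same Cow and Lact combination could reappear after other rows, the two approaches would no longer agree, and the run-based keys would be the safer choice.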
