Summarizing previous rows with a condition in Python

Posted 2024-10-06 07:07:01


I am new to Python. I have a question about how to summarize the preceding rows. The dataset is:

df=pd.DataFrame({'ID':[1,1,1,1,2,2,2,2],'reason':['B','A','B','A','A','A','B','A'],'result':['W','W','Z','X','X','W','Z','W']})

        ID  reason  result
0       1      B      W
1       1      A      W
2       1      B      Z
3       1      A      X
4       2      A      X
5       2      A      W
6       2      B      Z
7       2      A      W

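For each row, I want to collect the history of reason (the values from the preceding rows) within the same ID. I also want to collect the history of result, but only from preceding rows whose reason is A. The result should look like this: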

    ID   reason result   Previous_reason  Previous_result_reasonA
0   1      B      W
1   1      A      W         B
2   1      B      Z         B|A              W
3   1      A      X         B|A|B            W
4   2      A      X
5   2      A      W         A                X
6   2      B      Z         A|A              X|W
7   2      A      W         A|A|B            X|W

Thanks in advance.


1 answer
User
#1 · Posted 2024-10-06 07:07:01

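Assuming the DataFrame is sorted by ID (and has the default 0..n-1 index, which the code below relies on), you can solve this in a single O(n) pass: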

import pandas as pd

df = pd.DataFrame({'ID':[1,1,1,1,2,2,2,2],
                   'reason':['B','A','B','A','A','A','B','A'],
                   'result':['W','W','Z','X','X','W','Z','W']})

df['Previous_reason'] = [''] * len(df)
df['Previous_result_reasonA'] = [''] * len(df)

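# Result of the previous row if its reason was 'A', otherwise ''.
# Seed it from row 0, which the loop below never visits as the current row.
result_reasonA = df['result'][0] if len(df) and df['reason'][0] == 'A' else ''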

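for r in range(1, len(df)):
    if df['ID'][r] == df['ID'][r-1]:
        # Same ID as the previous row: extend both histories.
        df.loc[r, 'Previous_reason'] = \
            df['Previous_reason'][r-1] + '|' + df['reason'][r-1]
        df.loc[r, 'Previous_result_reasonA'] = \
            df['Previous_result_reasonA'][r-1]
        if result_reasonA:
            # The previous row had reason 'A', so record its result.
            df.loc[r, 'Previous_result_reasonA'] += \
                '|' + result_reasonA
    else:
        # New ID: start again with an empty history.
        df.loc[r, 'Previous_reason'] = ''

    # Remember this row's result for the next iteration if its reason is 'A'.
    if df['reason'][r] == 'A':
        result_reasonA = df['result'][r]
    else:
        result_reasonA = ''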

# Strip the leading `|` separator left by the concatenation above
df['Previous_reason'] = \
    df['Previous_reason'].apply(lambda x: x[1:])
df['Previous_result_reasonA'] = \
    df['Previous_result_reasonA'].apply(lambda x: x[1:])

print(df)

Output:

   ID reason result Previous_reason Previous_result_reasonA
0   1      B      W                                        
1   1      A      W               B                        
2   1      B      Z             B|A                       W
3   1      A      X           B|A|B                       W
4   2      A      X                                        
5   2      A      W               A                       X
6   2      B      Z             A|A                     X|W
7   2      A      W           A|A|B                     X|W

However, the question is whether this covers all special cases. I can't say, because I don't know what the data means.
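
One way to make the solution less dependent on the ordering assumption, offered as a minimal sketch rather than a definitive implementation, is to keep a running history per ID in a dictionary and make a single pass over the rows. The helper name `add_history` and its internal variables are hypothetical, introduced only for illustration; the column names are the ones from the question.

```python
import pandas as pd

df = pd.DataFrame({'ID': [1, 1, 1, 1, 2, 2, 2, 2],
                   'reason': ['B', 'A', 'B', 'A', 'A', 'A', 'B', 'A'],
                   'result': ['W', 'W', 'Z', 'X', 'X', 'W', 'Z', 'W']})


def add_history(frame):
    """Hypothetical helper: return a copy of `frame` with both history columns."""
    seen_reason = {}    # ID -> reasons seen so far for that ID
    seen_result_a = {}  # ID -> results seen so far on rows with reason 'A'
    prev_reason = []
    prev_result_a = []

    for id_, reason, result in zip(frame['ID'], frame['reason'], frame['result']):
        reasons = seen_reason.setdefault(id_, [])
        results_a = seen_result_a.setdefault(id_, [])

        # Record the history *before* adding the current row to it.
        prev_reason.append('|'.join(reasons))
        prev_result_a.append('|'.join(results_a))

        reasons.append(reason)
        if reason == 'A':
            results_a.append(result)

    # assign() returns a new DataFrame and leaves `frame` untouched.
    return frame.assign(Previous_reason=prev_reason,
                        Previous_result_reasonA=prev_result_a)


out = add_history(df)
print(out)
```

On the sample data this produces the same two columns as the output above. Because the state is keyed by ID, the rows of one ID do not have to be contiguous, the index does not have to be 0..n-1, and a first row with reason A is handled like any other row. Whether that matches the intended meaning of the data is still an open question.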
