将列转换为行以创建事件日志数据集

2024-09-25 12:33:00 发布

您现在位置:Python中文网/ 问答频道 /正文

您能帮我将列转换成行以创建事件日志系列吗

我想用以下列创建一个事件日志数据集

我的表格如下所示:

ID1     ID2      Event1             Event1_activity     Event2              Event2_activity     Event3              Event3_activity
10001A  6456    05.09.2019 12:32    Event1_Description  09.09.2019 12:40    Event2_Description  10.09.2019 12:40    Event3_Description
10001A  6456    05.09.2019 12:32    Event1_Description  09.09.2019 12:40    Event2_Description  10.09.2019 12:40    Event3_Description
20001B  8793    03.09.2019 09:45    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description
20001B  9017    03.09.2019 09:49    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description
20001B  5454    04.09.2019 12:42    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description

根据ID1ID2,我想基于包含各自事件和活动的列创建一系列事件日志

基本上,我的事件日志表应该如下所示:

ID          Event               Activity
6456-10001A 05.09.2019 12:32    Event1_Description
6456-10001A 09.09.2019 12:40    Event2_Description
6456-10001A 10.09.2019 12:40    Event3_Description
6456-10001A 05.09.2019 12:32    Event1_Description
6456-10001A 09.09.2019 12:40    Event2_Description
6456-10001A 10.09.2019 12:40    Event3_Description
8793-20001B 03.09.2019 09:45    Event1_Description
8793-20001B 10.09.2019 12:25    Event2_Description
8793-20001B 04.09.2019 09:45    Event3_Description
9017-20001B 03.09.2019 09:49    Event1_Description
9017-20001B 10.09.2019 12:25    Event2_Description
9017-20001B 04.09.2019 09:49    Event3_Description
5454-20001B 04.09.2019 12:42    Event1_Description
5454-20001B 10.09.2019 12:25    Event2_Description
5454-20001B 05.09.2019 12:42    Event3_Description

任何建议都将不胜感激


Tags: 数据eventid事件descriptionactivity建议表格
3条回答

使用melt。动态-更多列(>;3)仍将工作

df = pd.read_csv(io.StringIO("""ID1     ID2      Event1             Event1_activity     Event2              Event2_activity     Event3              Event3_activity
10001A  6456    05.09.2019 12:32    Event1_Description  09.09.2019 12:40    Event2_Description  10.09.2019 12:40    Event3_Description
10001A  6456    05.09.2019 12:32    Event1_Description  09.09.2019 12:40    Event2_Description  10.09.2019 12:40    Event3_Description
20001B  8793    03.09.2019 09:45    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description
20001B  9017    03.09.2019 09:49    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description
20001B  5454    04.09.2019 12:42    Event1_Description  10.09.2019 12:25    Event2_Description  11.09.2019 12:25    Event3_Description"""
                            ), sep="\s\s+", engine="python")

# pepare ID column as concatenation
df = df.assign(ID=lambda dfa: dfa["ID1"].astype(str)+"-"+dfa["ID2"].astype(str)).drop(columns=["ID1","ID2"])
# melt out both sets of columns for Event and Activity then merge
# NB reset_index() to ensure merge key works.  Plus only want ID on LHS dataframe
df2 = pd.merge(
    pd.melt(df, id_vars=["ID"], 
            value_vars=[c for c in df.columns if "Event" in c and "activity" not in c], 
            value_name="Event").drop(columns="variable").reset_index(),
    pd.melt(df, id_vars=["ID"], 
            value_vars=[c for c in df.columns if "activity" in c], 
            value_name="Activity").drop(columns=["variable","ID"]).reset_index(),

    on="index"
).drop(columns="index").sort_values(["ID","Event"])

输出

         ID            Event           Activity
10001A-6456 05.09.2019 12:32 Event1_Description
10001A-6456 05.09.2019 12:32 Event1_Description
10001A-6456 09.09.2019 12:40 Event2_Description
10001A-6456 09.09.2019 12:40 Event2_Description
10001A-6456 10.09.2019 12:40 Event3_Description
10001A-6456 10.09.2019 12:40 Event3_Description
20001B-5454 04.09.2019 12:42 Event1_Description
20001B-5454 10.09.2019 12:25 Event2_Description
20001B-5454 11.09.2019 12:25 Event3_Description
20001B-8793 03.09.2019 09:45 Event1_Description
20001B-8793 10.09.2019 12:25 Event2_Description
20001B-8793 11.09.2019 12:25 Event3_Description
20001B-9017 03.09.2019 09:49 Event1_Description
20001B-9017 10.09.2019 12:25 Event2_Description
20001B-9017 11.09.2019 12:25 Event3_Description

df enter image description here

您可以创建新的ID,然后连接数据帧子集并按ID排序

df['ID'] = df['ID2'].astype(str) + '-' + df['ID1']
n_events = 3
pd.concat([df[['ID', f'Event{i}', f'Event{i}_activity']].rename(columns={f'Event{i}': 'Event', f'Event{i}_activity': 'Activity'}) 
           for i in range(1, n_events+1)]
         ).sort_values(by='ID').reset_index(drop=True)

        ID           Event             Activity
0   5454-20001B 04.09.2019 12:42    Event1_Description
1   5454-20001B 10.09.2019 12:25    Event2_Description
2   5454-20001B 11.09.2019 12:25    Event3_Description
3   6456-10001A 05.09.2019 12:32    Event1_Description
4   6456-10001A 05.09.2019 12:32    Event1_Description
5   6456-10001A 09.09.2019 12:40    Event2_Description
6   6456-10001A 09.09.2019 12:40    Event2_Description
7   6456-10001A 10.09.2019 12:40    Event3_Description
8   6456-10001A 10.09.2019 12:40    Event3_Description
9   8793-20001B 03.09.2019 09:45    Event1_Description
10  8793-20001B 10.09.2019 12:25    Event2_Description
11  8793-20001B 11.09.2019 12:25    Event3_Description
12  9017-20001B 03.09.2019 09:49    Event1_Description
13  9017-20001B 10.09.2019 12:25    Event2_Description
14  9017-20001B 11.09.2019 12:25    Event3_Description

如果必须保留ID的原始顺序,则必须采取不同的方法

^{}与create ID列一起使用,并将Event1_activity等列名交换到activity_Event1

df['ID']  = df.pop("ID1").astype(str) + "-" + df.pop("ID2").astype(str))

df.columns = [f'{x[1]}_{x[0]}' if len(x) == 2 else f'{"".join(x)}' 
                for x in df.columns.str.split('_')]

df = (pd.wide_to_long(df.reset_index(),
                      stubnames=['Event','activity_Event'],
                      i=['index','ID'],
                      j='tmp')
        .reset_index(level=1).reset_index(drop=True))
print (df) 
            ID             Event      activity_Event
0   10001A-6456  05.09.2019 12:32  Event1_Description
1   10001A-6456  09.09.2019 12:40  Event2_Description
2   10001A-6456  10.09.2019 12:40  Event3_Description
3   10001A-6456  05.09.2019 12:32  Event1_Description
4   10001A-6456  09.09.2019 12:40  Event2_Description
5   10001A-6456  10.09.2019 12:40  Event3_Description
6   20001B-8793  03.09.2019 09:45  Event1_Description
7   20001B-8793  10.09.2019 12:25  Event2_Description
8   20001B-8793  11.09.2019 12:25  Event3_Description
9   20001B-9017  03.09.2019 09:49  Event1_Description
10  20001B-9017  10.09.2019 12:25  Event2_Description
11  20001B-9017  11.09.2019 12:25  Event3_Description
12  20001B-5454  04.09.2019 12:42  Event1_Description
13  20001B-5454  10.09.2019 12:25  Event2_Description
14  20001B-5454  11.09.2019 12:25  Event3_Description

相关问题 更多 >