将单元格从2列拆分为行

2024-10-01 17:34:14 发布

您现在位置:Python中文网/ 问答频道 /正文

我有一个如下的数据集

Cus_ID    Event                              Day
  1       Event1~Event2~Event3~Event4        1~1~1~1
  2       Event3~Event4~Event5~Event6        1~2~3~4

the output i'm trying to get would be: 

 Cus_ID |             Event          |  Day    |  EventSplit|Day split
----------------------------------------------------------------------------
1       | Event1~Event2~Event3~Event4| 1~1~1~1 | Event1 |1
1       | Event1~Event2~Event3~Event4| 1~1~1~1 | Event2 |1
1       | Event1~Event2~Event3~Event4| 1~1~1~1 | Event3 |1
1       | Event1~Event2~Event3~Event4| 1~1~1~1 | Event4 |1
2       | Event3~Event4~Event5~Event6| 1~2~3~4 | Event3 |1
2       | Event3~Event4~Event5~Event6| 1~2~3~4 | Event4 |2
2       | Event3~Event4~Event5~Event6| 1~2~3~4 | Event5 |3
2       | Event3~Event4~Event5~Event6| 1~2~3~4 | Event6 |4

如果出现特定的分隔符,我将尝试解析此数据并将其拆分为新行。这些分隔符是“~”。也就是将Event和Day列拆分为新列名Eventsplit和Day split

我已经做了以下代码,但我不知道我该如何做这两个栏目在一次可以任何机构的帮助。你知道吗

这是我尝试过的第一种方法

import pandas as pd
import numpy as np
data =pd.read_csv("SeqData.csv")
def pre(data, c):
    event_col = data[c].str.split('~')
    clst = event.values.tolist()
    lens = [len(l) for l in clst]

    EventSplit = pd.DataFrame({c: np.concatenate(clst)}, data.index.repeat(lens))
    return data.drop(c, 1).join(EventSplit ).reset_index(drop=True)

Data_df = pre(data, 'Event')

2号进近

EventSplit= data.Event.str.split('~',expand=True).stack()
Day_split = data.Day.str.split('~',expand=True).stack()
new_data = data.join(pd.Series(index=Day_split .index.droplevel(1), data=Day_split .values, name='Day_split '))
new_data = data.join(pd.Series(index=EventSplit.index.droplevel(1), data=EventSplit.values, name='EventSplit'))

Tags: eventdataindexpdsplitdaystrevent1
1条回答
网友
1楼 · 发布于 2024-10-01 17:34:14

这是不必要的,但有点不同,因为您需要同时使用不必要的两列

df1=df.copy()
df.Event=df.Event.str.split('~')
df.Day=df.Day.str.split('~')
Tdf=pd.DataFrame({'Cus_ID':df['Cus_ID'].repeat(df.Event.str.len()),'EventSplit':np.concatenate(df.Event.tolist()),'DaySplit':np.concatenate(df.Day.tolist()),}).merge(df1,on='Cus_ID')

Tdf
Out[661]: 
   Cus_ID DaySplit EventSplit                        Event      Day
0       1        1     Event1  Event1~Event2~Event3~Event4  1~1~1~1
1       1        1     Event2  Event1~Event2~Event3~Event4  1~1~1~1
2       1        1     Event3  Event1~Event2~Event3~Event4  1~1~1~1
3       1        1     Event4  Event1~Event2~Event3~Event4  1~1~1~1
4       2        1     Event3  Event3~Event4~Event5~Event6  1~2~3~4
5       2        2     Event4  Event3~Event4~Event5~Event6  1~2~3~4
6       2        3     Event5  Event3~Event4~Event5~Event6  1~2~3~4
7       2        4     Event6  Event3~Event4~Event5~Event6  1~2~3~4

相关问题 更多 >

    热门问题