Creating a new column in a DataFrame to use as a grouping key

Time/Location  Value
Location1
Today              3
Next day           0
Weekend           -6
Next week          1
Location2
Today              2
Next day          -1
Weekend            3
Next week          2
Location3
Today              1
Next day           3
Weekend            1
Next week         -1
Location4
Today              3
Next day           2
Weekend            5
Next week          4
Location5
Today              4
Next day           2
Weekend            3
Next week          1
Location6
Today             -1
Next day           3
Weekend            3
Next week          2

Location   Time       Value
Location1  Today          3
Location1  Next day       0
Location1  Weekend       -6
Location1  Next week      1
Location2  Today          2
Location2  Next day      -1
Location2  Weekend        3
Location2  Next week      2
Location3  Today          1
Location3  Next day       3
Location3  Weekend        1
Location3  Next week     -1
Location4  Today          3
Location4  Next day       2
Location4  Weekend        5
Location4  Next week      4
Location5  Today          4
Location5  Next day       2
Location5  Weekend        3
Location5  Next week      1
Location6  Today         -1
Location6  Next day       3
Location6  Weekend        3
Location6  Next week      2

1 answer

User

#1 · Posted on 2024-09-25 08:36:50

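If Value is missing (NaN) only on the rows that hold a location name, use Series.mask to blank out the first-column entries wherever Value is present, forward-fill the remaining location names with ffill, and finally drop the location-only rows with DataFrame.dropna and rename the column: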

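# Keep the first-column entry only where Value is NaN (the location rows), then forward-fill it
df.insert(0, 'Location', df['Time/Location'].mask(df['Value'].notna()).ffill())
# Drop the location-only rows and rename the first column to Time
df = df.dropna(subset=['Value']).rename(columns={'Time/Location':'Time'})
print(df)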
     Location       Time  Value
1   Location1      Today    3.0
2   Location1   Next day    0.0
3   Location1    Weekend   -6.0
4   Location1  Next week    1.0
6   Location2      Today    2.0
7   Location2   Next day   -1.0
8   Location2    Weekend    3.0
9   Location2  Next week    2.0
11  Location3      Today    1.0
12  Location3   Next day    3.0
13  Location3    Weekend    1.0
14  Location3  Next week   -1.0
16  Location4      Today    3.0
17  Location4   Next day    2.0
18  Location4    Weekend    5.0
19  Location4  Next week    4.0
21  Location5      Today    4.0
22  Location5   Next day    2.0
23  Location5    Weekend    3.0
24  Location5  Next week    1.0
26  Location6      Today   -1.0
27  Location6   Next day    3.0
28  Location6    Weekend    3.0
29  Location6  Next week    2.0
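
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. It assumes the input has NaN in Value on every location row, which is what the row index in the output above suggests. Only the first two locations are reproduced, and the helper names `times`, `values`, `rows`, `location`, and `tidy` are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Assumed input shape: one row per location name (Value is NaN),
# followed by that location's four time rows.
times = ['Today', 'Next day', 'Weekend', 'Next week']
values = {
    'Location1': [3, 0, -6, 1],
    'Location2': [2, -1, 3, 2],
}
rows = []
for loc, vals in values.items():
    rows.append((loc, np.nan))       # location row, no value
    rows.extend(zip(times, vals))    # time rows with values
df = pd.DataFrame(rows, columns=['Time/Location', 'Value'])

# Blank out entries that have a Value, leaving only location names,
# then carry each location name down to the rows beneath it.
location = df['Time/Location'].mask(df['Value'].notna()).ffill()
df.insert(0, 'Location', location)

# Remove the location-only rows and give the first column its real name.
tidy = df.dropna(subset=['Value']).rename(columns={'Time/Location': 'Time'})
print(tidy)
```

The original index labels are kept, so the location rows show up as gaps in the index (0 and 5 here).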

Another idea is to test the values in the first column with Series.isin and filter with boolean indexing:

# Labels that mark a time row rather than a location row
L = ['Today','Next day','Weekend','Next week']
m = df['Time/Location'].isin(L)
df.insert(0, 'Location', df['Time/Location'].mask(m).ffill())
# Keep only the time rows
df = df[m].rename(columns={'Time/Location':'Time'})
print(df)
     Location       Time  Value
1   Location1      Today    3.0
2   Location1   Next day    0.0
3   Location1    Weekend   -6.0
4   Location1  Next week    1.0
6   Location2      Today    2.0
7   Location2   Next day   -1.0
8   Location2    Weekend    3.0
9   Location2  Next week    2.0
11  Location3      Today    1.0
12  Location3   Next day    3.0
13  Location3    Weekend    1.0
14  Location3  Next week   -1.0
16  Location4      Today    3.0
17  Location4   Next day    2.0
18  Location4    Weekend    5.0
19  Location4  Next week    4.0
21  Location5      Today    4.0
22  Location5   Next day    2.0
23  Location5    Weekend    3.0
24  Location5  Next week    1.0
26  Location6      Today   -1.0
27  Location6   Next day    3.0
28  Location6    Weekend    3.0
29  Location6  Next week    2.0
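
Once Location is a real column, it can be used as a grouping key, which is what the question title asks for. The sketch below assumes a frame shaped like the output above (only the first two locations are reproduced). The per-location sum is an illustrative aggregation, not something the original answer specifies.

```python
import pandas as pd

# Tidy frame in the shape produced above (first two locations only).
tidy = pd.DataFrame({
    'Location': ['Location1'] * 4 + ['Location2'] * 4,
    'Time': ['Today', 'Next day', 'Weekend', 'Next week'] * 2,
    'Value': [3.0, 0.0, -6.0, 1.0, 2.0, -1.0, 3.0, 2.0],
})

# Group by the new column and aggregate Value per location.
totals = tidy.groupby('Location')['Value'].sum()
print(totals)
```

Any other aggregation (mean, max, and so on) can be applied to the same grouping.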
