Creating a new column in a DataFrame to use as a grouping key

Posted 2024-09-25 08:36:50


Hi all, I need your help getting the expected output in a pandas DataFrame. I have a file containing data like this:

Time/Location    Value
Location1   
Today             3
Next day          0
Weekend          -6
Next week         1
Location2   
Today             2
Next day         -1
Weekend           3
Next week         2
Location3   
Today             1
Next day          3
Weekend           1
Next week        -1
Location4   
Today             3
Next day          2
Weekend           5
Next week         4
Location5   
Today             4
Next day          2
Weekend           3
Next week         1
Location6   
Today            -1
Next day          3
Weekend           3
Next week         2

The expected output is shown below, with a new column created for "Location":

Location    Time       Value
Location1   Today       3
Location1   Next day    0
Location1   Weekend    -6
Location1   Next week   1
Location2   Today       2
Location2   Next day   -1
Location2   Weekend     3
Location2   Next week   2
Location3   Today       1
Location3   Next day    3
Location3   Weekend     1
Location3   Next week  -1
Location4   Today       3
Location4   Next day    2
Location4   Weekend     5
Location4   Next week   4
Location5   Today       4
Location5   Next day    2
Location5   Weekend     3
Location5   Next week   1
Location6   Today      -1
Location6   Next day    3
Location6   Weekend     3
Location6   Next week   2

Thanks in advance for any help or suggestions.

Thank you all!


1 answer
User
#1 · Posted 2024-09-25 08:36:50

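If the real data rows never have a missing Value, use Series.mask to blank out the first-column entries wherever Value is present, forward-fill the remaining location names with ffill, then drop the location header rows with DataFrame.dropna and rename the column: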

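# Keep the first-column text only on rows without a Value (the location headers), then forward-fill it
df.insert(0, 'Location', df['Time/Location'].mask(df['Value'].notna()).ffill())
# Drop the header rows and rename the first column
df = df.dropna(subset=['Value']).rename(columns={'Time/Location':'Time'})
print(df)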
     Location       Time  Value
1   Location1      Today    3.0
2   Location1   Next day    0.0
3   Location1    Weekend   -6.0
4   Location1  Next week    1.0
6   Location2      Today    2.0
7   Location2   Next day   -1.0
8   Location2    Weekend    3.0
9   Location2  Next week    2.0
11  Location3      Today    1.0
12  Location3   Next day    3.0
13  Location3    Weekend    1.0
14  Location3  Next week   -1.0
16  Location4      Today    3.0
17  Location4   Next day    2.0
18  Location4    Weekend    5.0
19  Location4  Next week    4.0
21  Location5      Today    4.0
22  Location5   Next day    2.0
23  Location5    Weekend    3.0
24  Location5  Next week    1.0
26  Location6      Today   -1.0
27  Location6   Next day    3.0
28  Location6    Weekend    3.0
29  Location6  Next week    2.0
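
For anyone who wants to try the approach above without the original file, here is a minimal self-contained sketch. The sample frame is built by hand from the first two locations in the question, and the helper name add_location_column is made up for illustration; it assumes, as the answer does, that Value is missing only on the location header rows.

```python
import numpy as np
import pandas as pd

# Hand-built sample mirroring the question's layout (first two locations only).
# Header rows carry a location name and no Value.
rows = [
    ("Location1", np.nan),
    ("Today", 3), ("Next day", 0), ("Weekend", -6), ("Next week", 1),
    ("Location2", np.nan),
    ("Today", 2), ("Next day", -1), ("Weekend", 3), ("Next week", 2),
]
df = pd.DataFrame(rows, columns=["Time/Location", "Value"])


def add_location_column(frame):
    """Hypothetical helper: move the header rows into a 'Location' column."""
    out = frame.copy()
    # Blank out the first column wherever a Value exists, leaving only the
    # location names, then carry each name down to the rows beneath it.
    location = out["Time/Location"].mask(out["Value"].notna()).ffill()
    out.insert(0, "Location", location)
    # The header rows have no Value, so dropping NaN removes exactly those rows.
    return out.dropna(subset=["Value"]).rename(columns={"Time/Location": "Time"})


tidy = add_location_column(df)
print(tidy)
```

Value comes back as float because the header rows introduced NaN into the column; if integers are wanted, tidy["Value"].astype(int) should work once the header rows are gone.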

Another idea is to test the first-column values against the known time labels with Series.isin and filter with boolean indexing:

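# Labels that mark a data row rather than a location header
L = ['Today','Next day','Weekend','Next week']
m = df['Time/Location'].isin(L)
# Blank out the time labels so only location names remain, then forward-fill them
df.insert(0, 'Location', df['Time/Location'].mask(m).ffill())
# Keep only the data rows
df = df[m].rename(columns={'Time/Location':'Time'})
print(df)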
     Location       Time  Value
1   Location1      Today    3.0
2   Location1   Next day    0.0
3   Location1    Weekend   -6.0
4   Location1  Next week    1.0
6   Location2      Today    2.0
7   Location2   Next day   -1.0
8   Location2    Weekend    3.0
9   Location2  Next week    2.0
11  Location3      Today    1.0
12  Location3   Next day    3.0
13  Location3    Weekend    1.0
14  Location3  Next week   -1.0
16  Location4      Today    3.0
17  Location4   Next day    2.0
18  Location4    Weekend    5.0
19  Location4  Next week    4.0
21  Location5      Today    4.0
22  Location5   Next day    2.0
23  Location5    Weekend    3.0
24  Location5  Next week    1.0
26  Location6      Today   -1.0
27  Location6   Next day    3.0
28  Location6    Weekend    3.0
29  Location6  Next week    2.0
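
The isin variant should also cope with a genuinely missing Value on a data row, which the dropna variant would not. The sketch below illustrates that on a hand-built sample in which the "Next day" reading for Location1 is deliberately set to NaN (an assumption made for the demo, not part of the question's data). It then groups by the new column, which is the use case the title mentions; the sum is just one example aggregation.

```python
import numpy as np
import pandas as pd

# Known time labels; anything else in the first column is treated as a location.
TIME_LABELS = ["Today", "Next day", "Weekend", "Next week"]

# Sample data: Location1's "Next day" Value is missing on purpose (demo assumption).
df = pd.DataFrame({
    "Time/Location": ["Location1", "Today", "Next day", "Weekend", "Next week",
                      "Location2", "Today", "Next day", "Weekend", "Next week"],
    "Value": [np.nan, 3, np.nan, -6, 1, np.nan, 2, -1, 3, 2],
})

# True for data rows, False for location header rows.
is_time = df["Time/Location"].isin(TIME_LABELS)

out = df.copy()
# Hide the time labels so only location names remain, then forward-fill them.
out.insert(0, "Location", out["Time/Location"].mask(is_time).ffill())
# Keep the data rows, including the one whose Value is NaN.
tidy = out[is_time].rename(columns={"Time/Location": "Time"})
print(tidy)

# Use the new column as a grouping key; sum() skips NaN by default.
totals = tidy.groupby("Location")["Value"].sum()
print(totals)
```

The trade-off is that the list of time labels has to be known up front; a label missing from TIME_LABELS would be mistaken for a location name.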
