How can I solve the following case?

Posted 2024-10-06 11:23:38


I have the following data:

id      point 1    point 2      count            Time             
018     Paris      London        01      2016-05-20 10:50:00      
015     Paris      London        01      2016-05-19 11:50:00      
002     Prague     Munich        15      2016-05-18 17:55:00      
003     Frankfurt  London        01      2016-05-17 21:15:00      
015     London     Paris         08      2016-05-21 13:50:00      
003     Barcelona  Vienna        15      2016-05-19 03:20:00      
003     London     Barcelona     01      2016-05-18 06:45:00      
002     Vienna     Prague        15      2016-05-19 02:45:00      

First, I want to order the rows by id and Time:

df = df.sort_values(['id', 'Time'])
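To make the sort reproducible, here is a minimal sketch that loads the sample table above into a DataFrame and sorts it. It assumes the columns in the pasted text are separated by two or more spaces (so `point 1` and the date-time value stay intact), and it keeps `id` and `count` as strings so the leading zeros survive; `DATA` and `df_sorted` are names introduced here for illustration.

```python
from io import StringIO

import pandas as pd

# Sample rows copied from the question; columns are separated by 2+ spaces
DATA = """\
id      point 1    point 2      count            Time
018     Paris      London        01      2016-05-20 10:50:00
015     Paris      London        01      2016-05-19 11:50:00
002     Prague     Munich        15      2016-05-18 17:55:00
003     Frankfurt  London        01      2016-05-17 21:15:00
015     London     Paris         08      2016-05-21 13:50:00
003     Barcelona  Vienna        15      2016-05-19 03:20:00
003     London     Barcelona     01      2016-05-18 06:45:00
002     Vienna     Prague        15      2016-05-19 02:45:00
"""

# A regex separator requires the python engine; keep leading zeros in id/count
df = pd.read_csv(StringIO(DATA), sep=r"\s{2,}", engine="python",
                 dtype={"id": str, "count": str}, parse_dates=["Time"])

# Sort by id first, then chronologically within each id
df_sorted = df.sort_values(["id", "Time"])
print(df_sorted)
```

Parsing `Time` as a real datetime matters: sorting the raw strings happens to work for this ISO-like format, but date arithmetic and comparisons later on need `datetime64` values.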

to get this:

id      point 1    point 2      count            Time      
002     Vienna     Prague        15      2016-05-18 02:45:00 
002     Prague     Munich        15      2016-05-18 17:55:00
003     Frankfurt  London        01      2016-05-17 21:15:00 
003     London     Barcelona     01      2016-05-18 06:45:00  
003     Barcelona  Vienna        15      2016-05-19 03:20:00 
015     Paris      London        01      2016-05-19 11:50:00
015     London     Paris         08      2016-05-21 13:50:00 
018     Paris      London        01      2016-05-20 10:50:00

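If point 2 of the first row is the same as point 1 of the second row, then the start is point 1 of the first row and the end is point 2 of the second row. [id 002]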
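The first rule compares each row with the previous row of the same id. A minimal sketch, assuming the rows are already sorted by `id` and `Time` and using only the id 002 and 015 legs in the order shown in the expected output above: `groupby('id')['point 2'].shift()` returns the previous row's `point 2` without crossing into another id. The `linked` column and `chained` series are hypothetical helpers, not part of the question.

```python
import pandas as pd

# Hypothetical sample: legs of ids 002 and 015, assumed already sorted by id and Time
df = pd.DataFrame({
    "id":      ["002", "002", "015", "015"],
    "point 1": ["Vienna", "Prague", "Paris", "London"],
    "point 2": ["Prague", "Munich", "London", "Paris"],
})

# point 2 of the previous row in the same id; NaN on the first row of each id
prev_point2 = df.groupby("id")["point 2"].shift()

# True where this leg starts where the previous leg of the same id ended
df["linked"] = df["point 1"].eq(prev_point2)

# An id forms a chain when every row after its first one is linked
chained = df.groupby("id")["linked"].agg(lambda s: s.iloc[1:].all())
print(df)
print(chained)
```

Note that id 015 (Paris to London, then London to Paris) also passes this check, which is exactly why the second rule is needed.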

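However, if point 2 of the first row is the same as point 1 of the second row, and point 1 of the first row is also the same as point 2 of the second row (a round trip), then the start and end do not change. [id 015]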
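The round-trip condition only needs the first `point 1` and the last `point 2` of each id. A minimal sketch, assuming the rows are already sorted by `id` and `Time` (ids 002 and 015 only, in the order of the expected output); `transform('first')` and `transform('last')` broadcast those values back to every row of the group, and `round_trip` is a hypothetical helper column.

```python
import pandas as pd

# Hypothetical sample: legs of ids 002 and 015, assumed already sorted by id and Time
df = pd.DataFrame({
    "id":      ["002", "002", "015", "015"],
    "point 1": ["Vienna", "Prague", "Paris", "London"],
    "point 2": ["Prague", "Munich", "London", "Paris"],
})

g = df.groupby("id")
# First departure and last arrival of each id, repeated on every row of that id
first_start = g["point 1"].transform("first")
last_end = g["point 2"].transform("last")

# Round trip: the id ends where it started (e.g. Paris -> London -> Paris)
df["round_trip"] = first_start.eq(last_end)
print(df)
```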

Desired result:

id      point 1    point 2      count            Time            Start     End
002     Vienna     Prague        15      2016-05-19 02:45:00    Vienna   Munich
002     Prague     Munich        15      2016-05-18 17:55:00    Vienna   Munich
003     Frankfurt  London        01      2016-05-17 21:15:00    Frankfurt Vienna
003     London     Barcelona     01      2016-05-18 06:45:00    Frankfurt Vienna
003     Barcelona  Vienna        15      2016-05-19 03:20:00    Frankfurt Vienna
015     Paris      London        01      2016-05-19 11:50:00    Paris     London
015     London     Paris         08      2016-05-21 13:50:00    London   Paris
018     Paris      London        01      2016-05-20 10:50:00    Paris    London

I tried the first condition using:

import numpy as np
df = df.assign(start=np.where(df['point 2'] == df['point 1'].shift(), df['point 2'].shift(), df['point 1']))
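The attempt above compares rows across different ids, because a plain `shift()` ignores the grouping. As an alternative to a custom `apply` (a swapped-in vectorized technique, not the asker's or the answerer's code), both rules can be combined with group-aware `shift` and `transform`. This sketch assumes the rows are already sorted by `id` and `Time` in the order of the desired result; `use_journey` and the other intermediate names are illustrative.

```python
import pandas as pd

# Rows in the order of the desired result, assumed already sorted by id and Time
df = pd.DataFrame({
    "id": ["002", "002", "003", "003", "003", "015", "015", "018"],
    "point 1": ["Vienna", "Prague", "Frankfurt", "London", "Barcelona",
                "Paris", "London", "Paris"],
    "point 2": ["Prague", "Munich", "London", "Barcelona", "Vienna",
                "London", "Paris", "London"],
})

g = df.groupby("id")
first_start = g["point 1"].transform("first")
last_end = g["point 2"].transform("last")

# Rule 1: each row continues the previous one; the first row of each id counts as linked
linked = df["point 1"].eq(g["point 2"].shift()) | g.cumcount().eq(0)
all_linked = linked.groupby(df["id"]).transform("all")

# Rule 2: a chain that returns to its origin keeps the per-row points
round_trip = first_start.eq(last_end)
use_journey = all_linked & ~round_trip

# Whole-journey start/end where the rules apply, otherwise the row's own points
df["Start"] = first_start.where(use_journey, df["point 1"])
df["End"] = last_end.where(use_journey, df["point 2"])
print(df)
```

Because everything stays in column operations, this avoids calling a Python function once per group, which tends to matter only when there are many ids.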

1 answer
User
#1 · Posted 2024-10-06 11:23:38

I think you need numpy.roll in a custom function applied per id:

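import numpy as np
import pandas as pd

#sort values first
df = df.sort_values(['id', 'Time'])
#create new columns, by default each row keeps its own points
df['Start'] = df['point 1']
df['End'] = df['point 2']

def f(x):
    #roll point 1 so each row sees the next row's point 1 (the last row wraps around)
    next_p1 = np.roll(x['point 1'].values, -1)
    #all legs connected: point 2 of each row equals point 1 of the next row,
    #ignoring the wrapped-around last pair; .all() gives a scalar
    linked = (x['point 2'].values[:-1] == next_p1[:-1]).all()
    #round trip: the id ends where it started, e.g. Paris -> London -> Paris
    round_trip = x['point 1'].iat[0] == x['point 2'].iat[-1]
    if linked and not round_trip:
        #assign first and last values
        x['Start'] = x['point 1'].iat[0]
        x['End'] = x['point 2'].iat[-1]
    return x

#apply custom function per id, keeping the original index
df = df.groupby('id', group_keys=False).apply(f)
print (df)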
    id    point 1    point 2 count                Time      Start     End
7  002     Vienna     Prague    15 2016-05-19 02:45:00     Vienna  Munich
2  002     Prague     Munich    15 2016-05-19 17:55:00     Vienna  Munich
3  003  Frankfurt     London    01 2016-05-17 21:15:00  Frankfurt  Vienna
6  003     London  Barcelona    01 2016-05-18 06:45:00  Frankfurt  Vienna
5  003  Barcelona     Vienna    15 2016-05-19 03:20:00  Frankfurt  Vienna
1  015      Paris     London    01 2016-05-19 11:50:00      Paris  London
4  015     London      Paris    08 2016-05-21 13:50:00     London   Paris
0  018      Paris     London    01 2016-05-20 10:50:00      Paris  London
