How do I insert rows into a DataFrame based on column values and conditions?

import pandas as pd
import numpy as np

df = pd.DataFrame({'Car_ID': ['B332', 'B332', 'B332', 'C315', 'C315', 'C315', 'C315', 'C315', 'F310', 'F310'],
                   'Date': ['2018-03-12', '2018-03-14', '2018-03-15', '2018-03-17', '2018-03-13', '2018-03-15',
                            '2018-03-18', '2018-03-21', '2018-03-10', '2018-03-13'],
                   'Driver': ['Alex', 'Alex', 'Mick', 'Sara', 'Sara', 'Jean', 'Sara', 'Sara', 'Franck', 'Michel']})
df

Out:
  Car_ID        Date  Driver
0   B332  2018-03-12    Alex
1   B332  2018-03-14    Alex
2   B332  2018-03-15    Mick
3   C315  2018-03-17    Sara
4   C315  2018-03-13    Sara
5   C315  2018-03-15    Jean
6   C315  2018-03-18    Sara
7   C315  2018-03-21    Sara
8   F310  2018-03-10  Franck
9   F310  2018-03-13  Michel

df["Event"] = np.where(df.Car_ID.str.contains('B', case=True, na=False), 'Rent_Car_B',
              np.where(df.Car_ID.str.contains('C', case=True, na=False), 'Rent_Car_C',
              np.where(df.Car_ID.str.contains('F', case=True, na=False), 'Rent_Car_F', df.Car_ID)))
df

Out:
  Car_ID        Date  Driver       Event
0   B332  2018-03-12    Alex  Rent_Car_B
1   B332  2018-03-14    Alex  Rent_Car_B
2   B332  2018-03-15    Mick  Rent_Car_B
3   C315  2018-03-17    Sara  Rent_Car_C
4   C315  2018-03-13    Sara  Rent_Car_C
5   C315  2018-03-15    Jean  Rent_Car_C
6   C315  2018-03-18    Sara  Rent_Car_C
7   C315  2018-03-21    Sara  Rent_Car_C
8   F310  2018-03-10  Franck  Rent_Car_F
9   F310  2018-03-13  Michel  Rent_Car_F

Desired output:
   Car_ID        Date  Driver           Event
0    B332  2018-03-12    Alex      Rent_Car_B
1    B332  2018-03-14    Alex      Rent_Car_B
2    B332  2018-03-15    Mick      Rent_Car_B
3    B332  2018-03-15            Alex to Mick
4    C315  2018-03-17    Sara      Rent_Car_C
5    C315  2018-03-13    Sara      Rent_Car_C
6    C315  2018-03-15    Jean      Rent_Car_C
7    C315  2018-03-15            Sara to Jean
8    C315  2018-03-18    Sara      Rent_Car_C
9    C315  2018-03-18            Jean to Sara
10   C315  2018-03-21    Sara      Rent_Car_C
11   F310  2018-03-10  Franck      Rent_Car_F
12   F310  2018-03-13  Michel      Rent_Car_F
13   F310  2018-03-13          Franck to Mike

3 Answers

User

#1 · Edited 2024-05-06 07:57:27

This is a fairly tricky problem. Here is my take:

# Pair each driver with the previous driver within the same Event group
df['new'] = df.groupby('Event')['Driver'].shift(1) + 'to' + df['Driver']
# Split on 'to' (assumes no driver name contains 'to') and back-fill the first row of each group
df['new'] = df['new'].str.split('to').bfill()
# Check whether the two drivers differ
m = df['new'].str[0] != df['new'].str[1]
# Based on the condition, create a new dataframe (the last flagged row is dropped)
new_df = df.loc[m].copy().iloc[:-1]
# Convert the list to the desired format
new_df['new'] = new_df['new'].str[0] + ' to ' + new_df['new'].str[1]
# Concatenate the old and new dataframes (the positional axis argument to drop was removed in pandas 2.0)
mdf = pd.concat([df.drop(columns='new'),
                 new_df.drop(columns=['Driver', 'Event'])
                       .rename(columns={'new': 'Event'})])


     Car_ID        Date  Driver             Event
0   B332  2018-03-12    Alex        Rent_Car_B
1   B332  2018-03-14    Alex        Rent_Car_B
2   B332  2018-03-15    Mick        Rent_Car_B
3   C315  2018-03-17    Sara        Rent_Car_C
4   C315  2018-03-13    Sara        Rent_Car_C
5   C315  2018-03-15    Jean        Rent_Car_C
6   C315  2018-03-18    Sara        Rent_Car_C
7   C315  2018-03-21    Sara        Rent_Car_C
8   F310  2018-03-10  Franck        Rent_Car_F
9   F310  2018-03-13  Michel        Rent_Car_F
2   B332  2018-03-15     NaN      Alex to Mick
5   C315  2018-03-15     NaN      Sara to Jean
6   C315  2018-03-18     NaN      Jean to Sara
8   F310  2018-03-10     NaN  Franck to Michel

If you want the rows in order, sort by the index:

mdf = mdf.sort_index()
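
One caveat when sorting here: mdf contains duplicate index labels (2, 5, 6 and 8 each appear twice), and the default sort algorithm of sort_index is not documented as stable. A minimal sketch, using a made-up miniature frame rather than the data above, of asking for a stable sort so each original row stays ahead of the row inserted under the same label:

```python
import pandas as pd

# Hypothetical miniature of the concatenated frame: label 1 appears twice,
# once for an original row and once for an inserted "switch" row.
original = pd.DataFrame({"Event": ["a0", "a1", "a2"]}, index=[0, 1, 2])
inserted = pd.DataFrame({"Event": ["switch"]}, index=[1])

combined = pd.concat([original, inserted])  # index is now [0, 1, 2, 1]

# mergesort is stable: rows with equal labels keep their concatenation
# order, so the original row 1 comes before the inserted row 1.
ordered = combined.sort_index(kind="mergesort").reset_index(drop=True)
print(ordered)
```

Resetting the index afterwards is optional; it just removes the duplicate labels once the order is fixed.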

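User

#2 · Edited 2024-05-06 07:57:27

You can do this quite nicely with just a few shifts! This approach also lets you get the indices right, so the new rows are added exactly where you want them.

Starting from the DataFrame after Event has been added: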

import pandas as pd

# Modify the index so we can later append to the correct rows
df.index= df.index*2

# Determine when switches occur
mask = (df.Driver != df.Driver.shift(1)) & (df.Car_ID == df.Car_ID.shift(1))
sw_from = df[mask.shift(-1, fill_value=False)].copy()
sw_to = df[mask].copy()

# Make the switching rows have the correct information
sw_to['Event'] = sw_from.Driver.values + ' to ' + sw_to.Driver.values
sw_to['Driver'] = ''

# Modify the switching indices so they get added to the proper position
sw_to.index = sw_to.index+1

# Add them to df
# (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df = pd.concat([df, sw_to]).sort_index().reset_index(drop=True)

Output:

   Car_ID        Date  Driver             Event
0    B332  2018-03-12    Alex        Rent_Car_B
1    B332  2018-03-14    Alex        Rent_Car_B
2    B332  2018-03-15    Mick        Rent_Car_B
3    B332  2018-03-15              Alex to Mick
4    C315  2018-03-17    Sara        Rent_Car_C
5    C315  2018-03-13    Sara        Rent_Car_C
6    C315  2018-03-15    Jean        Rent_Car_C
7    C315  2018-03-15              Sara to Jean
8    C315  2018-03-18    Sara        Rent_Car_C
9    C315  2018-03-18              Jean to Sara
10   C315  2018-03-21    Sara        Rent_Car_C
11   F310  2018-03-10  Franck        Rent_Car_F
12   F310  2018-03-13  Michel        Rent_Car_F
13   F310  2018-03-13          Franck to Michel
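
For anyone who wants to run this end to end, here is a self-contained sketch of the same doubled-index idea. The function name insert_driver_switches is my own invention, and the one-line Event construction is a shortcut that happens to match the nested np.where from the question for this data only:

```python
import pandas as pd

def insert_driver_switches(df):
    """Hypothetical helper: add an 'X to Y' row after each driver change within a car."""
    out = df.reset_index(drop=True)
    out.index = out.index * 2              # even labels only; odd labels stay free
    prev_driver = out["Driver"].shift(1)
    same_car = out["Car_ID"] == out["Car_ID"].shift(1)
    mask = same_car & (out["Driver"] != prev_driver)

    switches = out[mask].copy()
    switches["Event"] = prev_driver[mask] + " to " + switches["Driver"]
    switches["Driver"] = ""
    switches.index = switches.index + 1    # odd label: sorts right after the changed row
    return pd.concat([out, switches]).sort_index().reset_index(drop=True)

df = pd.DataFrame({
    "Car_ID": ["B332", "B332", "B332", "C315", "C315", "C315", "C315", "C315", "F310", "F310"],
    "Date": ["2018-03-12", "2018-03-14", "2018-03-15", "2018-03-17", "2018-03-13",
             "2018-03-15", "2018-03-18", "2018-03-21", "2018-03-10", "2018-03-13"],
    "Driver": ["Alex", "Alex", "Mick", "Sara", "Sara", "Jean", "Sara", "Sara", "Franck", "Michel"],
})
# Shortcut (assumption): the first letter of Car_ID identifies the car type
df["Event"] = "Rent_Car_" + df["Car_ID"].str[0]

result = insert_driver_switches(df)
print(result)
```

Because every label is unique after doubling, the plain sort_index is enough here and no stable sort is needed.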

User

#3 · Edited 2024-05-06 07:57:27

Use the shift method. First, create a column with it that we will use below:

df['Driver_shift'] = df['Driver'].shift()

Using a mask, select the rows where the driver actually changes while the Car_ID stays the same:

mask = (df['Driver'] != df['Driver_shift'])&(df['Car_ID'] == df['Car_ID'].shift())
df_change = df[mask]

Now add 0.5 to the index so the rows land in the right place after concatenating and sorting, and change the values of the two columns:

df_change = df_change.set_index(df_change.index+0.5)
df_change.loc[:,'Event'] = df_change['Driver_shift'] + ' to ' + df_change['Driver']
df_change['Driver'] = '' # to replace the value

Now you can concatenate, sort, reset the index, and drop the helper column:

pd.concat([df, df_change]).sort_index().reset_index(drop=True).drop(columns='Driver_shift')

You get:

   Car_ID        Date  Driver             Event
0    B332  2018-03-12    Alex        Rent_Car_B
1    B332  2018-03-14    Alex        Rent_Car_B
2    B332  2018-03-15    Mick        Rent_Car_B
3    B332  2018-03-15              Alex to Mick
4    C315  2018-03-17    Sara        Rent_Car_C
5    C315  2018-03-13    Sara        Rent_Car_C
6    C315  2018-03-15    Jean        Rent_Car_C
7    C315  2018-03-15              Sara to Jean
8    C315  2018-03-18    Sara        Rent_Car_C
9    C315  2018-03-18              Jean to Sara
10   C315  2018-03-21    Sara        Rent_Car_C
11   F310  2018-03-10  Franck        Rent_Car_F
12   F310  2018-03-13  Michel        Rent_Car_F
13   F310  2018-03-13          Franck to Michel
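
A possible variant of the mask, sketched on a made-up four-row frame: shifting within each Car_ID group gives NaN on the first row of every car, so the separate Car_ID comparison is not needed. Note that this is not quite the same rule, since it compares each row with the previous row of the same car even when the car's rows are not contiguous.

```python
import pandas as pd

# Hypothetical miniature of the data
df = pd.DataFrame({
    "Car_ID": ["B332", "B332", "C315", "C315"],
    "Driver": ["Alex", "Mick", "Sara", "Sara"],
})

# Previous driver of the same car; NaN on the first row of each car
prev = df.groupby("Car_ID")["Driver"].shift()

# A change needs a previous driver that differs from the current one
changed = prev.notna() & (df["Driver"] != prev)
print(df[changed])
```

Only the second row is selected here: Mick took over B332 from Alex, while C315 kept the same driver.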

Edit: to add a row before each driver and date

df1 = df.copy()
df1.index = df1.index +0.5
df2 = pd.concat([df.drop(columns='Event'), df1]).sort_index().reset_index(drop=True)
df2['Event'] = df2['Event'].fillna(df2['Driver'])

The result is in df2.
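
To make the effect of that edit concrete, here is the same sequence of steps on a made-up two-row frame (the names header_rows and event_rows are mine). Each original row ends up preceded by a row whose Event holds the driver's name:

```python
import pandas as pd

# Hypothetical miniature of the data
df = pd.DataFrame({
    "Driver": ["Alex", "Mick"],
    "Event": ["Rent_Car_B", "Rent_Car_B"],
})

header_rows = df.drop(columns="Event")       # labels 0, 1; no Event yet
event_rows = df.copy()
event_rows.index = event_rows.index + 0.5    # labels 0.5, 1.5 sort after the headers

df2 = pd.concat([header_rows, event_rows]).sort_index().reset_index(drop=True)
# Header rows have NaN in Event after the concat; fill it with the driver name
df2["Event"] = df2["Event"].fillna(df2["Driver"])
print(df2)
```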
