Starting from a DataFrame of train trip segments:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Name': ['Susie', 'Susie', 'Frank', 'Tony', 'Tony'],
                   'Trip Id': [1, 1, 2, 3, 3],
                   'From': ['London', 'Paris', 'Lyon', 'Munich', 'Prague'],
                   'To': ['Paris', 'Berlin', 'Milan', 'Prague', 'Vienna'],
                   'Passenger Count': [1, 1, 2, 4, 4]})
Name   Trip Id  From    To      Passenger Count
Susie  1        London  Paris   1
Susie  1        Paris   Berlin  1
Frank  2        Lyon    Milan   2
Tony   3        Munich  Prague  4
Tony   3        Prague  Vienna  4
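(Note: a trip is a series of related segments that together form one journey, for example when changing trains.)
I need to expand the rows and drop Passenger Count, so that the DataFrame has one row per person per segment.
Each anonymous segment should list the reference passenger. Each traveller needs their own Trip Id.
The result should look like this: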
Name   Trip Id  From    To      Named Passenger
Susie  1        London  Paris   NaN
Susie  1        Paris   Berlin  NaN
Frank  2        Lyon    Milan   NaN
NaN    4        Lyon    Milan   Frank
Tony   3        Munich  Prague  NaN
Tony   3        Prague  Vienna  NaN
NaN    5        Munich  Prague  Tony
NaN    5        Prague  Vienna  Tony
NaN    6        Munich  Prague  Tony
NaN    6        Prague  Vienna  Tony
NaN    7        Munich  Prague  Tony
NaN    7        Prague  Vienna  Tony
I am almost there, but I am struggling to give every person their own Trip Id.
I first managed to expand the passengers like this:
# First, set the reference name for all records
df['Named Passenger'] = df['Name']
# Create an expansion index: each row label repeated 'Passenger Count' times
new_index = df.index.repeat(df['Passenger Count'])
# Expand the df
expanded = df.loc[new_index]
# Remove the reference name again on the original (first-occurrence) rows
expanded.loc[~new_index.duplicated(), 'Named Passenger'] = np.nan
# And remove the Name on duplicated rows (>1 personal info columns in reality)
expanded.loc[new_index.duplicated(), 'Name'] = np.nan
expanded = expanded.reset_index(drop=True)
expanded = expanded.drop(columns=['Passenger Count'])
expanded
It now looks like this:
     Name  Trip Id    From      To Named Passenger
0   Susie        1  London   Paris             NaN
1   Susie        1   Paris  Berlin             NaN
2   Frank        2    Lyon   Milan             NaN
3     NaN        2    Lyon   Milan           Frank
4    Tony        3  Munich  Prague             NaN
5     NaN        3  Munich  Prague            Tony
6     NaN        3  Munich  Prague            Tony
7     NaN        3  Munich  Prague            Tony
8    Tony        3  Prague  Vienna             NaN
9     NaN        3  Prague  Vienna            Tony
10    NaN        3  Prague  Vienna            Tony
11    NaN        3  Prague  Vienna            Tony
…but now I don't know how to update the Trip Id correctly. (Any value will do, as long as it is unique for each passenger.)
You can combine range and explode. Does that work for you?
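The answer gives no code, so what follows is only one possible reading of it, as a minimal sketch. It assumes "range" means building `list(range(n))` per row from Passenger Count and "explode" means `DataFrame.explode`. The helper columns `passenger_no` and `segment_row`, and the choice to number new trips upward from the largest existing Trip Id, are illustrative assumptions and not something the question or answer specifies.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Susie', 'Susie', 'Frank', 'Tony', 'Tony'],
                   'Trip Id': [1, 1, 2, 3, 3],
                   'From': ['London', 'Paris', 'Lyon', 'Munich', 'Prague'],
                   'To': ['Paris', 'Berlin', 'Milan', 'Prague', 'Vienna'],
                   'Passenger Count': [1, 1, 2, 4, 4]})

# passenger_no (hypothetical helper): 0 is the named traveller,
# 1..n-1 are the anonymous companions on the same segment.
out = (
    df.assign(passenger_no=df['Passenger Count'].map(lambda n: list(range(n))))
      .explode('passenger_no')
      .astype({'passenger_no': int})
      .rename_axis('segment_row')      # keep the original row order as a column
      .reset_index()
      .sort_values(['Trip Id', 'passenger_no', 'segment_row'])
      .reset_index(drop=True)
)

anon = out['passenger_no'] > 0

# Number every (original trip, companion) pair, then shift the numbers
# past the largest Trip Id already in use so they cannot collide.
new_ids = (out[anon].groupby(['Trip Id', 'passenger_no']).ngroup()
           + df['Trip Id'].max() + 1)
out.loc[anon, 'Trip Id'] = new_ids

# Anonymous rows keep the reference name only in 'Named Passenger'.
out['Named Passenger'] = out['Name'].where(anon)
out['Name'] = out['Name'].mask(anon)

result = out[['Name', 'Trip Id', 'From', 'To', 'Named Passenger']]
print(result)
```

Because the companion number is shared by all segments of a trip, every anonymous traveller gets the same new Trip Id on each of their segments, which is what the single repeated index in the question's attempt could not express.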