根据重复行列表中的索引增加重复行的列值

2024-09-29 01:37:53 发布

您现在位置:Python中文网/ 问答频道 /正文

从列车行程段的数据帧开始:

df=pd.DataFrame({ 'Name': ['Susie', 'Susie', 'Frank', 'Tony', 'Tony'],
  'Trip Id': [1, 1, 2, 3, 3], 'From': ['London', 'Paris', 'Lyon', 'Munich', 'Prague'],
  'To': ['Paris', 'Berlin', 'Milan', 'Prague', 'Vienna'],
  'Passenger Count': [1, 1, 2, 4, 4]})
Name     Trip Id    From      To        Passenger Count
Susie    1          London    Paris     1
Susie    1          Paris     Berlin    1
Frank    2          Lyon      Milan     2
Tony     3          Munich    Prague    4
Tony     3          Prague    Vienna    4

(注:一次旅行是一系列相关的路段,形成一项旅行活动,比如换乘火车。)

我需要扩展并删除乘客计数,以实现每个人段一行数据帧。 每个匿名段应列出参考乘客。每个旅行者都需要自己的Trip Id。 结果应该如下所示:

Name     Trip Id    From      To        Named Passenger
Susie    1          London    Paris     NaN
Susie    1          Paris     Berlin    NaN
Frank    2          Lyon      Milan     NaN
NaN      4          Lyon      Milan     Frank
Tony     3          Munich    Prague    NaN
Tony     3          Prague    Vienna    NaN
NaN      5          Munich    Prague    Tony
NaN      5          Prague    Vienna    Tony
NaN      6          Munich    Prague    Tony
NaN      6          Prague    Vienna    Tony
NaN      7          Munich    Prague    Tony
NaN      7          Prague    Vienna    Tony

我几乎做到了这一点,但我正在努力让每个人都有自己的trip id

我首先设法像这样扩展乘客:

# First, setting the reference name for all records
df['Named Passenger'] = df.apply(lambda r: r['Name'], axis=1)
# Creating an expansion index.
new_index = df.index.repeat(df['Passenger Count'])
# Expanding the df
expanded = df.loc[new_index]
# Removing again the reference name for the original rows
expanded.loc[~new_index.duplicated(), 'Named Passenger'] = np.nan
# And removing the Name on duplicated rows (>1 personal info columns in reality)
expanded.loc[new_index.duplicated(), 'Name'] = np.nan
expanded = expanded.reset_index(drop=True)
expanded.drop(columns=['Passenger Count'], inplace=True)

expanded现在看起来是这样的:

     Name  Trip Id    From      To Named Passenger
0   Susie        1  London   Paris             NaN
1   Susie        1   Paris  Berlin             NaN
2   Frank        2    Lyon   Milan             NaN
3     NaN        2    Lyon   Milan           Frank
4    Tony        3  Munich  Prague             NaN
5     NaN        3  Munich  Prague            Tony
6     NaN        3  Munich  Prague            Tony
7     NaN        3  Munich  Prague            Tony
8    Tony        3  Prague  Vienna             NaN
9     NaN        3  Prague  Vienna            Tony
10    NaN        3  Prague  Vienna            Tony
11    NaN        3  Prague  Vienna            Tony

…但我现在不知道如何正确更新旅行Id?(不管是什么,只要每个乘客都是独一无二的。)


Tags: franknameiddfindexnanparistony
1条回答
网友
1楼 · 发布于 2024-09-29 01:37:53

你可以把射程和爆炸结合起来。这对你有用吗

import pandas as pd
import numpy as np
df=pd.DataFrame({ 'Name': ['Susie', 'Susie', 'Frank', 'Tony', 'Tony'],
  'Trip Id': [1, 1, 2, 3, 3], 'From': ['London', 'Paris', 'Lyon', 'Munich', 'Prague'],
  'To': ['Paris', 'Berlin', 'Milan', 'Prague', 'Vienna'],
  'Passenger Count': [1, 1, 2, 4, 4]})

# Expand rows
df['Named Passenger'] = df['Name'].copy()
df['Passenger Id'] = df['Passenger Count'].map(lambda x: list(range(1, x+1)))
df = df.explode('Passenger Id')
df['Trip Id Unique'] = df['Trip Id'].astype(str) + "_" + df['Passenger Id'].astype(str)

# Remove names
df['Name'] = np.where(~df.index.duplicated(keep='first'), df['Name'], np.nan)
df['Named Passenger'] = np.where(df['Name'].isnull(), df['Named Passenger'], np.nan)

# Encode Trip ID with unique numerical ID
df['Trip Id Unique'] = pd.Categorical(df['Trip Id Unique'])
df['Trip Id Unique'] = df['Trip Id Unique'].cat.codes +1

# Print
df[['Name', 'Trip Id Unique', 'From', 'To', 'Named Passenger']].sort_values(by=['Trip Id Unique', 'From'])

#     Name  Trip Id Unique    From      To Named Passenger
# 0  Susie               1  London   Paris             NaN
# 1  Susie               1   Paris  Berlin             NaN
# 2  Frank               2    Lyon   Milan             NaN
# 2    NaN               3    Lyon   Milan           Frank
# 3   Tony               4  Munich  Prague             NaN
# 4   Tony               4  Prague  Vienna             NaN
# 3    NaN               5  Munich  Prague            Tony
# 4    NaN               5  Prague  Vienna            Tony
# 3    NaN               6  Munich  Prague            Tony
# 4    NaN               6  Prague  Vienna            Tony
# 3    NaN               7  Munich  Prague            Tony
# 4    NaN               7  Prague  Vienna            Tony

相关问题 更多 >