Converting one DataFrame into two DataFrames linked by a foreign key

Posted 2024-09-27 07:31:55

I have a DataFrame df, shown below

df = pd.DataFrame(
    [['tom', 28, "tesla", "model s" ], 
    ['nick', 30, "ford", "mustang"], 
    ['juli', 56, "tesla", "model s"]] , 
    columns = ['PersonName', 'PersonAge', "Car", "CarModel"]
)
df.head()

  PersonName  PersonAge    Car CarModel
0        tom         28  tesla  model s
1       nick         30   ford  mustang
2       juli         56  tesla  model s

I want to convert it into two DataFrames, as shown below

Person=
PersonName PersonAge CarId
tom        28        1 
nick       30        2
juli       56        1

Cars=
id Car    Model
1  Tesla  Model S
2  Ford   Mustang

What is the right way to do this?

3 answers

You can assign columns to new DataFrames to create Person and Cars

>>> df = pd.DataFrame(
...     [['tom', 28, "tesla", "model s" ], 
...     ['nick', 30, "ford", "mustang"], 
...     ['juli', 56, "tesla", "model s"]] , 
...     columns = ['PersonName', 'PersonAge', "Car", "CarModel"]
... )
>>> df
  PersonName  PersonAge    Car CarModel
0        tom         28  tesla  model s
1       nick         30   ford  mustang
2       juli         56  tesla  model s
>>> 
>>> Person = df[["PersonName", "PersonAge"]].copy()  # copy so later column assignment does not warn
>>> Cars = df[["Car", "CarModel"]].copy()
>>> Person
  PersonName  PersonAge
0        tom         28
1       nick         30
2       juli         56
>>> Cars
     Car CarModel
0  tesla  model s
1   ford  mustang
2  tesla  model s

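Note that the original DataFrame has an index, and that index is preserved when the new DataFrames are created. The index can serve as the foreign key. In fact, the index is usually mostly transparent in pandas. To select a car by person name, you can do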

>>> Cars[Person["PersonName"] == "nick"]
    Car CarModel
1  ford  mustang
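
As a minimal sketch of the index acting as the key: the lowercase names `person`, `cars`, `nick_car` and `rebuilt` are illustrative and not from the answer. A boolean mask built from one frame selects rows in the other because both share the same index, and `join` matches on the index by default.

```python
import pandas as pd

df = pd.DataFrame(
    [["tom", 28, "tesla", "model s"],
     ["nick", 30, "ford", "mustang"],
     ["juli", 56, "tesla", "model s"]],
    columns=["PersonName", "PersonAge", "Car", "CarModel"],
)

# Illustrative names; .copy() gives independent frames rather than slices of df.
person = df[["PersonName", "PersonAge"]].copy()
cars = df[["Car", "CarModel"]].copy()

# The mask comes from person but is aligned to cars through the shared index.
nick_car = cars.loc[person["PersonName"] == "nick"]

# join() matches rows on the index by default, which rebuilds the original frame.
rebuilt = person.join(cars)
print(nick_car)
print(rebuilt)
```

With this split, `cars` still holds one row per person rather than one row per distinct car, so the index links the two frames row for row but does not deduplicate cars.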

If you want to turn the index into its own column, you can do this

>>> Cars["id"] = Cars.index
>>> Person["CarId"] = Person.index
>>> Person
  PersonName  PersonAge  CarId
0        tom         28      0
1       nick         30      1
2       juli         56      2
>>> Cars
     Car CarModel  id
0  tesla  model s   0
1   ford  mustang   1
2  tesla  model s   2

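But it is usually more convenient to stick with index operations, unless you reset the index, in which case you have to use a separate column.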

This should work:

import pandas as pd

df = pd.DataFrame(
    [['tom', 28, "tesla", "model s" ],
    ['nick', 30, "ford", "mustang"],
    ['juli', 56, "tesla", "model s"]] ,
    columns = ['PersonName', 'PersonAge', "Car", "CarModel"]
)
df['Comb'] = df['Car'] + "," + df['CarModel']
car_df = pd.DataFrame({'CarId': range(1, len(df['Comb'].unique())+1), 'Comb': df['Comb'].unique()})
car_df = car_df.merge(df[['Comb', 'Car', 'CarModel']], on=['Comb']).drop_duplicates(subset=['Comb']).drop(columns=['Comb'])

df = df.merge(car_df, on=['Car', 'CarModel']).drop(columns=['Comb', 'Car', 'CarModel'])
car_df = car_df.set_index('CarId')
print(df)
print()
print(car_df)

Output:

  PersonName  PersonAge  CarId
0        tom         28      1
1       juli         56      1
2       nick         30      2

         Car CarModel
CarId                
1      tesla  model s
2       ford  mustang
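
A possibly shorter route to the same result, offered as a sketch rather than as the answer's method: take the distinct (Car, CarModel) pairs with `drop_duplicates`, number them from 1, and left-merge the ids back onto the original rows. The names `cars` and `person` are illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    [["tom", 28, "tesla", "model s"],
     ["nick", 30, "ford", "mustang"],
     ["juli", 56, "tesla", "model s"]],
    columns=["PersonName", "PersonAge", "Car", "CarModel"],
)

# One row per distinct (Car, CarModel) pair, in order of first appearance.
cars = df[["Car", "CarModel"]].drop_duplicates().reset_index(drop=True)
cars.insert(0, "id", range(1, len(cars) + 1))

# A left merge keeps the row order of df; the helper columns are dropped afterwards.
person = (
    df.merge(cars, on=["Car", "CarModel"], how="left")
      .rename(columns={"id": "CarId"})
      [["PersonName", "PersonAge", "CarId"]]
)
print(person)
print(cars)
```

This avoids the temporary `Comb` string column, and because the merge is a left merge, the people stay in their original order.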

Here is what you want

Use groupby to create the Cars DataFrame

CarsDF = pd.DataFrame(df.groupby(['Car','CarModel']).Car.count())
CarsDF.columns=['v']
CarsDF = CarsDF.reset_index().drop('v', axis=1)
CarsDF.index.name='CarId'
CarsDF

Then use merge to add CarId

PersonDF = df.merge(CarsDF.reset_index())[['PersonName','PersonAge','CarId']]
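
To check that the split loses nothing, the two frames can be merged back on CarId. The sketch below condenses the groupby step using `size()` (a substitution, not the answer's exact code) and uses illustrative names `cars_df`, `person_df` and `rebuilt`. Note that with this approach the ids start at 0 and follow the sorted group order, so ford gets 0 and tesla gets 1, which differs from the numbering in the question.

```python
import pandas as pd

df = pd.DataFrame(
    [["tom", 28, "tesla", "model s"],
     ["nick", 30, "ford", "mustang"],
     ["juli", 56, "tesla", "model s"]],
    columns=["PersonName", "PersonAge", "Car", "CarModel"],
)

# Distinct (Car, CarModel) pairs; groupby sorts the keys, so ford comes first.
cars_df = (
    df.groupby(["Car", "CarModel"])
      .size()
      .reset_index(name="n")   # "n" is a throwaway count column
      .drop(columns="n")
)
cars_df.index.name = "CarId"

# merge() with no "on" uses the shared columns Car and CarModel.
person_df = df.merge(cars_df.reset_index())[["PersonName", "PersonAge", "CarId"]]

# Round trip: look up each person's car through CarId.
rebuilt = person_df.merge(cars_df, left_on="CarId", right_index=True)
print(rebuilt)
```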
