Splitting one DataFrame into two DataFrames linked by a foreign key

df = pd.DataFrame(
    [['tom', 28, "tesla", "model s"],
     ['nick', 30, "ford", "mustang"],
     ['juli', 56, "tesla", "model s"]],
    columns=['PersonName', 'PersonAge', "Car", "CarModel"]
)
df.head()

  PersonName  PersonAge    Car CarModel
0        tom         28  tesla  model s
1       nick         30   ford  mustang
2       juli         56  tesla  model s

3 answers

User

Answer 1 · edited 2024-09-27 07:31:55

You can assign subsets of the columns to new DataFrames to create Person and Cars:

>>> df = pd.DataFrame(
...     [['tom', 28, "tesla", "model s" ], 
...     ['nick', 30, "ford", "mustang"], 
...     ['juli', 56, "tesla", "model s"]] , 
...     columns = ['PersonName', 'PersonAge', "Car", "CarModel"]
... )
>>> df
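  PersonName  PersonAge    Car CarModel
0        tom         28  tesla  model s
1       nick         30   ford  mustang
2       juli         56  tesla  model s
>>> 
>>> # .copy() makes independent frames, so adding columns later does not raise SettingWithCopyWarning
>>> Person = df[["PersonName", "PersonAge"]].copy()
>>> Cars = df[["Car", "CarModel"]].copy()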
>>> Person
  PersonName  PersonAge
0        tom         28
1       nick         30
2       juli         56
>>> Cars
     Car CarModel
0  tesla  model s
1   ford  mustang
2  tesla  model s

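Note that the original DataFrame has an index, and that index is preserved when the new DataFrames are created. The index can therefore serve as the foreign key. In fact, the index is often largely transparent in pandas. To select a car by the person's name, you can do: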

>>> Cars[Person["PersonName"] == "nick"]
    Car CarModel
1  ford  mustang

If you want to turn the index into a column of its own, you can do this:

>>> Cars["id"] = Cars.index
>>> Person["CarId"] = Person.index
>>> Person
  PersonName  PersonAge  CarId
0        tom         28      0
1       nick         30      1
2       juli         56      2
>>> Cars
     Car CarModel  id
0  tesla  model s   0
1   ford  mustang   1
2  tesla  model s   2

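But it is usually more convenient to stick with index operations, unless you reset the index, in which case you have to use a separate column as the key.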
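As a minimal sketch of what "sticking with index operations" can look like: because both frames keep the original index, `join` can line them up again without any explicit key column. The lowercase names `person`, `cars`, and `rejoined` are only illustrative and are not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [["tom", 28, "tesla", "model s"],
     ["nick", 30, "ford", "mustang"],
     ["juli", 56, "tesla", "model s"]],
    columns=["PersonName", "PersonAge", "Car", "CarModel"],
)

# Independent copies that still carry the original index 0, 1, 2.
person = df[["PersonName", "PersonAge"]].copy()
cars = df[["Car", "CarModel"]].copy()

# join() aligns on the index by default, so the shared index acts as the key.
rejoined = person.join(cars)
print(rejoined)

# Look up a car through the index labels of the matching person rows.
nick_labels = person.index[person["PersonName"] == "nick"]
print(cars.loc[nick_labels])
```

Note that with this approach Cars still has one row per person, so the tesla row appears twice. The other answers deduplicate the cars.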

User

Answer 2 · edited 2024-09-27 07:31:55

This should work:

import pandas as pd

df = pd.DataFrame(
    [['tom', 28, "tesla", "model s" ],
    ['nick', 30, "ford", "mustang"],
    ['juli', 56, "tesla", "model s"]] ,
    columns = ['PersonName', 'PersonAge', "Car", "CarModel"]
)
df['Comb'] = df['Car'] + "," + df['CarModel']
car_df = pd.DataFrame({'CarId': range(1, len(df['Comb'].unique())+1), 'Comb': df['Comb'].unique()})
car_df = car_df.merge(df[['Comb', 'Car', 'CarModel']], on=['Comb']).drop_duplicates(subset=['Comb']).drop(columns=['Comb'])

df = df.merge(car_df, on=['Car', 'CarModel']).drop(columns=['Comb', 'Car', 'CarModel'])
car_df = car_df.set_index('CarId')
print(df)
print()
print(car_df)

Output:

  PersonName  PersonAge  CarId
0        tom         28      1
1       juli         56      1
2       nick         30      2

         Car CarModel
CarId                
1      tesla  model s
2       ford  mustang
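A possibly shorter variant of the same idea, offered as a sketch rather than a drop-in replacement: `drop_duplicates` on the two car columns yields the distinct cars directly, so the temporary `Comb` column is not needed. The name `person_df` and the choice of `how="left"` are assumptions made here, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame(
    [["tom", 28, "tesla", "model s"],
     ["nick", 30, "ford", "mustang"],
     ["juli", 56, "tesla", "model s"]],
    columns=["PersonName", "PersonAge", "Car", "CarModel"],
)

# One row per distinct (Car, CarModel) pair, in order of first appearance.
car_df = df[["Car", "CarModel"]].drop_duplicates().reset_index(drop=True)
car_df["CarId"] = range(1, len(car_df) + 1)  # 1-based ids, as in the answer above

# how="left" keeps the person rows in the same order as in df.
person_df = (
    df.merge(car_df, on=["Car", "CarModel"], how="left")
      .drop(columns=["Car", "CarModel"])
)
car_df = car_df.set_index("CarId")

print(person_df)
print()
print(car_df)
```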

User

Answer 3 · edited 2024-09-27 07:31:55

Here is what you want.

Use groupby to create the Cars DataFrame:

CarsDF = pd.DataFrame(df.groupby(['Car','CarModel']).Car.count())
CarsDF.columns=['v']
CarsDF = CarsDF.reset_index().drop('v', axis=1)
CarsDF.index.name='CarId'
CarsDF

Then use merge to add CarId:

PersonDF = df.merge(CarsDF.reset_index())[['PersonName','PersonAge','CarId']]
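The two steps above can also be sketched with `GroupBy.ngroup()`, which numbers each (Car, CarModel) group directly. This is an alternative under the assumption that 0-based ids in sorted key order are acceptable, which matches the ids the groupby above produces. The helper names `keys` and `car_id` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(
    [["tom", 28, "tesla", "model s"],
     ["nick", 30, "ford", "mustang"],
     ["juli", 56, "tesla", "model s"]],
    columns=["PersonName", "PersonAge", "Car", "CarModel"],
)

keys = ["Car", "CarModel"]

# ngroup() labels every row with its group number, 0..n-1, in sorted key order
# (groupby sorts keys by default).
car_id = df.groupby(keys).ngroup().rename("CarId")

# Person table: person columns plus the foreign key, aligned on the index.
PersonDF = df[["PersonName", "PersonAge"]].join(car_id)

# Cars table: one row per distinct car, indexed by CarId.
CarsDF = (
    df[keys].assign(CarId=car_id)
      .drop_duplicates()
      .set_index("CarId")
      .sort_index()
)

print(PersonDF)
print()
print(CarsDF)
```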
