Fill specific columns in DataFrame rows with values from another DataFrame

Posted 2024-10-03 11:13:25


I am trying to fill in missing and incorrect values in a master dataset using the correct values from two other datasets.

I have created a miniature version of the full dataset, shown below (note that the real dataset is several thousand rows long):

import pandas as pd

data = {'From':['GA0251','GA5201','GA5551','GA510A','GA5171','GA5151'],
        'To':['GA0201_T','GA5151_T','GA5151_R','GA5151_V','GA5151_P','GA5171_B'],
        'From_Latitude':[55.86630869,0,55.85508787,55.85594626,55.85692217,55.85669934],
        'From_Longitude':[-4.27138731,0,-4.24126866,-4.24446585,-4.24516129,-4.24358251,],
        'To_Latitude':[55.86614756,0,55.85522197,55.85593762,55.85693878,0],
        'To_Longitude':[-4.271040979,0,-4.241466534,-4.244607602,-4.244905037,0]}
 
dataset_to_correct = pd.DataFrame(data)

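However, some of the From lat/long and To lat/long values are incorrect. I have two tables similar to the ones below, one for From and one for To, and I want to substitute their values for the two coordinate values in each matching row.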

Table of corrected To lat/long:

data = {'Site':['GA5151_T','GA5171_B'],
        'Correct_Latitude':[55.85952791,55.87044558],
        'Correct_Longitude':[55.85661767,-4.24358251,]}
        
correct_to_coords = pd.DataFrame(data)

I want to match this table against the To column and then replace To_Latitude and To_Longitude with the correct values.

Table of corrected From lat/long:

data = {'Site':['GA5201','GA0251'],
        'Correct_Latitude':[55.857577,55.86616756],
        'Correct_Longitude':[-4.242770,-4.272140979]}

correct_from_coords = pd.DataFrame(data)

I want to match this table against the From column and then replace From_Latitude and From_Longitude with the correct values.

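Is there a way to match the sites in each table against the corresponding From or To column, and then replace only the values in the corresponding coordinate columns?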

I tried the code from this answer (Elegant way to replace values in pandas.DataFrame from another DataFrame), but it does not seem to have any effect on the DataFrame:

(correct_to_coords.set_index('Site')
    .rename(columns={'Correct_Latitude': 'To_Latitude'})
    .combine_first(dataset_to_correct.set_index('To')))

3 Answers
# Attach the To corrections; rows without a match get NaN in Site and the Correct_* columns
merged = dataset_to_correct.merge(correct_to_coords, left_on='To', right_on='Site', how='left')

# Overwrite the To coordinates only where a correction was found
matched = merged.To == merged.Site
merged.loc[matched, 'To_Latitude'] = merged.Correct_Latitude
merged.loc[matched, 'To_Longitude'] = merged.Correct_Longitude

# Drop the helper columns so the second merge can add them again
merged = merged.drop(columns=['Site', 'Correct_Latitude', 'Correct_Longitude'])

# Repeat the same steps for the From side
merged = merged.merge(correct_from_coords, left_on='From', right_on='Site', how='left')

matched = merged.From == merged.Site
merged.loc[matched, 'From_Latitude'] = merged.Correct_Latitude
merged.loc[matched, 'From_Longitude'] = merged.Correct_Longitude

merged = merged.drop(columns=['Site', 'Correct_Latitude', 'Correct_Longitude'])

merged
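Since the To and From passes are identical apart from the column names, the merge idea can be wrapped in a small helper. The following is only a sketch: `apply_corrections`, `routes`, `to_fix` and `from_fix` are hypothetical names, the tiny frames are made-up data, and it uses `fillna` rather than a boolean mask to keep the original value where no correction exists. It assumes each `Site` appears at most once in a correction table, which `validate="many_to_one"` enforces.

```python
import pandas as pd


def apply_corrections(df, corrections, key, lat_col, lon_col):
    """Hypothetical helper: return a copy of df in which lat_col/lon_col are
    overwritten for rows whose `key` value appears in corrections['Site']."""
    merged = df.merge(
        corrections,
        left_on=key,
        right_on="Site",
        how="left",
        validate="many_to_one",  # raise if a Site is listed more than once
    )
    # Use the corrected value where one was found, otherwise keep the original.
    merged[lat_col] = merged["Correct_Latitude"].fillna(merged[lat_col])
    merged[lon_col] = merged["Correct_Longitude"].fillna(merged[lon_col])
    return merged.drop(columns=["Site", "Correct_Latitude", "Correct_Longitude"])


# Made-up illustration data, not the coordinates from the question
routes = pd.DataFrame({
    "From": ["A1", "A2"],
    "To": ["B1", "B2"],
    "From_Latitude": [0.0, 10.0],
    "From_Longitude": [0.0, 20.0],
    "To_Latitude": [30.0, 0.0],
    "To_Longitude": [40.0, 0.0],
})
to_fix = pd.DataFrame(
    {"Site": ["B2"], "Correct_Latitude": [31.0], "Correct_Longitude": [41.0]}
)
from_fix = pd.DataFrame(
    {"Site": ["A1"], "Correct_Latitude": [11.0], "Correct_Longitude": [21.0]}
)

fixed = apply_corrections(routes, to_fix, "To", "To_Latitude", "To_Longitude")
fixed = apply_corrections(fixed, from_fix, "From", "From_Latitude", "From_Longitude")
print(fixed)
```

With the data from the question, the equivalent calls would pass `correct_to_coords` with the `To` columns and `correct_from_coords` with the `From` columns.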

@zswqa's answer produced the correct result; @Anurag Dabas's did not.

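Here is another possible solution. It is slightly faster than the merge approach suggested above, although both give the correct result.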

dataset_to_correct.set_index("To",inplace=True)
correct_to_coords.set_index("Site",inplace=True)
dataset_to_correct.loc[correct_to_coords.index, "To_Latitude"] = correct_to_coords["Correct_Latitude"]
dataset_to_correct.loc[correct_to_coords.index, "To_Longitude"] = correct_to_coords["Correct_Longitude"]
dataset_to_correct.reset_index(inplace=True)

dataset_to_correct.set_index("From",inplace=True)
correct_from_coords.set_index("Site",inplace=True)
dataset_to_correct.loc[correct_from_coords.index, "From_Latitude"] = correct_from_coords["Correct_Latitude"]
dataset_to_correct.loc[correct_from_coords.index, "From_Longitude"] = correct_from_coords["Correct_Longitude"]
dataset_to_correct.reset_index(inplace=True)
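A related lookup-based variant, swapped in here purely for comparison, uses `Series.map` with `fillna` instead of `.loc` assignment. It leaves the correction table unmodified and does not require every corrected site to be present in the main table. This is a sketch on made-up data: `routes`, `to_fix` and `lookup` are hypothetical names, and it assumes each `Site` is listed only once.

```python
import pandas as pd

# Made-up illustration data, not the coordinates from the question
routes = pd.DataFrame({
    "To": ["B1", "B2", "B3"],
    "To_Latitude": [30.0, 0.0, 0.0],
    "To_Longitude": [40.0, 0.0, 50.0],
})
# "B9" deliberately does not occur in routes["To"]
to_fix = pd.DataFrame({
    "Site": ["B2", "B9"],
    "Correct_Latitude": [31.0, 99.0],
    "Correct_Longitude": [41.0, 99.0],
})

# set_index without inplace=True returns a new frame, so to_fix is untouched
lookup = to_fix.set_index("Site")

for target, source in [
    ("To_Latitude", "Correct_Latitude"),
    ("To_Longitude", "Correct_Longitude"),
]:
    # map() looks each To value up in the Site index; unmatched rows become NaN
    corrected = routes["To"].map(lookup[source])
    # Keep the original coordinate wherever no correction was found
    routes[target] = corrected.fillna(routes[target])

print(routes)
```

The same loop with `From`, `From_Latitude` and `From_Longitude` would cover the other side.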

Let's try a double merge using merge() + pop() + fillna() + drop():

# To side: merge the To corrections, then fill To_* from them (original value kept where no match)
dataset_to_correct = dataset_to_correct.merge(correct_to_coords, left_on='To', right_on='Site', how='left').drop(columns='Site')
dataset_to_correct['To_Latitude'] = dataset_to_correct.pop('Correct_Latitude').fillna(dataset_to_correct['To_Latitude'])
dataset_to_correct['To_Longitude'] = dataset_to_correct.pop('Correct_Longitude').fillna(dataset_to_correct['To_Longitude'])
# From side: same steps with the From corrections
dataset_to_correct = dataset_to_correct.merge(correct_from_coords, left_on='From', right_on='Site', how='left').drop(columns='Site')
dataset_to_correct['From_Latitude'] = dataset_to_correct.pop('Correct_Latitude').fillna(dataset_to_correct['From_Latitude'])
dataset_to_correct['From_Longitude'] = dataset_to_correct.pop('Correct_Longitude').fillna(dataset_to_correct['From_Longitude'])
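Because the answers in this thread disagree about which approach gives the right result, it may help to check exactly which cells an approach changed. One possible way, sketched on made-up data, is `DataFrame.compare` (available from pandas 1.1). `before` and `after` are hypothetical names, and the two frames must have identical row and column labels.

```python
import pandas as pd

# Made-up illustration data: the state before any correction
before = pd.DataFrame({
    "To": ["B1", "B2"],
    "To_Latitude": [30.0, 0.0],
    "To_Longitude": [40.0, 0.0],
})

# Stand-in for the result of whichever correction approach is being checked
after = before.copy()
after.loc[1, "To_Latitude"] = 31.0
after.loc[1, "To_Longitude"] = 41.0

# Only rows and columns that differ are returned;
# 'self' holds the value in `before`, 'other' the value in `after`
changes = before.compare(after)
print(changes)
```

If an approach reorders the columns (for example through `set_index` followed by `reset_index`), realigning with `after[before.columns]` before comparing keeps the labels identical.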
