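<p>I'm trying to fill in missing and incorrect values in a main dataset using the correct values from two other datasets.</p>
<p>I've created a miniature version of the full dataset, shown below (note that the real dataset is several thousand rows long):</p>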
<pre><code>import pandas as pd
data = {'From':['GA0251','GA5201','GA5551','GA510A','GA5171','GA5151'],
'To':['GA0201_T','GA5151_T','GA5151_R','GA5151_V','GA5151_P','GA5171_B'],
'From_Latitude':[55.86630869,0,55.85508787,55.85594626,55.85692217,55.85669934],
'From_Longitude':[-4.27138731,0,-4.24126866,-4.24446585,-4.24516129,-4.24358251,],
'To_Latitude':[55.86614756,0,55.85522197,55.85593762,55.85693878,0],
'To_Longitude':[-4.271040979,0,-4.241466534,-4.244607602,-4.244905037,0]}
dataset_to_correct = pd.DataFrame(data)
</code></pre>
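<p>Assuming the zeros in this sample are placeholders for missing coordinates (an assumption; the question only says some values are missing or incorrect), a boolean mask lists the rows that still need a correction:</p>
<pre><code>import pandas as pd

# Sample data from the question
dataset_to_correct = pd.DataFrame({
    'From': ['GA0251', 'GA5201', 'GA5551', 'GA510A', 'GA5171', 'GA5151'],
    'To': ['GA0201_T', 'GA5151_T', 'GA5151_R', 'GA5151_V', 'GA5151_P', 'GA5171_B'],
    'From_Latitude': [55.86630869, 0, 55.85508787, 55.85594626, 55.85692217, 55.85669934],
    'From_Longitude': [-4.27138731, 0, -4.24126866, -4.24446585, -4.24516129, -4.24358251],
    'To_Latitude': [55.86614756, 0, 55.85522197, 55.85593762, 55.85693878, 0],
    'To_Longitude': [-4.271040979, 0, -4.241466534, -4.244607602, -4.244905037, 0]})

coord_cols = ['From_Latitude', 'From_Longitude', 'To_Latitude', 'To_Longitude']

# Assumption: an exact 0 marks a missing coordinate
has_zero = dataset_to_correct[coord_cols].eq(0).any(axis=1)

# Show the From/To sites of the rows that still need fixing
print(dataset_to_correct.loc[has_zero, ['From', 'To']])
</code></pre>
<p>Running the same mask after applying the corrections is a quick way to confirm that no placeholder values remain.</p>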
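<p>However, some of the values in the From lat/long and To lat/long columns are incorrect. For From and To I have two tables like the ones below, whose values I want to substitute into the main table in place of the two coordinate values in the matching row.</p>
<p>Table of corrected To lat/long:</p>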
<pre><code>data = {'Site':['GA5151_T','GA5171_B'],
'Correct_Latitude':[55.85952791,55.87044558],
'Correct_Longitude':[55.85661767,-4.24358251,]}
correct_to_coords = pd.DataFrame(data)
</code></pre>
<p>I want to match this table against the To column and then replace To_Latitude and To_Longitude with the correct values.</p>
<p>Table of corrected From lat/long:</p>
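<p>One way to handle the To side is to map the <code>To</code> column through the correction table and fall back to the existing value with <code>fillna</code>. This is a minimal sketch, assuming each <code>Site</code> appears only once in <code>correct_to_coords</code> (<code>Series.map</code> with a Series as the mapping needs a unique lookup index):</p>
<pre><code>import pandas as pd

dataset_to_correct = pd.DataFrame({
    'From': ['GA0251', 'GA5201', 'GA5551', 'GA510A', 'GA5171', 'GA5151'],
    'To': ['GA0201_T', 'GA5151_T', 'GA5151_R', 'GA5151_V', 'GA5151_P', 'GA5171_B'],
    'From_Latitude': [55.86630869, 0, 55.85508787, 55.85594626, 55.85692217, 55.85669934],
    'From_Longitude': [-4.27138731, 0, -4.24126866, -4.24446585, -4.24516129, -4.24358251],
    'To_Latitude': [55.86614756, 0, 55.85522197, 55.85593762, 55.85693878, 0],
    'To_Longitude': [-4.271040979, 0, -4.241466534, -4.244607602, -4.244905037, 0]})
correct_to_coords = pd.DataFrame({
    'Site': ['GA5151_T', 'GA5171_B'],
    'Correct_Latitude': [55.85952791, 55.87044558],
    'Correct_Longitude': [55.85661767, -4.24358251]})

# Index the corrections by site so they act as a lookup table
lookup = correct_to_coords.set_index('Site')

for part in ['Latitude', 'Longitude']:
    target = f'To_{part}'
    # Look up each To site; sites without a correction map to NaN
    corrected = dataset_to_correct['To'].map(lookup[f'Correct_{part}'])
    # Keep the existing value wherever no correction was found
    dataset_to_correct[target] = corrected.fillna(dataset_to_correct[target])

print(dataset_to_correct)
</code></pre>
<p>The code copies whatever the correction table contains. In the sample, the <code>Correct_Longitude</code> for <code>GA5151_T</code> is 55.85661767, which looks like a latitude, since every other longitude is about -4.24, so that entry may be worth double-checking.</p>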
<pre><code>data = {'Site':['GA5201','GA0251'],
'Correct_Latitude':[55.857577,55.86616756],
'Correct_Longitude':[-4.242770,-4.272140979]}
correct_from_coords = pd.DataFrame(data)
</code></pre>
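<p>I want to match this table against the From column and then replace From_Latitude and From_Longitude with the correct values.</p>
<p>Is there a way to match the sites in each table to the corresponding From or To column and then replace only the values in those columns?</p>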
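<p>For the From side, <code>DataFrame.update</code> is another option: it aligns on index and column labels and overwrites matching cells in place with the non-NA values of the other frame. The sketch below renames the correction columns to the names they should overwrite; it assumes the <code>From</code> sites are unique in both frames.</p>
<pre><code>import pandas as pd

dataset_to_correct = pd.DataFrame({
    'From': ['GA0251', 'GA5201', 'GA5551', 'GA510A', 'GA5171', 'GA5151'],
    'To': ['GA0201_T', 'GA5151_T', 'GA5151_R', 'GA5151_V', 'GA5151_P', 'GA5171_B'],
    'From_Latitude': [55.86630869, 0, 55.85508787, 55.85594626, 55.85692217, 55.85669934],
    'From_Longitude': [-4.27138731, 0, -4.24126866, -4.24446585, -4.24516129, -4.24358251],
    'To_Latitude': [55.86614756, 0, 55.85522197, 55.85593762, 55.85693878, 0],
    'To_Longitude': [-4.271040979, 0, -4.241466534, -4.244607602, -4.244905037, 0]})
correct_from_coords = pd.DataFrame({
    'Site': ['GA5201', 'GA0251'],
    'Correct_Latitude': [55.857577, 55.86616756],
    'Correct_Longitude': [-4.242770, -4.272140979]})

# Rename the correction columns to the columns they should overwrite,
# and index them by the From site
fixes = (correct_from_coords
         .rename(columns={'Site': 'From',
                          'Correct_Latitude': 'From_Latitude',
                          'Correct_Longitude': 'From_Longitude'})
         .set_index('From'))

# update() modifies `indexed` in place and returns None
indexed = dataset_to_correct.set_index('From')
indexed.update(fixes)

# Move From back to an ordinary first column
dataset_to_correct = indexed.reset_index()
print(dataset_to_correct)
</code></pre>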
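<p>Both tables can go through one helper based on a left merge, since a left merge keeps the row order of the main frame. <code>apply_corrections</code> is a hypothetical name used only for this sketch. The helper assumes each <code>Site</code> appears at most once per correction table, because duplicate sites would duplicate rows in the merge.</p>
<pre><code>import pandas as pd

dataset_to_correct = pd.DataFrame({
    'From': ['GA0251', 'GA5201', 'GA5551', 'GA510A', 'GA5171', 'GA5151'],
    'To': ['GA0201_T', 'GA5151_T', 'GA5151_R', 'GA5151_V', 'GA5151_P', 'GA5171_B'],
    'From_Latitude': [55.86630869, 0, 55.85508787, 55.85594626, 55.85692217, 55.85669934],
    'From_Longitude': [-4.27138731, 0, -4.24126866, -4.24446585, -4.24516129, -4.24358251],
    'To_Latitude': [55.86614756, 0, 55.85522197, 55.85593762, 55.85693878, 0],
    'To_Longitude': [-4.271040979, 0, -4.241466534, -4.244607602, -4.244905037, 0]})
correct_to_coords = pd.DataFrame({
    'Site': ['GA5151_T', 'GA5171_B'],
    'Correct_Latitude': [55.85952791, 55.87044558],
    'Correct_Longitude': [55.85661767, -4.24358251]})
correct_from_coords = pd.DataFrame({
    'Site': ['GA5201', 'GA0251'],
    'Correct_Latitude': [55.857577, 55.86616756],
    'Correct_Longitude': [-4.242770, -4.272140979]})


def apply_corrections(df, fixes, key):
    """Hypothetical helper: for rows whose `key` column ('From' or 'To')
    matches a Site in `fixes`, overwrite the {key}_Latitude and
    {key}_Longitude columns; all other rows are left unchanged."""
    # Left join keeps every row of df, in its original order
    merged = df.merge(fixes.rename(columns={'Site': key}), on=key, how='left')
    for part in ['Latitude', 'Longitude']:
        target = f'{key}_{part}'
        # Use the correction where one exists, otherwise keep the old value
        merged[target] = merged[f'Correct_{part}'].fillna(merged[target])
    # Drop the helper columns so the result has the original layout
    return merged.drop(columns=['Correct_Latitude', 'Correct_Longitude'])


fixed = apply_corrections(dataset_to_correct, correct_to_coords, 'To')
fixed = apply_corrections(fixed, correct_from_coords, 'From')
print(fixed)
</code></pre>
<p>Only the two coordinate columns for the given side are touched, so the From and To corrections cannot overwrite each other.</p>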
<p>I tried using the code from this answer (<a href="https://stackoverflow.com/questions/35960494/elegant-way-to-replace-values-in-pandas-dataframe-from-another-dataframe">Elegant way to replace values in pandas.DataFrame from another DataFrame</a>), but it doesn't seem to have any effect on the DataFrame:</p>
<pre><code>(correct_to_coords
    .set_index('Site')
    .rename(columns={'Correct_Latitude': 'To_Latitude'})
    .combine_first(dataset_to_correct.set_index('To')))
</code></pre>
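<p>The <code>combine_first</code> attempt most likely appears to do nothing for two reasons. First, <code>combine_first</code> returns a new DataFrame rather than modifying <code>dataset_to_correct</code>, and the result is never assigned. Second, only <code>Correct_Latitude</code> is renamed, so <code>Correct_Longitude</code> would appear as an extra column instead of filling <code>To_Longitude</code>. Here is a sketch of the same idea with both points addressed. Because <code>combine_first</code> takes the union of the two indexes, it also reindexes back to the original rows. This step assumes the <code>To</code> values are unique.</p>
<pre><code>import pandas as pd

dataset_to_correct = pd.DataFrame({
    'From': ['GA0251', 'GA5201', 'GA5551', 'GA510A', 'GA5171', 'GA5151'],
    'To': ['GA0201_T', 'GA5151_T', 'GA5151_R', 'GA5151_V', 'GA5151_P', 'GA5171_B'],
    'From_Latitude': [55.86630869, 0, 55.85508787, 55.85594626, 55.85692217, 55.85669934],
    'From_Longitude': [-4.27138731, 0, -4.24126866, -4.24446585, -4.24516129, -4.24358251],
    'To_Latitude': [55.86614756, 0, 55.85522197, 55.85593762, 55.85693878, 0],
    'To_Longitude': [-4.271040979, 0, -4.241466534, -4.244607602, -4.244905037, 0]})
correct_to_coords = pd.DataFrame({
    'Site': ['GA5151_T', 'GA5171_B'],
    'Correct_Latitude': [55.85952791, 55.87044558],
    'Correct_Longitude': [55.85661767, -4.24358251]})

original_cols = list(dataset_to_correct.columns)

# Name BOTH correction columns like the columns they should fill,
# and index them by the To site
fixes = (correct_to_coords
         .rename(columns={'Site': 'To',
                          'Correct_Latitude': 'To_Latitude',
                          'Correct_Longitude': 'To_Longitude'})
         .set_index('To'))
base = dataset_to_correct.set_index('To')

# Non-null values in `fixes` win; everything else comes from `base`.
# combine_first returns a new frame, so the result is assigned back.
dataset_to_correct = (fixes.combine_first(base)
                      # restore the original rows and columns, dropping any
                      # site that exists only in the correction table
                      .reindex(index=base.index, columns=base.columns)
                      .reset_index()[original_cols])
print(dataset_to_correct)
</code></pre>
<p>The From side works the same way, using <code>correct_from_coords</code>, the <code>From_Latitude</code>/<code>From_Longitude</code> names, and <code>set_index('From')</code>.</p>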