Fill missing values in a DataFrame based on the nearest location in another DataFrame

I have a DataFrame similar to the following:
<pre><code>import pandas as pd
import numpy as np

date = pd.date_range(start='2020-01-01', freq='H', periods=4)
locations = ["AA3", "AB1", "AD1", "AC0"]
x = [5.5, 10.2, np.nan, 2.3,
     11.2, np.nan, 2.1, 4.0,
     6.1, np.nan, 20.3, 11.3,
     4.9, 15.2, 21.3, np.nan]
df = pd.DataFrame({'x': x})
df.index = pd.MultiIndex.from_product([locations, date], names=['location', 'date'])
df = df.sort_index()
df
</code></pre>
<pre><code>                                 x
location date
AA3      2020-01-01 00:00:00   5.5
         2020-01-01 01:00:00  10.2
         2020-01-01 02:00:00   NaN
         2020-01-01 03:00:00   2.3
AB1      2020-01-01 00:00:00  11.2
         2020-01-01 01:00:00   NaN
         2020-01-01 02:00:00   2.1
         2020-01-01 03:00:00   4.0
AC0      2020-01-01 00:00:00   4.9
         2020-01-01 01:00:00  15.2
         2020-01-01 02:00:00  21.3
         2020-01-01 03:00:00   NaN
AD1      2020-01-01 00:00:00   6.1
         2020-01-01 01:00:00   NaN
         2020-01-01 02:00:00  20.3
         2020-01-01 03:00:00  11.3
</code></pre>
The index consists of the location code and the hourly timestamp. I want to fill the missing values in column <code>x</code> with the valid value of the same column from the nearest location at the same day and hour, where the proximity of each location to the others is defined as:
<pre><code>nearest = pd.DataFrame({"AA3": ["AA3", "AB1", "AD1", "AC0"],
                        "AB1": ["AB1", "AA3", "AC0", "AD1"],
                        "AD1": ["AD1", "AC0", "AB1", "AA3"],
                        "AC0": ["AC0", "AD1", "AA3", "AB1"]})
nearest
</code></pre>
<pre><code>   AA3  AB1  AD1  AC0
0  AA3  AB1  AD1  AC0
1  AB1  AA3  AC0  AD1
2  AD1  AC0  AB1  AA3
3  AC0  AD1  AA3  AB1
</code></pre>
In this table the column names are location codes, and the rows under each column list the locations in order of proximity to the location named by the column, nearest first.

If the nearest location is also missing a value at the same day and hour, I take the value of the second-nearest location at that day and hour. If the second-nearest is missing too, I take the third-nearest, and so on.

Expected output:
<pre><code>                                 x
location date
AA3      2020-01-01 00:00:00   5.5
         2020-01-01 01:00:00  10.2
         2020-01-01 02:00:00   2.1
         2020-01-01 03:00:00   2.3
AB1      2020-01-01 00:00:00  11.2
         2020-01-01 01:00:00  10.2
         2020-01-01 02:00:00   2.1
         2020-01-01 03:00:00   4.0
AC0      2020-01-01 00:00:00   4.9
         2020-01-01 01:00:00  15.2
         2020-01-01 02:00:00  21.3
         2020-01-01 03:00:00  11.3
AD1      2020-01-01 00:00:00   6.1
         2020-01-01 01:00:00  15.2
         2020-01-01 02:00:00  20.3
         2020-01-01 03:00:00  11.3
</code></pre>
The following, based on a suggestion by <a href="https://stackoverflow.com/users/5972189/kiona1018">@kiona1018</a>, works as expected but is slow:
<pre><code>def fillna_by_nearest(x: pd.Series, nn_data: pd.DataFrame):
    out = x.copy()
    # Series.items() is used here; iteritems() was removed in pandas 2.0.
    for index, value in x.items():
        # Only fill missing entries whose location has a proximity column.
        if np.isnan(value) and (index[0] in nn_data.columns):
            location, date = index
            # Walk the candidates from nearest to farthest.
            for near_location in nn_data[location]:
                if ((near_location, date) in x.index) and pd.notna(x.loc[near_location, date]):
                    out.loc[index] = x.loc[near_location, date]
                    break  # stop at the first location with a valid value
    return out

fillna_by_nearest(df['x'], nearest)
</code></pre>
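One possible way to speed this up, offered as a sketch rather than a definitive answer: pivot the series so that each location becomes a column, then for each location back-fill across its candidate columns ordered by proximity. The Python loop then runs once per location instead of once per row. The function name `fillna_by_nearest_wide` is made up for this illustration, and the sketch assumes that every (location, date) pair in the index is unique and that the index levels are named `location` and `date` as in the question.

```python
import numpy as np
import pandas as pd


def fillna_by_nearest_wide(x: pd.Series, nn_data: pd.DataFrame) -> pd.Series:
    """Fill NaNs from the nearest location that has a value at the same timestamp.

    Hypothetical helper; assumes a unique (location, date) MultiIndex.
    """
    # Wide form: one row per timestamp, one column per location.
    wide = x.unstack("location")
    filled = wide.copy()
    for location in wide.columns:
        if location not in nn_data.columns:
            continue  # no proximity information for this location
        # Candidate columns ordered from nearest to farthest.
        candidates = [c for c in nn_data[location] if c in wide.columns]
        if not candidates:
            continue
        # After a row-wise back-fill, the first column holds the first
        # non-missing value among the candidates at each timestamp.
        fallback = wide[candidates].bfill(axis=1).iloc[:, 0]
        # Existing values are kept; only the gaps are filled.
        filled[location] = wide[location].fillna(fallback)
    # Back to long form, restoring the original row order and name.
    return filled.unstack().reindex(x.index).rename(x.name)


# Same data as in the question.
date = pd.date_range(start="2020-01-01", periods=4, freq=pd.Timedelta(hours=1))
locations = ["AA3", "AB1", "AD1", "AC0"]
values = [5.5, 10.2, np.nan, 2.3, 11.2, np.nan, 2.1, 4.0,
          6.1, np.nan, 20.3, 11.3, 4.9, 15.2, 21.3, np.nan]
df = pd.DataFrame({"x": values})
df.index = pd.MultiIndex.from_product([locations, date], names=["location", "date"])
df = df.sort_index()

nearest = pd.DataFrame({"AA3": ["AA3", "AB1", "AD1", "AC0"],
                        "AB1": ["AB1", "AA3", "AC0", "AD1"],
                        "AD1": ["AD1", "AC0", "AB1", "AA3"],
                        "AC0": ["AC0", "AD1", "AA3", "AB1"]})

result = fillna_by_nearest_wide(df["x"], nearest)
print(result)
```

A timestamp at which every candidate location is missing stays NaN, which matches the behaviour of the loop version. Whether this is actually faster on real data depends on the number of locations relative to the number of timestamps, so it is worth timing both versions.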
