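<p>If you don't care which value gets filled in, a simple approach is to sort the table by location and zip, then use fillna with method='ffill'. Sorting puts NaN last within each location, so each missing zip is filled with the largest known zip for that location.</p>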
<pre><code>>>> df
       zip location
0  65123.0  Houston
1  65123.0  Houston
2      NaN  Houston
3  89517.0  Berkley
4  89518.0  Berkley
5      NaN  Berkley
>>> df.sort_values(by=['location','zip']).fillna(method='ffill')
       zip location
3  89517.0  Berkley
4  89518.0  Berkley
5  89518.0  Berkley
0  65123.0  Houston
1  65123.0  Houston
2  65123.0  Houston
</code></pre>
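<p>One caveat: a plain forward fill does not know about locations, so if some location had no known zip at all, it would pick up the zip of the location sorted just before it. A minimal sketch of a variant that forward-fills within each location instead (this swaps in groupby(...).ffill(), which is not what the snippet above does; the names ordered and filled are just illustrative). It also uses .ffill() because newer pandas releases deprecate the method argument of fillna:</p>
<pre><code>import numpy as np
import pandas as pd

df = pd.DataFrame({
    "zip": [65123.0, 65123.0, np.nan, 89517.0, 89518.0, np.nan],
    "location": ["Houston", "Houston", "Houston", "Berkley", "Berkley", "Berkley"],
})

# sort_values places NaN last within each location, so every gap sits
# directly below a known zip for the same location.
ordered = df.sort_values(by=["location", "zip"])

# Forward-fill per location so a value can never carry over from one
# location into the next.
ordered["zip"] = ordered.groupby("location")["zip"].ffill()

# Restore the original row order.
filled = ordered.sort_index()
print(filled)
</code></pre>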
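<p>Update: the solution below also handles NaN in location. Group the rows by location first, then fill the missing zips within each group with that group's maximum. Rows whose location is NaN belong to no group, so their zip stays NaN.</p>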
<pre><code>>>> df
       zip location
0  65123.0  Houston
1  65123.0  Houston
2      NaN  Houston
3  89517.0  Berkley
4  89518.0  Berkley
5      NaN  Berkley
6      NaN      NaN
>>> # transform keeps the original index, so the result can be assigned straight back
>>> df['zip'] = df.groupby('location')['zip'].transform(lambda x: x.fillna(x.max()))
>>> df
       zip location
0  65123.0  Houston
1  65123.0  Houston
2  65123.0  Houston
3  89517.0  Berkley
4  89518.0  Berkley
5  89518.0  Berkley
6      NaN      NaN
</code></pre>
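<p>The same per-group fill can also be written without a lambda. This is a sketch, assuming the same example frame as above: compute each row's group maximum with transform('max'), then pass that Series to fillna. The name group_max is only illustrative.</p>
<pre><code>import numpy as np
import pandas as pd

df = pd.DataFrame({
    "zip": [65123.0, 65123.0, np.nan, 89517.0, 89518.0, np.nan, np.nan],
    "location": ["Houston", "Houston", "Houston", "Berkley", "Berkley", "Berkley", np.nan],
})

# Maximum zip of each row's location, aligned to the original index.
# Rows whose location is NaN belong to no group and have no usable value here.
group_max = df.groupby("location")["zip"].transform("max")

# Fill only the missing zips; existing values are left untouched.
df["zip"] = df["zip"].fillna(group_max)
print(df)
</code></pre>
<p>Because fillna aligns on the index, the row with a missing location has nothing to be filled from and keeps its NaN zip, matching the output shown above.</p>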