Filling gaps with pandas without filling the NaN values at the ends



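I have some housing price data spanning about 8 months. It tracks the price of each house from the day it comes on the market until it is sold. There are a few gaps in the middle of the data that I want to fill, but I want to leave the NaNs at either end untouched.

To give a simple example, say house1 comes on the market on "day 4" for £200,000 and sells on "day 9" for £190,000. house2 stays at £180,000 for days 1-12 and does not sell within this time window. However, something went wrong on days 6 and 7 and I lost the data:

<pre><code>house1 = [NaN, NaN, NaN, 200000, 200000, NaN, NaN, 200000, 190000, NaN, NaN, NaN]
house2 = [180000, 180000, 180000, 180000, 180000, NaN, NaN, 180000, 180000, 180000, 180000, 180000]
</code></pre>

Now imagine that these are not regular arrays but columns in a pandas DataFrame indexed by date.

The problem is that the function I would normally use to fill the gaps is <a href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html" rel="nofollow">DataFrame.fillna()</a> with either the backfill or the ffill method. If I use ffill, house1 returns:

<pre><code>house1 = [NaN, NaN, NaN, 200000, 200000, 200000, 200000, 200000, 190000, 190000, 190000, 190000]
</code></pre>

This fills the gap, but it also incorrectly fills the data after the day of sale. If I use backfill, I get this:

<pre><code>house1 = [200000, 200000, 200000, 200000, 200000, 200000, 200000, 200000, 190000, NaN, NaN, NaN]
</code></pre>

Again, it fills the gap, but this time it also fills the front end of the data. If I use 'limit=2' with ffill, I get:

<pre><code>house1 = [NaN, NaN, NaN, 200000, 200000, 200000, 200000, 200000, 190000, 190000, 190000, NaN]
</code></pre>

Once again it fills the gap, but it then also starts filling data beyond the end of the "real" data.

My solution so far has been to write the following function:

<pre><code>import pandas as pd


def isANumber(value):
    # Helper not shown in the original post; assumed here to mean "not NaN"
    return pd.notnull(value)


def fillGaps(houseDF):
    """Fills up holes in the housing data"""

    def fillColumns(column):
        # Work on a copy so the caller's column is not modified in place
        filled_col = column.copy()
        lastValue = None
        # Keeps track of if we are dealing with a gap in numbers
        gap = False
        i = 0
        for currentValue in column:
            # Loops over all the nans before the numbers begin
            if not isANumber(currentValue) and lastValue is None:
                pass
            # Keeps track of the last number we encountered before a gap
            elif isANumber(currentValue) and (gap is False):
                lastIndex = i
                lastValue = currentValue
            # Notes when we encounter a gap in numbers
            elif not isANumber(currentValue):
                gap = True
            # Fills in the gap
            elif isANumber(currentValue):
                gapIndicies = range(lastIndex + 1, i)
                for j in gapIndicies:
                    filled_col.iloc[j] = lastValue
                gap = False
                # Remember this value too, in case another gap follows directly
                lastIndex = i
                lastValue = currentValue
            i += 1
        return filled_col

    filled_df = houseDF.apply(fillColumns, axis=0)
    return filled_df
</code></pre>

It simply skips all the NaNs at the front, fills the gaps (defined as groups of NaNs between real values), and does not fill the NaNs at the end.

Is there a cleaner way to do this, or a built-in pandas function I am not aware of?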
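One possible way to get this behaviour from built-in operations alone, offered as a minimal sketch rather than a definitive answer: forward-fill every column, then keep the filled values only at positions where a backward fill also finds a value. Trailing NaNs have no later value, so the mask drops them again, and leading NaNs are never touched by a forward fill. The function name `fill_interior_gaps` and the dates in the index are made up for illustration; the prices are the ones from the example above.

```python
import numpy as np
import pandas as pd

# Example data from the question; the dates are invented for illustration.
nan = np.nan
idx = pd.date_range("2015-01-01", periods=12, freq="D")
df = pd.DataFrame(
    {
        "house1": [nan, nan, nan, 200000, 200000, nan, nan,
                   200000, 190000, nan, nan, nan],
        "house2": [180000] * 5 + [nan, nan] + [180000] * 5,
    },
    index=idx,
)


def fill_interior_gaps(frame):
    """Forward-fill only the NaNs that sit between two real values."""
    forward = frame.ffill()
    # After a backward fill, only the trailing NaNs of each column stay NaN,
    # so this mask is True wherever a real value still follows.
    has_later_value = frame.bfill().notna()
    # Keep forward-filled values only where the mask is True.
    return forward.where(has_later_value)


filled = fill_interior_gaps(df)
print(filled)
```

With the example data, house1 keeps its three leading and three trailing NaNs while days 6 and 7 are filled with 200000, and house2 becomes 180000 throughout. The original `df` is left unchanged because `ffill`, `bfill` and `where` all return new objects.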
