Pandas: transform an upper-triangular DataFrame by shifting rows to the left

Posted 2024-09-26 17:52:12


I have a DataFrame that looks "upper triangular":

            31-May-11  30-Jun-11  31-Jul-11  31-Aug-11  30-Sep-11  31-Oct-11
OpenDate
2011-05-31  68.432797  81.696071  75.083249  66.659008  68.898034  72.622304
2011-06-30        NaN   1.711097   1.501082   1.625213   1.774645   1.661183
2011-07-31        NaN        NaN   0.422364   0.263561   0.203572   0.234376
2011-08-31        NaN        NaN        NaN   1.077009   1.226946   1.520701
2011-09-30        NaN        NaN        NaN        NaN   0.667091   0.495993

I want to transform it by shifting row i to the left by i-1 positions:

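            31-May-11  30-Jun-11  31-Jul-11  31-Aug-11  30-Sep-11  31-Oct-11
OpenDate
2011-05-31  68.432797  81.696071  75.083249  66.659008  68.898034  72.622304
2011-06-30   1.711097   1.501082   1.625213   1.774645   1.661183        NaN
2011-07-31   0.422364   0.263561   0.203572   0.234376        NaN        NaN
2011-08-31   1.077009   1.226946   1.520701        NaN        NaN        NaN
2011-09-30   0.667091   0.495993        NaN        NaN        NaN        NaN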

Edit:

I can't rule out NaNs in the upper part of the matrix, so we might see something like this:

            31-May-11  30-Jun-11  31-Jul-11  31-Aug-11  30-Sep-11  31-Oct-11
OpenDate
2011-05-31  68.432797  81.696071  75.083249  66.659008  68.898034  72.622304
2011-06-30        NaN        NaN   1.501082   1.625213   1.774645   1.661183
2011-07-31        NaN        NaN   0.422364   0.263561   0.203572   0.234376
2011-08-31        NaN        NaN        NaN   1.077009   1.226946   1.520701
2011-09-30        NaN        NaN        NaN        NaN   0.667091   0.495993

which should become

            31-May-11  30-Jun-11  31-Jul-11  31-Aug-11  30-Sep-11  31-Oct-11
OpenDate
2011-05-31  68.432797  81.696071  75.083249  66.659008  68.898034  72.622304
2011-06-30        NaN   1.501082   1.625213   1.774645   1.661183        NaN
2011-07-31   0.422364   0.263561   0.203572   0.234376        NaN        NaN
2011-08-31   1.077009   1.226946   1.520701        NaN        NaN        NaN
2011-09-30   0.667091   0.495993        NaN        NaN        NaN        NaN

Is there a way to do this?

Thanks, Anne


3 answers

You can count the NaN values in each row, drop them, and then append the same number of NaNs at the end. For example:

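import numpy as np
import pandas as pd

def shift_df(row):
    # Keep the non-NaN values in order, then pad with NaN back to full length.
    n = len(row)
    new_row = row.dropna().tolist()
    new_row += [np.nan] * (n - len(new_row))
    return pd.Series(new_row, index=row.index)

df.apply(shift_df, axis=1)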

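where df is your DataFrame. This only works if there are no NaN values among the "normal" data, because dropna() removes every NaN in the row, not just the leading ones.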
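To make that caveat concrete, here is a small sketch on a made-up row (the values and the labels "a" to "d" are hypothetical, not from the question). It compares the drop-and-pad helper with a plain positional shift when a NaN sits among the real data.

```python
import numpy as np
import pandas as pd

def shift_df(row):
    """Drop every NaN, then pad the same number of NaNs on the right."""
    values = row.dropna().tolist()
    values += [np.nan] * (len(row) - len(values))
    return pd.Series(values, index=row.index)

# Hypothetical row: one leading NaN (triangle padding) and one NaN that is a real gap.
row = pd.Series([np.nan, 1.0, np.nan, 3.0], index=list("abcd"))

compacted = shift_df(row)   # the gap is squeezed out, so 3.0 moves two places
shifted = row.shift(-1)     # a positional shift by one keeps the gap where it was
```

Here `compacted` puts 1.0 and 3.0 next to each other, while `shifted` keeps the NaN between them, which is the behaviour the edited question asks for.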

Here is one way to do it with numpy.

Input:

In [96]: df
Out[96]:
                 1       2       3       4       5       6
0
2011-05-31  68.433  81.696  75.083  66.659  68.898  72.622
2011-06-30     NaN   1.711   1.501   1.625   1.775   1.661
2011-07-31     NaN     NaN   0.422   0.264   0.204   0.234
2011-08-31     NaN     NaN     NaN   1.077   1.227   1.521
2011-09-30     NaN     NaN     NaN     NaN   0.667   0.496

Code:

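import numpy as np
from pandas import DataFrame

# Roll row i left by i positions; the leading NaNs wrap around to the end.
rolled = [np.roll(row, -i) for i, row in enumerate(df.values)]
result = DataFrame(np.vstack(rolled), index=df.index, columns=df.columns)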

Output:

                 1       2       3       4       5       6
0
2011-05-31  68.433  81.696  75.083  66.659  68.898  72.622
2011-06-30   1.711   1.501   1.625   1.775   1.661     NaN
2011-07-31   0.422   0.264   0.204   0.234     NaN     NaN
2011-08-31   1.077   1.227   1.521     NaN     NaN     NaN
2011-09-30   0.667   0.496     NaN     NaN     NaN     NaN

Let's time it:

In [95]: %%timeit
   ....: roller = lambda (i, x): np.roll(x, -i)
   ....: row_terator = enumerate(df.values)
   ....: rolled = map(roller, row_terator)
   ....: result = DataFrame(np.vstack(rolled), index=df.index, columns=df.columns)
   ....:
10000 loops, best of 3: 101 us per loop
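The loop above calls np.roll once per row. As an alternative that is not part of the original answer, the same rotation can be written as a single fancy-indexing step. The helper name `roll_rows_left` and the 3x3 sample frame are made up for illustration.

```python
import numpy as np
import pandas as pd

def roll_rows_left(frame):
    """Rotate row i of `frame` left by i positions (i counted from 0)."""
    values = frame.to_numpy()
    n_rows, n_cols = values.shape
    row_idx = np.arange(n_rows)[:, None]               # column vector 0..n_rows-1
    col_idx = (np.arange(n_cols) + row_idx) % n_cols   # source column for each cell
    return pd.DataFrame(values[row_idx, col_idx],
                        index=frame.index, columns=frame.columns)

# Hypothetical upper-triangular sample.
sample = pd.DataFrame([[1.0, 2.0, 3.0],
                       [np.nan, 4.0, 5.0],
                       [np.nan, np.nan, 6.0]], columns=list("abc"))
vectorized = roll_rows_left(sample)

# Reference result using the per-row np.roll loop.
looped = np.vstack([np.roll(r, -i) for i, r in enumerate(sample.to_numpy())])
```

Both versions rotate circularly, so they only give the intended result when the cells that wrap around to the right are the NaN padding.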

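Note that np.roll is the key piece here. It takes an array, an integer number of positions to shift by, and an axis argument, so you can shift an ndarray along any of its axes.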
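A short sketch of those three arguments on small made-up arrays (the array contents are illustrative only):

```python
import numpy as np

# 1-D: a negative shift moves elements left; what falls off the front wraps to the end.
row = np.array([np.nan, np.nan, 0.4, 0.3, 0.2])
left_by_two = np.roll(row, -2)

# 2-D: the axis argument selects the direction of the rotation.
grid = np.arange(6).reshape(2, 3)           # [[0, 1, 2], [3, 4, 5]]
along_columns = np.roll(grid, 1, axis=1)    # every row rotated right by one
along_rows = np.roll(grid, 1, axis=0)       # rows rotated down by one
```

Because the rotation is circular, nothing is lost: the two leading NaNs of `row` end up as its last two entries.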

df.apply(lambda x: x.shift(-x.notnull().argmax()), 1)

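The lambda finds the position of the first non-null value in each row and shifts the row by that amount. This has two problems: it does not exploit the known (upper-triangular) structure, so it may give up some speed, and it can be fooled by extra NaNs in the data.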
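The second problem can be seen on a tiny made-up frame (the values are hypothetical) in which the middle row has one NaN more than its triangle padding. The positional result is computed here with np.roll purely for comparison.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1.0, 2.0, 3.0],
     [np.nan, np.nan, 5.0],   # should move left by 1, but starts with 2 NaNs
     [np.nan, np.nan, 6.0]],  # should move left by 2
    columns=["a", "b", "c"],
)

# Shift each row by the position of its first non-null value.
by_first_valid = df.apply(lambda x: x.shift(-x.notnull().argmax()), axis=1)

# Shift row i by i positions regardless of where its NaNs are.
by_position = pd.DataFrame(
    [np.roll(row, -i) for i, row in enumerate(df.to_numpy())],
    index=df.index, columns=df.columns,
)
```

For the middle row, `by_first_valid` moves 5.0 into column "a", whereas `by_position` leaves it in column "b" with a NaN in front, matching the expected output in the edited question.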

Update

A more robust solution, using a counter from itertools.

^{pr2}$
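A minimal Python 3 sketch of the counter idea, under some assumptions: the frame is a made-up 3x3 example, and the counter is paired with the rows through zip rather than advanced inside apply. A counter advanced inside the applied function depends on how many times apply calls that function, and the `+ 1` in the timed call presumably compensates for an extra call on the first row in the pandas version used at the time.

```python
from itertools import count

import numpy as np
import pandas as pd

# Hypothetical upper-triangular sample.
df = pd.DataFrame([[1.0, 2.0, 3.0],
                   [np.nan, 4.0, 5.0],
                   [np.nan, np.nan, 6.0]], columns=list("abc"))

# count() yields 0, 1, 2, ...; pair each row with its own shift amount.
shifted_rows = [row.shift(-i) for i, (_, row) in zip(count(), df.iterrows())]

# Each Series keeps its row label as its name, so the index is preserved.
result = pd.DataFrame(shifted_rows)
```

Unlike np.roll, Series.shift does not wrap around: values pushed past the left edge are dropped and the right side is filled with NaN.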

As expected, this is a bit faster.

In [47]: %timeit df.apply(lambda x: x.shift(-c.next() + 1), 1)
1000 loops, best of 3: 766 us per loop

In [49]: %timeit df.apply(lambda x: x.shift(-x.notnull().argmax()), 1)
1000 loops, best of 3: 1.08 ms per loop
