Pandas: Converting an upper-triangular DataFrame by shifting rows to the left

            31-May-11  30-Jun-11  31-Jul-11  31-Aug-11  30-Sep-11  31-Oct-11
OpenDate
2011-05-31  68.432797  81.696071  75.083249  66.659008  68.898034  72.622304
2011-06-30        NaN   1.711097   1.501082   1.625213   1.774645   1.661183
2011-07-31        NaN        NaN   0.422364   0.263561   0.203572   0.234376
2011-08-31        NaN        NaN        NaN   1.077009   1.226946   1.520701
2011-09-30        NaN        NaN        NaN        NaN   0.667091   0.495993

            31-May-11  30-Jun-11  31-Jul-11  31-Aug-11  30-Sep-11  31-Oct-11
OpenDate
2011-05-31  68.432797  81.696071  75.083249  66.659008  68.898034  72.622304
2011-06-30        NaN        NaN   1.501082   1.625213   1.774645   1.661183
2011-07-31        NaN        NaN   0.422364   0.263561   0.203572   0.234376
2011-08-31        NaN        NaN        NaN   1.077009   1.226946   1.520701
2011-09-30        NaN        NaN        NaN        NaN   0.667091   0.495993

            31-May-11  30-Jun-11  31-Jul-11  31-Aug-11  30-Sep-11  31-Oct-11
OpenDate
2011-05-31  68.432797  81.696071  75.083249  66.659008  68.898034  72.622304
2011-06-30        NaN   1.501082   1.625213   1.774645   1.661183        NaN
2011-07-31   0.422364   0.263561   0.203572   0.234376        NaN        NaN
2011-08-31   1.077009   1.226946   1.520701        NaN        NaN        NaN
2011-09-30   0.667091   0.495993        NaN        NaN        NaN        NaN

3 answers

User

Answer 1 · edited 2024-09-26 17:52:12

You can count the NaN values in each row, drop them, and then append the same number of NaNs at the end. For example:

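import numpy as np
import pandas as pd

def shift_df(row):
    n = len(row)
    # Keep only the non-NaN values, preserving their order
    new_row = row.dropna().tolist()
    # Pad the end with NaNs so the row keeps its original length
    new_row += [np.nan] * (n - len(new_row))
    return pd.Series(new_row, index=row.index)

df.apply(shift_df, axis=1)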

Here df is your DataFrame. This only works when there are no NaN values in between the "regular" data.
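To make that caveat concrete, here is a small self-contained sketch. The toy frame and its column names are made up for illustration; the function is the same one as above. A NaN sitting between two valid values gets squeezed out instead of keeping its place:

```python
import numpy as np
import pandas as pd

def shift_df(row):
    # Same idea as above: drop NaNs, then pad the tail back to full length
    values = row.dropna().tolist()
    values += [np.nan] * (len(row) - len(values))
    return pd.Series(values, index=row.index)

# Hypothetical toy data: the second row has a NaN *between* two valid values
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [np.nan, 5.0, np.nan, 7.0]],
    columns=["a", "b", "c", "d"],
)

shifted = toy.apply(shift_df, axis=1)
print(shifted)
```

The second row comes out as 5.0, 7.0, NaN, NaN, whereas a plain left shift by one position would have given 5.0, NaN, 7.0, NaN.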

User

Answer 2 · edited 2024-09-26 17:52:12

Here is one way to do it using numpy.

Input:

In [96]: df
Out[96]:
                 1       2       3       4       5       6
0
2011-05-31  68.433  81.696  75.083  66.659  68.898  72.622
2011-06-30     NaN   1.711   1.501   1.625   1.775   1.661
2011-07-31     NaN     NaN   0.422   0.264   0.204   0.234
2011-08-31     NaN     NaN     NaN   1.077   1.227   1.521
2011-09-30     NaN     NaN     NaN     NaN   0.667   0.496

Code:

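import numpy as np
from pandas import DataFrame

# Roll row i to the left by i positions; the leading NaNs wrap around to the end
roller = lambda pair: np.roll(pair[1], -pair[0])
row_iterator = enumerate(df.values)
rolled = list(map(roller, row_iterator))
result = DataFrame(np.vstack(rolled), index=df.index, columns=df.columns)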

Output:

                 1       2       3       4       5       6
0
2011-05-31  68.433  81.696  75.083  66.659  68.898  72.622
2011-06-30   1.711   1.501   1.625   1.775   1.661     NaN
2011-07-31   0.422   0.264   0.204   0.234     NaN     NaN
2011-08-31   1.077   1.227   1.521     NaN     NaN     NaN
2011-09-30   0.667   0.496     NaN     NaN     NaN     NaN

Let's time it with timeit:

In [95]: %%timeit
   ....: roller = lambda (i, x): np.roll(x, -i)
   ....: row_terator = enumerate(df.values)
   ....: rolled = map(roller, row_terator)
   ....: result = DataFrame(np.vstack(rolled), index=df.index, columns=df.columns)
   ....:
10000 loops, best of 3: 101 us per loop

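Note that np.roll is the key piece here. It takes an array, an integer number of positions to shift by, and an axis argument, so you can shift an ndarray along any of its axes.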
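As a quick illustration of how np.roll behaves, here is a minimal sketch on made-up arrays (none of these values come from the question). It shows the wrap-around that makes the trick work on a triangular layout, and the effect of the axis argument:

```python
import numpy as np

a = np.array([np.nan, np.nan, 0.4, 0.3, 0.2])

# A negative shift moves elements to the left; whatever falls off the
# front wraps around to the end
rolled = np.roll(a, -2)
print(rolled)

m = np.arange(6).reshape(2, 3)

# With axis=1, every row is rolled by the same amount
rolled_rows = np.roll(m, -1, axis=1)
print(rolled_rows)

# Without an axis, the array is flattened, rolled, and reshaped back
rolled_flat = np.roll(m, -1)
print(rolled_flat)
```

Because the values that wrap around are exactly the leading NaNs of each row, rolling row i by -i leaves the NaNs at the end. If a row had real data in the positions being wrapped, that data would move to the end too.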

User

Answer 3 · edited 2024-09-26 17:52:12

df.apply(lambda x: x.shift(-x.notnull().argmax()), 1)

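The lambda function finds the position of the first non-null value in each row and shifts the row accordingly. This has two problems: it does not take advantage of the known structure (upper triangular), so it may sacrifice some speed, and it can be fooled by extra NaNs in the data.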
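The second problem can be seen on a small made-up frame. In this sketch (the data is hypothetical, and to_numpy() is used only so that argmax is positional regardless of pandas version), row 1 should move left by one position, but an extra NaN right after the structural one makes the first valid value appear one position later:

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame: by its position, row 1 should shift left by one,
# but it carries an extra NaN in its second column
toy = pd.DataFrame(
    [[1.0, 2.0, 3.0, 4.0],
     [np.nan, np.nan, 5.0, 6.0],
     [np.nan, np.nan, 7.0, 8.0]]
)

# Shift each row left by the position of its first non-null value
by_first_valid = toy.apply(
    lambda row: row.shift(-int(row.notnull().to_numpy().argmax())),
    axis=1,
)
print(by_first_valid)
```

Row 1 is shifted by two positions and ends up as 5.0, 6.0, NaN, NaN, while a shift based on the row's position would give NaN, 5.0, 6.0, NaN.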

Update

A more robust solution, using a counter from itertools.

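One way the counter idea might look in Python 3 is sketched below. This is an illustration under stated assumptions, not the answer's own code: the toy frame stands in for df, the shift is assumed to equal the row's position (0, 1, 2, ...), and the rows are walked with iterrows() so that the result does not depend on how many times apply calls its function.

```python
from itertools import count

import numpy as np
import pandas as pd

# Hypothetical upper-triangular toy frame standing in for df
df = pd.DataFrame(
    [[1.0, 2.0, 3.0],
     [np.nan, 4.0, 5.0],
     [np.nan, np.nan, 6.0]],
    index=["r0", "r1", "r2"],
    columns=["c0", "c1", "c2"],
)

# Take the next counter value (0, 1, 2, ...) for each row and shift the
# row left by that amount
counter = count()
shifted_rows = [row.shift(-next(counter)) for _, row in df.iterrows()]

# Each shifted row keeps its name, so the original index is restored
result = pd.DataFrame(shifted_rows)
print(result)
```

Because the shift depends only on the row's position, extra NaNs inside the data do not change how far a row moves.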

As expected, this is a bit faster.

In [47]: %timeit df.apply(lambda x: x.shift(-c.next() + 1), 1)
1000 loops, best of 3: 766 us per loop

In [49]: %timeit df.apply(lambda x: x.shift(-x.notnull().argmax()), 1)
1000 loops, best of 3: 1.08 ms per loop
