pandas: multiplying DataFrames with a MultiIndex and overlapping index levels

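3 answers

User

#1 · edited 2024-06-01 14:45:44

Note that I'm not claiming this is the right way to do this operation, only that it's one way. I've had trouble working out the right broadcasting pattern myself in the past. :-/

The short version is that I ended up doing the broadcasting manually, creating an appropriately aligned intermediate object: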

In [145]: R = B * A.loc[B.index.droplevel(2)].set_index(B.index)

In [146]: A.loc[("X", 2), "C"]
Out[146]: 0.5294149302910357

In [147]: A.loc[("X", 2), "C"] * B.loc[("X", 2, "c"), "C"]
Out[147]: 0.054262618238601339

In [148]: R.loc[("X", 2, "c"), "C"]
Out[148]: 0.054262618238601339

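This works by indexing into A with the matching part of B's index, and then setting the result's index to B's so the two line up. If I were cleverer I could figure out a native way to do this, but I haven't yet. :-(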
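The one-liner above can be unpacked into its three steps. This is only a sketch: the frames below are hypothetical stand-ins for A and B (random integers, with the first two index levels shared), not the data from the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical stand-ins for A (2 index levels) and B (3 index levels)
idx_a = pd.MultiIndex.from_product([["X", "Y"], [1, 2, 3]])
idx_b = pd.MultiIndex.from_product([["X", "Y"], [1, 2, 3], ["a", "b", "c"]])
A = pd.DataFrame(rng.integers(0, 9, size=(6, 4)), index=idx_a, columns=list("ABCD"))
B = pd.DataFrame(rng.integers(0, 9, size=(18, 4)), index=idx_b, columns=list("ABCD"))

# Step 1: drop B's last level, leaving one (level 0, level 1) key per row of B
keys = B.index.droplevel(2)

# Step 2: look those keys up in A; each row of A is repeated once per matching row of B
A_rows = A.loc[keys]

# Step 3: relabel with B's full index so the product aligns row for row
A_aligned = A_rows.set_index(B.index)
R = B * A_aligned

# Spot check one cell against the scalar product, as in the session above
print(R.loc[("X", 2, "c"), "C"] == A.loc[("X", 2), "C"] * B.loc[("X", 2, "c"), "C"])
```

The intermediate A_aligned has exactly B's shape and index, which is why the final multiplication needs no further alignment.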

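User

#2 · edited 2024-06-01 14:45:44

I would simply use DataFrame.reindex on the smaller DF so that its index matches the shape of the larger DF, forward-filling the values. Then do the multiplication.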

B.multiply(A.reindex(B.index, method='ffill'))             # Or method='pad'
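One way to picture why forward fill lines the rows up: in plain Python, a shorter tuple sorts just before any longer tuple that extends it, so ("X", 1) is the last small-frame key that is less than or equal to ("X", 1, "a"). The sketch below is pure Python with a hypothetical helper, pad_lookup. It illustrates the ordering idea only and is not a claim about how pandas implements reindex internally; whether reindex accepts a target with more levels than the source may also depend on the pandas version.

```python
from bisect import bisect_right

# Keys of the smaller frame (sorted) and a few keys of the larger frame
small_keys = [("X", 1), ("X", 2), ("X", 3), ("Y", 1), ("Y", 2), ("Y", 3)]
large_keys = [("X", 1, "a"), ("X", 1, "c"), ("X", 2, "b"), ("Y", 3, "c")]


def pad_lookup(sorted_keys, target):
    """Return the last key in sorted_keys that is <= target (forward fill)."""
    pos = bisect_right(sorted_keys, target)
    if pos == 0:
        raise KeyError(target)  # nothing earlier to fill from
    return sorted_keys[pos - 1]


for key in large_keys:
    print(key, "->", pad_lookup(small_keys, key))
```

Each three-level key maps back to the two-level key formed by its first two entries, which is the row of the smaller frame it should be multiplied by.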

Demo:

Prepare some data:

^{pr2}$

The smaller DF:

>>> A

     A  B  C  D
X 1  0  1  0  0
  2  0  1  0  0
  3  0  1  0  0
Y 1  0  0  1  0
  2  1  1  1  0
  3  1  0  1  1

The larger DF:

>>> B 

      A  B  C  D
X 1 a  3  3  3  3
    b  3  3  2  2
    c  3  3  3  2
  2 a  3  2  2  2
    b  2  2  3  3
    c  3  3  3  2
  3 a  3  3  2  3
    b  2  3  2  3
    c  3  2  2  2
Y 1 a  2  2  2  2
    b  2  3  3  2
    c  3  3  3  3
  2 a  2  3  2  3
    b  3  3  2  3
    c  2  3  2  3
  3 a  2  2  3  2
    b  3  3  3  3
    c  3  3  3  3

Multiply them after making sure both share a common index axis across all levels:

>>> B.multiply(A.reindex(B.index, method='ffill'))

       A  B  C  D
X 1 a  0  3  0  0
    b  0  3  0  0
    c  0  3  0  0
  2 a  0  2  0  0
    b  0  2  0  0
    c  0  3  0  0
  3 a  0  3  0  0
    b  0  3  0  0
    c  0  2  0  0
Y 1 a  0  0  2  0
    b  0  0  3  0
    c  0  0  3  0
  2 a  2  3  2  0
    b  3  3  2  0
    c  2  3  2  0
  3 a  2  0  3  2
    b  3  0  3  3
    c  3  0  3  3

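You could now even supply the level parameter in DataFrame.multiply so that broadcasting happens at those matching indices.

User

#3 · edited 2024-06-01 14:45:44

Proposed approach

We are talking about broadcasting, so I would like to bring in NumPy broadcasting here.

The solution code is listed below -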

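import pandas as pd

def numpy_broadcasting(df0, df1):
    # Sizes of df1's index levels; assumes a full, sorted product of the levels
    m, n, r = map(len, df1.index.levels)
    a0 = df0.values.reshape(m, n, -1)      # shape (m, n, cols)
    a1 = df1.values.reshape(m, n, r, -1)   # shape (m, n, r, cols)
    # The new axis in a0 lines up with df1's third index level
    out = (a1 * a0[..., None, :]).reshape(-1, a1.shape[-1])
    df_out = pd.DataFrame(out, index=df1.index, columns=df1.columns)
    return df_out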

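Basic idea:

1) Get views into the dataframes as multidimensional arrays. The dimensionality follows the level structure of the MultiIndex dataframes: the first dataframe has three levels (counting the columns) and the second has four. So we have a0 and a1 corresponding to the input dataframes df0 and df1, with a0 and a1 having 3 and 4 dimensions respectively.

2) Now comes the broadcasting part. We simply extend a0 to 4 dimensions by introducing a new axis at the third position. This new axis lines up with the third axis from df1, which lets us perform element-wise multiplication.

3) Finally, to get the output MultiIndex dataframe, we simply reshape the product.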
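The three steps can be traced on bare arrays to see how the shapes evolve. This is a minimal sketch with assumed sizes (2 x 3 x 3 index levels, 4 columns) and made-up values from np.arange, not the frames from the sample run.

```python
import numpy as np

m, n, r, cols = 2, 3, 3, 4  # level sizes and column count (assumed)

# Step 1: the two frames viewed as multidimensional arrays
a0 = np.arange(m * n * cols).reshape(m, n, cols)         # small frame: (m, n, cols)
a1 = np.arange(m * n * r * cols).reshape(m, n, r, cols)  # large frame: (m, n, r, cols)

# Step 2: insert a length-1 axis at the third position and multiply
a0_expanded = a0[..., None, :]   # (m, n, 1, cols)
product = a1 * a0_expanded       # length-1 axis broadcasts to r: (m, n, r, cols)

# Step 3: flatten the index axes back into rows
out = product.reshape(-1, cols)  # (m * n * r, cols)

print(a0_expanded.shape, product.shape, out.shape)
# (2, 3, 1, 4) (2, 3, 3, 4) (18, 4)
```

Because the new axis has length 1, NumPy reuses each (m, n) row of a0 for all r entries along the third axis without materialising the repeated copies.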

Sample run:

1) Input dataframes -

^{pr2}$

2) Output dataframe -

In [371]: df_out
Out[371]: 
        A   B   C   D
0 0 0  12  12   2   6
    1   9   6   8  15
    2  24   2  14  12
  1 0  42  16   5   0
    1  48  48   7   0
    2   0  32   7   0
  2 0   3  20   2  10
    1   6  15   8   5
    2   0   0   5  35
1 0 0  56   0   3   7
    1   0   0   3   4
    2  35   0  21   4
  1 0  28   0   0   6
    1  28   0  24  48
    2  21   0   0  36
  2 0  16   0  35   0
    1   0   0  10   0
    2  14   0  30   0

Benchmarking

In [31]: # Setup input dataframes of the same shape as stated in the question
    ...: individuals = list(range(2))
    ...: time = (0, 1, 2)
    ...: index = pd.MultiIndex.from_tuples(list(product(individuals, time)))
    ...: A = pd.DataFrame(data={'A': np.random.randint(0,9,6), \
    ...:                          'B': np.random.randint(0,9,6), \
    ...:                          'C': np.random.randint(0,9,6), \
    ...:                          'D': np.random.randint(0,9,6)
    ...:                          }, index=index)
    ...: 
    ...: 
    ...: individuals = list(range(2))
    ...: time = (0, 1, 2)
    ...: P = (0,1,2)
    ...: index = pd.MultiIndex.from_tuples(list(product(individuals, time, P)))
    ...: B = pd.DataFrame(data={'A': np.random.randint(0,9,18), \
    ...:                          'B': np.random.randint(0,9,18), \
    ...:                          'C': np.random.randint(0,9,18), \
    ...:                          'D': np.random.randint(0,9,18)}, index=index)
    ...: 

# @DSM's solution
In [32]: %timeit B * A.loc[B.index.droplevel(2)].set_index(B.index)
1 loops, best of 3: 8.75 ms per loop

# @Nickil Maveli's solution
In [33]: %timeit B.multiply(A.reindex(B.index, method='ffill'))
1000 loops, best of 3: 625 µs per loop

# @root's solution
In [34]: %timeit B * np.repeat(A.values, 3, axis=0)
1000 loops, best of 3: 487 µs per loop

In [35]: %timeit numpy_broadcasting(A, B)
1000 loops, best of 3: 191 µs per loop
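Timings are only comparable if the approaches return the same values, so a quick agreement check is worth running alongside them. The sketch below rebuilds inputs of the same shape with a seeded generator (an assumption, the benchmark above uses unseeded np.random.randint) and compares three of the timed expressions. Note that the two NumPy-based versions rely on B's index being a complete, sorted product of its levels.

```python
from itertools import product

import numpy as np
import pandas as pd


def numpy_broadcasting(df0, df1):
    # Same function as above; assumes df1's index is a full, sorted product of its levels
    m, n, r = map(len, df1.index.levels)
    a0 = df0.values.reshape(m, n, -1)
    a1 = df1.values.reshape(m, n, r, -1)
    out = (a1 * a0[..., None, :]).reshape(-1, a1.shape[-1])
    return pd.DataFrame(out, index=df1.index, columns=df1.columns)


rng = np.random.default_rng(0)
idx_a = pd.MultiIndex.from_tuples(list(product(range(2), range(3))))
idx_b = pd.MultiIndex.from_tuples(list(product(range(2), range(3), range(3))))
A = pd.DataFrame(rng.integers(0, 9, size=(6, 4)), index=idx_a, columns=list("ABCD"))
B = pd.DataFrame(rng.integers(0, 9, size=(18, 4)), index=idx_b, columns=list("ABCD"))

results = {
    "loc + set_index": B * A.loc[B.index.droplevel(2)].set_index(B.index),
    "np.repeat": B * np.repeat(A.values, 3, axis=0),
    "numpy_broadcasting": numpy_broadcasting(A, B),
}

# Compare values and row labels of every result against the np.repeat version
reference = results["np.repeat"]
for name, res in results.items():
    same = np.array_equal(res.to_numpy(), reference.to_numpy()) and res.index.equals(reference.index)
    print(name, same)
```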
