Multiplying the values of each row from two different DataFrames

3 Answers

User

Answer 1 · edited 2024-09-27 07:29:00

I would use NumPy broadcasting to do it all in one shot:

train_ = pd.DataFrame(
    (train.values * pop.values[:, None]).reshape(-1, train.shape[1]),
    pd.MultiIndex.from_product([pop.index, train.index]),
    train.columns
)

train_

      feature0   feature1   feature2   feature3   feature4   feature5
0 0  18.279579  -3.921346   0.000000  -0.000000  -0.000000 -18.265003
  1  17.899545 -15.503942  -0.000000  -0.000000  -0.000000   4.398419
  4  16.432750 -22.490190  -0.000000  -0.000000  -0.000000  -2.433374
  5  15.905368  -4.812785   0.000000   0.000000   0.000000  -1.074326
  6  16.991823 -15.946251   0.000000   0.000000   0.000000  -1.482333
1 0   0.000000  -3.921346   0.000000  -7.250185  -0.000000  -0.000000
  1   0.000000 -15.503942  -0.000000  -0.053619  -0.000000   0.000000
  4   0.000000 -22.490190  -0.000000 -15.247781  -0.000000  -0.000000
  5   0.000000  -4.812785   0.000000   3.742221   0.000000  -0.000000
  6   0.000000 -15.946251   0.000000   8.057511   0.000000  -0.000000
2 0   0.000000  -0.000000   0.000000  -0.000000  -0.000000 -18.265003
  1   0.000000  -0.000000  -0.000000  -0.000000  -0.000000   4.398419
  4   0.000000  -0.000000  -0.000000  -0.000000  -0.000000  -2.433374
  5   0.000000  -0.000000   0.000000   0.000000   0.000000  -1.074326
  6   0.000000  -0.000000   0.000000   0.000000   0.000000  -1.482333
3 0   0.000000  -0.000000  13.611829  -0.000000 -11.773605 -18.265003
  1   0.000000  -0.000000  -0.741729  -0.000000  -6.734652   4.398419
  4   0.000000  -0.000000  -4.611659  -0.000000 -13.941488  -2.433374
  5   0.000000  -0.000000  18.291712   0.000000   3.631887  -1.074326
  6   0.000000  -0.000000   8.299577   0.000000   8.057510  -1.482333
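To see why the broadcast works, it may help to look at the intermediate shapes on a tiny example. The sketch below uses made-up stand-ins for train and pop (3 train rows, 2 pop rows, 2 features); the data values are assumptions for illustration only, not the thread's data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the thread's data: 3 train rows, 2 pop rows, 2 features.
train = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                     index=[0, 1, 4], columns=["feature0", "feature1"])
pop = pd.DataFrame([[1.0, 0.0], [0.0, 1.0]])

# pop.values[:, None] has shape (2, 1, 2); train.values has shape (3, 2).
# Broadcasting gives shape (2, 3, 2): one scaled copy of train per row of pop.
cube = train.values * pop.values[:, None]

# Collapse the first two axes so each (pop row, train row) pair becomes one row.
flat = cube.reshape(-1, train.shape[1])

train_ = pd.DataFrame(
    flat,
    index=pd.MultiIndex.from_product([pop.index, train.index]),
    columns=train.columns,
)
print(cube.shape, flat.shape)
print(train_)
```

The row order produced by reshape (all train rows for pop row 0, then all train rows for pop row 1, and so on) is what makes MultiIndex.from_product([pop.index, train.index]) line up with the values.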

You can access only the rows that correspond to row i of pop with train_.loc[i].
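As a rough illustration of that lookup, the sketch below rebuilds a small train_ from made-up data (the values are assumptions, not the thread's data) and checks that train_.loc[1] matches train multiplied by row 1 of pop.

```python
import numpy as np
import pandas as pd

# Hypothetical small inputs, for illustration only.
train = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                     index=[0, 1], columns=["feature0", "feature1"])
pop = pd.DataFrame([[2.0, 0.0], [0.0, 3.0]])

train_ = pd.DataFrame(
    (train.values * pop.values[:, None]).reshape(-1, train.shape[1]),
    pd.MultiIndex.from_product([pop.index, train.index]),
    train.columns,
)

# Selecting on the outer level returns the block for pop row 1;
# the outer level is dropped, leaving train's own index.
block = train_.loc[1]

# The same block computed directly, for comparison.
expected = train * pop.values[1]

print(block)
print(np.allclose(block.values, expected.values))
```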

Rough timing test
I didn't bother with more robust testing.

%%timeit
pd.DataFrame(
    (train.values * pop.values[:, None]).reshape(-1, train.shape[1]),
    pd.MultiIndex.from_product([pop.index, train.index]),
    train.columns
)

%%timeit
res = pop.iloc[np.repeat(np.arange(len(pop)), len(train))]
res = res.set_index(np.tile(train.index, len(pop)), append=True).add_prefix('feature')
res.mul(train, level=1)

%%timeit
pd.concat([train * pop.values[i] for i in range(pop.shape[0])],
               keys=pop.index.tolist())

571 µs ± 10.8 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.42 ms ± 18 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.7 ms ± 69.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
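A timing comparison only means something if the candidates return the same frame. The sketch below checks that for the broadcasting and pd.concat versions, using randomly generated stand-ins for train and pop (the shapes mirror the printed output, but the values are assumptions).

```python
import numpy as np
import pandas as pd

# Hypothetical data with the same shapes as in the thread: 5 train rows, 4 pop rows, 6 features.
rng = np.random.default_rng(0)
train = pd.DataFrame(rng.normal(size=(5, 6)), index=[0, 1, 4, 5, 6],
                     columns=[f"feature{i}" for i in range(6)])
pop = pd.DataFrame(rng.integers(0, 2, size=(4, 6)))

# Version 1: one broadcasted multiplication.
broadcasted = pd.DataFrame(
    (train.values * pop.values[:, None]).reshape(-1, train.shape[1]),
    pd.MultiIndex.from_product([pop.index, train.index]),
    train.columns,
)

# Version 2: one product per pop row, stacked with pd.concat.
concatenated = pd.concat(
    [train * pop.values[i] for i in range(pop.shape[0])],
    keys=pop.index.tolist(),
)

same_values = np.allclose(broadcasted.values, concatenated.values)
same_index = list(broadcasted.index) == list(concatenated.index)
print(same_values, same_index)
```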

User

Answer 2 · edited 2024-09-27 07:29:00

If you need a loop (slow when the data is large):

# Multiply train by each row of population in turn
for i, x in population.iterrows():
    print(train * x.values)

    feature0   feature1  feature2  feature3  feature4   feature5
0  18.279579  -3.921346       0.0      -0.0      -0.0 -18.265003
1  17.899545 -15.503942      -0.0      -0.0      -0.0   4.398419
4  16.432750 -22.490190      -0.0      -0.0      -0.0  -2.433374
5  15.905368  -4.812785       0.0       0.0       0.0  -1.074326
6  16.991823 -15.946251       0.0       0.0       0.0  -1.482333
   feature0   feature1  feature2   feature3  feature4  feature5
0       0.0  -3.921346       0.0  -7.250185      -0.0      -0.0
1       0.0 -15.503942      -0.0  -0.053619      -0.0       0.0
4       0.0 -22.490190      -0.0 -15.247781      -0.0      -0.0
5       0.0  -4.812785       0.0   3.742221       0.0      -0.0
6       0.0 -15.946251       0.0   8.057511       0.0      -0.0
   feature0  feature1  feature2  feature3  feature4   feature5
0       0.0      -0.0       0.0      -0.0      -0.0 -18.265003
1       0.0      -0.0      -0.0      -0.0      -0.0   4.398419
4       0.0      -0.0      -0.0      -0.0      -0.0  -2.433374
5       0.0      -0.0       0.0       0.0       0.0  -1.074326
6       0.0      -0.0       0.0       0.0       0.0  -1.482333
   feature0  feature1   feature2  feature3   feature4   feature5
0       0.0      -0.0  13.611829      -0.0 -11.773605 -18.265003
1       0.0      -0.0  -0.741729      -0.0  -6.734652   4.398419
4       0.0      -0.0  -4.611659      -0.0 -13.941488  -2.433374
5       0.0      -0.0  18.291712       0.0   3.631887  -1.074326
6       0.0      -0.0   8.299577       0.0   8.057510  -1.482333

Or handle each row separately, multiplying train by one row of population at a time.
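One possible way to keep the per-row results around instead of printing them is a dict keyed by the population index. This is only a sketch with made-up data; the name per_row is a hypothetical helper variable.

```python
import numpy as np
import pandas as pd

# Hypothetical small inputs, for illustration only.
train = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]],
                     index=[0, 1], columns=["feature0", "feature1"])
population = pd.DataFrame([[1.0, 0.0], [0.0, 1.0]])

# A single row: multiply train by the first row of population, by position.
first = train * population.values[0]

# Every row, stored separately: one DataFrame per population row.
per_row = {i: train * row.values for i, row in population.iterrows()}

print(first)
print(per_row[1])
```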

Or, for a MultiIndex DataFrame:

d = pd.concat([train * population.values[i] for i in range(population.shape[0])],
               keys=population.index.tolist())
print (d)

      feature0   feature1   feature2   feature3   feature4   feature5
0 0  18.279579  -3.921346   0.000000  -0.000000  -0.000000 -18.265003
  1  17.899545 -15.503942  -0.000000  -0.000000  -0.000000   4.398419
  4  16.432750 -22.490190  -0.000000  -0.000000  -0.000000  -2.433374
  5  15.905368  -4.812785   0.000000   0.000000   0.000000  -1.074326
  6  16.991823 -15.946251   0.000000   0.000000   0.000000  -1.482333
1 0   0.000000  -3.921346   0.000000  -7.250185  -0.000000  -0.000000
  1   0.000000 -15.503942  -0.000000  -0.053619  -0.000000   0.000000
  4   0.000000 -22.490190  -0.000000 -15.247781  -0.000000  -0.000000
  5   0.000000  -4.812785   0.000000   3.742221   0.000000  -0.000000
  6   0.000000 -15.946251   0.000000   8.057511   0.000000  -0.000000
2 0   0.000000  -0.000000   0.000000  -0.000000  -0.000000 -18.265003
  1   0.000000  -0.000000  -0.000000  -0.000000  -0.000000   4.398419
  4   0.000000  -0.000000  -0.000000  -0.000000  -0.000000  -2.433374
  5   0.000000  -0.000000   0.000000   0.000000   0.000000  -1.074326
  6   0.000000  -0.000000   0.000000   0.000000   0.000000  -1.482333
3 0   0.000000  -0.000000  13.611829  -0.000000 -11.773605 -18.265003
  1   0.000000  -0.000000  -0.741729  -0.000000  -6.734652   4.398419
  4   0.000000  -0.000000  -4.611659  -0.000000 -13.941488  -2.433374
  5   0.000000  -0.000000  18.291712   0.000000   3.631887  -1.074326
  6   0.000000  -0.000000   8.299577   0.000000   8.057510  -1.482333

Then select with xs:

print (d.xs(0))

    feature0   feature1  feature2  feature3  feature4   feature5
0  18.279579  -3.921346       0.0      -0.0      -0.0 -18.265003
1  17.899545 -15.503942      -0.0      -0.0      -0.0   4.398419
4  16.432750 -22.490190      -0.0      -0.0      -0.0  -2.433374
5  15.905368  -4.812785       0.0       0.0       0.0  -1.074326
6  16.991823 -15.946251       0.0       0.0       0.0  -1.482333
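xs can also cut across the inner level. The sketch below, on made-up data, selects by the outer level (one population row) and then by the inner level (one train row under every population row) using the level argument of DataFrame.xs.

```python
import pandas as pd

# Hypothetical small inputs, for illustration only.
train = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                     index=[0, 1, 4], columns=["feature0", "feature1"])
population = pd.DataFrame([[1.0, 0.0], [0.0, 1.0]])

d = pd.concat([train * population.values[i] for i in range(population.shape[0])],
              keys=population.index.tolist())

# Outer level: all train rows scaled by population row 0.
by_pop = d.xs(0)

# Inner level: train row 4 as scaled by every population row.
by_train = d.xs(4, level=1)

print(by_pop)
print(by_train)
```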

User

Answer 3 · edited 2024-09-27 07:29:00

Once you set the columns of population to match those of train, you can simply use *:

In [11]: population.columns = train.columns

In [12]: train * population.iloc[0]
Out[12]:
    feature0   feature1  feature2  feature3  feature4   feature5
0  18.279579  -3.921346       0.0      -0.0      -0.0 -18.265003
1  17.899545 -15.503942      -0.0      -0.0      -0.0   4.398419
4  16.432750 -22.490190      -0.0      -0.0      -0.0  -2.433374
5  15.905368  -4.812785       0.0       0.0       0.0  -1.074326
6  16.991823 -15.946251       0.0       0.0       0.0  -1.482333
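The renaming matters because DataFrame * Series aligns on column labels rather than on position. A small sketch of what happens with and without matching labels, using made-up data (the gene0/gene1 column names are assumptions for illustration):

```python
import pandas as pd

# Hypothetical inputs whose column labels do not match.
train = pd.DataFrame([[1.0, 2.0], [3.0, 4.0]], columns=["feature0", "feature1"])
population = pd.DataFrame([[1.0, 0.0], [0.0, 1.0]], columns=["gene0", "gene1"])

# Labels differ, so alignment produces the union of columns filled with NaN.
misaligned = train * population.iloc[0]

# Multiplying by the raw values skips alignment and works by position.
by_position = train * population.iloc[0].to_numpy()

# After renaming, label alignment gives the intended result.
population.columns = train.columns
aligned = train * population.iloc[0]

print(misaligned)
print(aligned)
```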

You can create the MultiIndex version (as @jezrael suggests) very efficiently using np.tile and np.repeat.
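A minimal sketch of the np.repeat / np.tile idea follows. It is not necessarily the code this answer originally used; the data is made up, and the names outer, inner and result are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical small inputs, for illustration only.
train = pd.DataFrame([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
                     index=[0, 1, 4], columns=["feature0", "feature1"])
population = pd.DataFrame([[1.0, 0.0], [0.0, 1.0]])

n_pop, n_train = len(population), len(train)

# Outer level: each population label repeated once per train row -> 0,0,0,1,1,1
outer = np.repeat(population.index.to_numpy(), n_train)
# Inner level: the train labels tiled once per population row -> 0,1,4,0,1,4
inner = np.tile(train.index.to_numpy(), n_pop)

# Repeat population rows and tile train rows the same way, then multiply elementwise.
values = (np.repeat(population.to_numpy(), n_train, axis=0)
          * np.tile(train.to_numpy(), (n_pop, 1)))

result = pd.DataFrame(values,
                      index=pd.MultiIndex.from_arrays([outer, inner]),
                      columns=train.columns)
print(result)
```

np.repeat and np.tile must be applied in the same roles to both the labels and the values; swapping them in only one place would pair rows with the wrong index entries.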
