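<p>Since you said "pandas (or numpy)", let me suggest <a href="http://xarray.pydata.org/en/stable/" rel="nofollow noreferrer"><code>xarray</code></a>, which is PyData's answer to N-D labeled arrays. xarray is designed specifically for problems involving labeled arrays with several corresponding dimensions, i.e. cases where you want all the convenience of pandas indexing together with the performance and broadcasting behavior of numpy arrays.</p>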
<p>For this problem, you could dump your dataframes into xarray <a href="http://xarray.pydata.org/en/stable/user-guide/data-structures.html#dataarray" rel="nofollow noreferrer">DataArrays</a>:</p>
<pre class="lang-py prettyprint-override"><code>In [1]: from pandas import DataFrame, isna
   ...: from numpy import nan
   ...:
   ...:
   ...: df = DataFrame([
   ...:     {'id': '1', 'x': 2, 'y': 3, 'z': 4},
   ...:     {'id': '5', 'x': 6, 'y': 7, 'z': 8},
   ...:     {'id': '9', 'x': 10, 'y': 11, 'z': 12}
   ...: ]).set_index('id')
   ...:
   ...: factors = DataFrame([
   ...:     {'id': '5', 'x': nan, 'z': 3},
   ...:     {'id': '9', 'x': 0.2, 'z': nan},
   ...: ]).set_index('id')

In [2]: da = df.stack().to_xarray()

In [3]: da
Out[3]:
<xarray.DataArray (id: 3, level_1: 3)>
array([[ 2,  3,  4],
       [ 6,  7,  8],
       [10, 11, 12]])
Coordinates:
  * id       (id) object '1' '5' '9'
  * level_1  (level_1) object 'x' 'y' 'z'

In [4]: factors_da = factors.stack().to_xarray()

In [5]: factors_da
Out[5]:
<xarray.DataArray (id: 2, level_1: 2)>
array([[nan, 3. ],
       [0.2, nan]])
Coordinates:
  * id       (id) object '5' '9'
  * level_1  (level_1) object 'x' 'z'
</code></pre>
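<p>You can then reindex <code>factors_da</code> so that it carries the same labels as <code>da</code>; positions that have no factor become <code>NaN</code>:</p>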
<pre class="lang-py prettyprint-override"><code>In [6]: factors_da = factors_da.reindex_like(da)
   ...: factors_da
Out[6]:
<xarray.DataArray (id: 3, level_1: 3)>
array([[nan, nan, nan],
       [nan, nan, 3. ],
       [0.2, nan, nan]])
Coordinates:
  * id       (id) object '1' '5' '9'
  * level_1  (level_1) object 'x' 'y' 'z'
</code></pre>
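<p>The explicit reindex matters because, as far as I know, xarray arithmetic aligns its operands with an inner join by default, so multiplying the raw arrays would silently drop every label that has no factor. A minimal sketch of that difference, rebuilding the two arrays by hand (constructing them with <code>xr.DataArray</code> directly is my shortcut for a self-contained example, not part of the workflow above):</p>

```python
import numpy as np
import xarray as xr

# Hand-built copies of `da` and `factors_da` (same values as in the session above).
da = xr.DataArray(
    [[2, 3, 4], [6, 7, 8], [10, 11, 12]],
    dims=("id", "level_1"),
    coords={"id": ["1", "5", "9"], "level_1": ["x", "y", "z"]},
)
factors_da = xr.DataArray(
    [[np.nan, 3.0], [0.2, np.nan]],
    dims=("id", "level_1"),
    coords={"id": ["5", "9"], "level_1": ["x", "z"]},
)

# Without reindexing, only the labels present in BOTH arrays survive.
unaligned = da * factors_da
print(unaligned.shape, list(unaligned["id"].values))

# After reindexing, the factors cover every label of `da`, padded with NaN.
aligned = factors_da.reindex_like(da)
print(aligned.shape, int(aligned.isnull().sum()))
```

<p>Here the unaligned product keeps only ids <code>'5'</code> and <code>'9'</code> and columns <code>'x'</code> and <code>'z'</code>, which is why the factors are expanded to the full grid first.</p>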
<p>Then multiply the two, first filling the missing factors with 1 so that values without a factor are left unchanged:</p>
<pre class="lang-py prettyprint-override"><code>In [7]: da * factors_da.fillna(1)
Out[7]:
<xarray.DataArray (id: 3, level_1: 3)>
array([[ 2.,  3.,  4.],
       [ 6.,  7., 24.],
       [ 2., 11., 12.]])
Coordinates:
  * id       (id) object '1' '5' '9'
  * level_1  (level_1) object 'x' 'y' 'z'
</code></pre>
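<p>The fill value is 1 because <code>NaN</code> propagates through multiplication, while 1 is the multiplicative identity. The sketch below shows the propagation and, as an alternative I am adding for comparison, filling the product with the original values afterwards; for this data both give the same result. The arrays are again rebuilt by hand so the block runs on its own:</p>

```python
import numpy as np
import xarray as xr

coords = {"id": ["1", "5", "9"], "level_1": ["x", "y", "z"]}
dims = ("id", "level_1")

da = xr.DataArray([[2, 3, 4], [6, 7, 8], [10, 11, 12]], dims=dims, coords=coords)
# Factors already reindexed to the full grid: NaN means "no factor given".
factors_da = xr.DataArray(
    [[np.nan, np.nan, np.nan], [np.nan, np.nan, 3.0], [0.2, np.nan, np.nan]],
    dims=dims,
    coords=coords,
)

# Multiplying directly turns every cell without a factor into NaN.
propagated = da * factors_da

# Option used above: replace missing factors with 1 before multiplying.
filled_first = da * factors_da.fillna(1)

# Alternative: multiply first, then fall back to the original values.
filled_after = propagated.fillna(da)

print(int(propagated.isnull().sum()))
print(bool(np.allclose(filled_first.values, filled_after.values)))
```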
<p>Then you can dump the result back into pandas:</p>
<pre class="lang-py prettyprint-override"><code>In [9]: (da * factors_da.fillna(1)).to_series().unstack('level_1')
Out[9]:
level_1    x     y     z
id
1        2.0   3.0   4.0
5        6.0   7.0  24.0
9        2.0  11.0  12.0
</code></pre>
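<p>If you need this more than once, the steps above can be wrapped in a small helper. This is only a sketch: the function name <code>scale_by_factors</code> and the axis name <code>"column"</code> are my own choices, and naming the columns axis is just a way to avoid the auto-generated <code>level_1</code> dimension name. It assumes xarray is installed, since <code>to_xarray()</code> depends on it.</p>

```python
import numpy as np
import pandas as pd


def scale_by_factors(df: pd.DataFrame, factors: pd.DataFrame) -> pd.DataFrame:
    """Multiply df by factors cell by cell; cells without a factor are unchanged."""
    # Name the columns axis so the stacked level is called "column"
    # rather than the auto-generated "level_1".
    da = df.rename_axis(columns="column").stack().to_xarray()
    factors_da = factors.rename_axis(columns="column").stack().to_xarray()

    # Expand the factors to the full grid of `da`, treating gaps as "multiply by 1".
    factors_da = factors_da.reindex_like(da).fillna(1)

    # Multiply and convert back to a DataFrame with one column per original column.
    return (da * factors_da).to_series().unstack("column")


df = pd.DataFrame([
    {"id": "1", "x": 2, "y": 3, "z": 4},
    {"id": "5", "x": 6, "y": 7, "z": 8},
    {"id": "9", "x": 10, "y": 11, "z": 12},
]).set_index("id")

factors = pd.DataFrame([
    {"id": "5", "x": np.nan, "z": 3},
    {"id": "9", "x": 0.2, "z": np.nan},
]).set_index("id")

result = scale_by_factors(df, factors)
print(result)
```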