<p>Here is a reproducible example with enough structure to represent the problem (using the <a href="https://pypi.org/project/awkward/" rel="nofollow noreferrer">awkward</a> library):</p>
<pre class="lang-py prettyprint-override"><code>>>> import awkward as ak
>>>
>>> p_z = ak.Array([
... [ 0.338738, 0.636035, -0.307365, -0.167779, 0.243284, 0.338738, 0.636035],
... [-0.459227, 0.055993, -0.469857, 0.192554, 0.155738, -0.459227],
... ])
>>> p_z
<Array [[0.339, 0.636, ... 0.156, -0.459]] type='2 * var * float64'>
>>>
>>> tofpid = ak.Array([[0, 2, 4, 5], [1, 2, 4]])
>>> tofpid
<Array [[0, 2, 4, 5], [1, 2, 4]] type='2 * var * int64'>
</code></pre>
<p>As Pandas DataFrames, these are:</p>
<pre class="lang-py prettyprint-override"><code>>>> df_p_z = ak.to_pandas(p_z)
>>> df_p_z
                  values
entry subentry
0     0         0.338738
      1         0.636035
      2        -0.307365
      3        -0.167779
      4         0.243284
      5         0.338738
      6         0.636035
1     0        -0.459227
      1         0.055993
      2        -0.469857
      3         0.192554
      4         0.155738
      5        -0.459227
>>> df_tofpid = ak.to_pandas(tofpid)
>>> df_tofpid
                values
entry subentry
0     0              0
      1              2
      2              4
      3              5
1     0              1
      1              2
      2              4
</code></pre>
<p>As an Awkward Array, what you want to do is <a href="https://awkward-array.readthedocs.io/en/latest/_auto/ak.Array.html#ak-array-getitem" rel="nofollow noreferrer">slice the first array by the second</a>. That is, you want <code>p_z[tofpid]</code>:</p>
<pre class="lang-py prettyprint-override"><code>>>> p_z[tofpid]
<Array [[0.339, -0.307, ... -0.47, 0.156]] type='2 * var * float64'>
>>> p_z[tofpid].tolist()
[[0.338738, -0.307365, 0.243284, 0.338738], [0.055993, -0.469857, 0.155738]]
</code></pre>
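<p>To make the semantics of that slice concrete, here is a minimal plain-Python sketch of the same selection. It assumes the data are ordinary nested lists rather than Awkward Arrays, and it only illustrates what the slice selects, not how Awkward Array implements it: within each entry, the integers in <code>tofpid</code> are used as positions into the matching list of <code>p_z</code>.</p>

```python
# Plain nested lists standing in for the Awkward Arrays above.
p_z = [
    [0.338738, 0.636035, -0.307365, -0.167779, 0.243284, 0.338738, 0.636035],
    [-0.459227, 0.055993, -0.469857, 0.192554, 0.155738, -0.459227],
]
tofpid = [[0, 2, 4, 5], [1, 2, 4]]

# For each entry, pick the elements of p_z at the positions listed in tofpid.
selected = [
    [values[i] for i in indices]
    for values, indices in zip(p_z, tofpid)
]
print(selected)
```

<p>This prints the same nested list as <code>p_z[tofpid].tolist()</code>, but it loops in Python, so it is only useful as a mental model.</p>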
<p>Using Pandas, I managed to do it with this:</p>
<pre class="lang-py prettyprint-override"><code>>>> df_p_z.loc[df_tofpid.reset_index(level=0).apply(lambda x: tuple(x.values), axis=1).tolist()]
                  values
entry subentry
0     0         0.338738
      2        -0.307365
      4         0.243284
      5         0.338738
1     1         0.055993
      2        -0.469857
      4         0.155738
</code></pre>
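<p>What's happening here is that <code>df_tofpid.reset_index(level=0)</code> turns the <code>"entry"</code> part of the MultiIndex into a column, then <code>apply</code> runs a Python function on each row (because <code>axis=1</code>), where each row's contents are available as <code>x.values</code>, and <code>tolist()</code> turns the result into a list of tuples, such as</p>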
<pre class="lang-py prettyprint-override"><code>>>> df_tofpid.reset_index(level=0).apply(lambda x: tuple(x.values), axis=1).tolist()
[(0, 0), (0, 2), (0, 4), (0, 5), (1, 1), (1, 2), (1, 4)]
</code></pre>
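<p>which is what <code>loc</code> needs in order to select entry/subentry pairs from its MultiIndex.</p>
<p>My Pandas solution has two drawbacks: it's complicated, and it goes through Python iteration and Python objects, which doesn't scale the way arrays do. <em>A Pandas expert could very likely find a better solution than mine.</em> There is a lot I don't know about Pandas.</p>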
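<p>As one possible way around the per-row <code>apply</code>, the sketch below builds the entry/subentry pairs as a <code>pd.MultiIndex</code> directly from arrays and passes that to <code>loc</code>. It is an untested-at-scale suggestion rather than a known best practice. The helper <code>to_frame</code> is hypothetical: it only stands in for <code>ak.to_pandas</code> so that the example runs without Awkward Array installed.</p>

```python
import pandas as pd

p_z = [
    [0.338738, 0.636035, -0.307365, -0.167779, 0.243284, 0.338738, 0.636035],
    [-0.459227, 0.055993, -0.469857, 0.192554, 0.155738, -0.459227],
]
tofpid = [[0, 2, 4, 5], [1, 2, 4]]


def to_frame(nested):
    """Hypothetical stand-in for ak.to_pandas: an (entry, subentry)-indexed frame."""
    pairs = [(i, j) for i, row in enumerate(nested) for j in range(len(row))]
    index = pd.MultiIndex.from_tuples(pairs, names=["entry", "subentry"])
    return pd.DataFrame({"values": [x for row in nested for x in row]}, index=index)


df_p_z = to_frame(p_z)
df_tofpid = to_frame(tofpid)

# Pair each row's "entry" label with its tofpid value (the wanted subentry),
# using whole arrays instead of a Python function applied row by row.
wanted = pd.MultiIndex.from_arrays(
    [
        df_tofpid.index.get_level_values("entry").to_numpy(),
        df_tofpid["values"].to_numpy(),
    ],
    names=["entry", "subentry"],
)

result = df_p_z.loc[wanted]
print(result)
```

<p>This should select the same seven rows as the <code>apply</code>-based version. Whether it is noticeably faster on large data would need to be measured.</p>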