Using a multidimensional index on a MultiIndex DataFrame? (question and answers)



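I have a MultiIndex DataFrame that looks like this (called p_z):

<pre><code>                     p_z
entry subentry          
0     0         0.338738
      1         0.636035
      2        -0.307365
      3        -0.167779
      4         0.243284
...                  ...
26692 891      -0.459227
      892       0.055993
      893      -0.469857
      894       0.192554
      895       0.155738

[11742280 rows x 1 columns]
</code></pre>

I would like to select certain rows based on another multidimensional DataFrame (or NumPy array). It is also a pandas DataFrame (called tofpid):

<pre><code>                tofpid
entry subentry        
0     0              0
      1              2
      2              4
      3              5
      4              7
...                ...
26692 193          649
      194          670
      195          690
      196          725
      197          737

[2006548 rows x 1 columns]
</code></pre>

I also have it as an awkward array of shape (26692,), where each entry has a variable number of subentries. This is a selection DataFrame/array that tells the p_z DataFrame which rows to keep. So in entry 0 of p_z, it should keep subentries 0, 2, 4, 5, 7, and so on.

I can't find a way to do this in pandas. I'm new to pandas and even newer to MultiIndex, but I feel there should be a way to do it. It would be even better if it broadcasts well, since I will be doing this over 1500 DataFrames of similar size. In case it helps, these DataFrames come from *.root files imported with uproot. If there is a way to do this without pandas, I'll accept it, but I'd prefer to use pandas to keep things organized.

Edit: here is a reproducible example (courtesy of Jim Pivarski's answer; thanks!):

<pre><code>import awkward as ak
import pandas as pd

p_z = ak.Array([[ 0.338738, 0.636035, -0.307365, -0.167779, 0.243284, 0.338738, 0.636035], [-0.459227, 0.055993, -0.469857, 0.192554, 0.155738, -0.459227]])
p_z = ak.to_pandas(p_z)
tofpid = ak.Array([[0, 2, 4, 5], [1, 2, 4]])
tofpid = ak.to_pandas(tofpid)
</code></pre>

Both of these DataFrames are produced natively by uproot. This example uses the awkward library to reproduce the same DataFrames that uproot would give.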
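For readers who have neither awkward nor uproot installed, the following pandas-only sketch builds small frames with the same ("entry", "subentry") MultiIndex layout as the reproducible example above. The helper `jagged_to_frame` is hypothetical and is not part of pandas, awkward or uproot. The column name `"values"` is an assumption, chosen to match what `ak.to_pandas` prints in the answer.

```python
import pandas as pd


def jagged_to_frame(rows, column="values"):
    """Hypothetical helper: flatten a list of variable-length lists into a
    one-column DataFrame indexed by ("entry", "subentry").

    An entry whose list is empty contributes no rows.
    """
    index = pd.MultiIndex.from_tuples(
        [(entry, sub) for entry, row in enumerate(rows) for sub in range(len(row))],
        names=["entry", "subentry"],
    )
    flat = [value for row in rows for value in row]
    return pd.DataFrame({column: flat}, index=index)


# Same numbers as the awkward-based example.
p_z_rows = [
    [0.338738, 0.636035, -0.307365, -0.167779, 0.243284, 0.338738, 0.636035],
    [-0.459227, 0.055993, -0.469857, 0.192554, 0.155738, -0.459227],
]
tofpid_rows = [[0, 2, 4, 5], [1, 2, 4]]

df_p_z = jagged_to_frame(p_z_rows)
df_tofpid = jagged_to_frame(tofpid_rows)
```

Printing `df_p_z` and `df_tofpid` should show the same two-level layout as the `ak.to_pandas` output.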


1 Answer

Anonymous, 1 day ago


Here is a reproducible example with enough structure to represent the problem (using the <a href="https://pypi.org/project/awkward/" rel="nofollow noreferrer">awkward</a> library):

<pre class="lang-py prettyprint-override"><code>>>> import awkward as ak
>>>
>>> p_z = ak.Array([
...     [ 0.338738, 0.636035, -0.307365, -0.167779, 0.243284, 0.338738, 0.636035],
...     [-0.459227, 0.055993, -0.469857, 0.192554, 0.155738, -0.459227],
... ])
>>> p_z
<Array [[0.339, 0.636, ... 0.156, -0.459]] type='2 * var * float64'>
>>>
>>> tofpid = ak.Array([[0, 2, 4, 5], [1, 2, 4]])
>>> tofpid
<Array [[0, 2, 4, 5], [1, 2, 4]] type='2 * var * int64'>
</code></pre>

In pandas form, this is:

<pre class="lang-py prettyprint-override"><code>>>> df_p_z = ak.to_pandas(p_z)
>>> df_p_z
                  values
entry subentry          
0     0         0.338738
      1         0.636035
      2        -0.307365
      3        -0.167779
      4         0.243284
      5         0.338738
      6         0.636035
1     0        -0.459227
      1         0.055993
      2        -0.469857
      3         0.192554
      4         0.155738
      5        -0.459227
>>> df_tofpid = ak.to_pandas(tofpid)
>>> df_tofpid
                values
entry subentry        
0     0              0
      1              2
      2              4
      3              5
1     0              1
      1              2
      2              4
</code></pre>

As awkward arrays, what you want to do is <a href="https://awkward-array.readthedocs.io/en/latest/_auto/ak.Array.html#ak-array-getitem" rel="nofollow noreferrer">slice the first array by the second</a>. That is, you want <code>p_z[tofpid]</code>:

<pre class="lang-py prettyprint-override"><code>>>> p_z[tofpid]
<Array [[0.339, -0.307, ... -0.47, 0.156]] type='2 * var * float64'>
>>> p_z[tofpid].tolist()
[[0.338738, -0.307365, 0.243284, 0.338738], [0.055993, -0.469857, 0.155738]]
</code></pre>

With pandas, I managed to do it like this:

<pre class="lang-py prettyprint-override"><code>>>> df_p_z.loc[df_tofpid.reset_index(level=0).apply(lambda x: tuple(x.values), axis=1).tolist()]
                  values
entry subentry          
0     0         0.338738
      2        -0.307365
      4         0.243284
      5         0.338738
1     1         0.055993
      2        -0.469857
      4         0.155738
</code></pre>

The expression works in three steps:

1. <code>df_tofpid.reset_index(level=0)</code> turns the <code>"entry"</code> part of the MultiIndex into a column.
2. With <code>axis=1</code>, <code>apply</code> runs a Python function on each row <code>x</code>, and that function turns the row's <code>x.values</code> (its entry and selected subentry) into a tuple.
3. <code>tolist()</code> turns the result into a list of tuples like this:

<pre class="lang-py prettyprint-override"><code>>>> df_tofpid.reset_index(level=0).apply(lambda x: tuple(x.values), axis=1).tolist()
[(0, 0), (0, 2), (0, 4), (0, 5), (1, 1), (1, 2), (1, 4)]
</code></pre>

That list is what <code>loc</code> needs to select entry/subentry pairs from its MultiIndex.

My pandas solution has two drawbacks:

- It is complicated.
- It goes through Python-level iteration and Python objects, so it won't scale the way array operations do.

There's a good chance a pandas expert can find a better solution than mine; there's a lot I don't know about pandas.
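The apply-based version above builds its (entry, subentry) keys one row at a time in Python. A possible vectorized variant builds the same keys column-wise with `pd.MultiIndex.from_arrays` and then selects rows either with `.loc` or with a boolean `isin` mask. This is a sketch, not the answerer's method.

Assumptions and names to note:

- The helper `jagged_to_frame` and the names `wanted`, `selected`, `masked` and `expected` are hypothetical.
- The tiny frames stand in for the real uproot output.
- The column name `"values"` follows the `ak.to_pandas` output shown above.
- Whether this is fast enough for ~1500 frames of ~11 million rows is an assumption you would need to check.

```python
import pandas as pd


def jagged_to_frame(rows, column="values"):
    # Hypothetical helper: one row per (entry, subentry) pair.
    index = pd.MultiIndex.from_tuples(
        [(entry, sub) for entry, row in enumerate(rows) for sub in range(len(row))],
        names=["entry", "subentry"],
    )
    return pd.DataFrame({column: [v for row in rows for v in row]}, index=index)


p_z_rows = [
    [0.338738, 0.636035, -0.307365, -0.167779, 0.243284, 0.338738, 0.636035],
    [-0.459227, 0.055993, -0.469857, 0.192554, 0.155738, -0.459227],
]
tofpid_rows = [[0, 2, 4, 5], [1, 2, 4]]

df_p_z = jagged_to_frame(p_z_rows)
df_tofpid = jagged_to_frame(tofpid_rows)

# Build the (entry, subentry) keys column-wise instead of with a per-row apply.
# The "entry" level of df_tofpid's index becomes the first level, and the
# subentry numbers stored in df_tofpid["values"] become the second level.
wanted = pd.MultiIndex.from_arrays(
    [df_tofpid.index.get_level_values("entry"), df_tofpid["values"].to_numpy()],
    names=["entry", "subentry"],
)

# Option 1: label-based lookup.
# Rows come back in the order of `wanted`, and repeated keys repeat rows.
# A pair absent from df_p_z should raise KeyError in recent pandas versions.
selected = df_p_z.loc[wanted]

# Option 2: boolean mask.
# Rows keep df_p_z's order, each row appears at most once, and pairs absent
# from df_p_z are silently ignored.
masked = df_p_z[df_p_z.index.isin(wanted)]

# Reference semantics on plain nested lists: per-entry fancy indexing, which is
# what p_z[tofpid] computes for the awkward arrays.
expected = [row[i] for row, idx in zip(p_z_rows, tofpid_rows) for i in idx]
```

For sorted, duplicate-free selections like this one, both options return the same rows as `p_z[tofpid]`. Option 1 mirrors the `.loc` call in the answer. Option 2 is more forgiving if `tofpid` can point at subentries that are missing from `p_z`.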
