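Using 5-player combinations to find the subset of a DataFrame that contains specific 5-player combinations, where each column identifies one of the 5 players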

Posted 2024-10-02 04:32:41

Apologies if the title is misleading; I am not sure how best to describe what I am trying to do.

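I am working with NBA play-by-play data for the whole league this season, trying to find the relative defensive rating of specific defensive lineups. In the DataFrame (df) there is one column for each offensive player, one for each defensive player, one for possessions, and one for points (there are many more, but these are the only ones I care about), so 12 columns in total.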

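If I filter on one specific defensive lineup, I get a smaller DataFrame (df2) that contains only the rows where that defensive unit was on the floor. I have gotten this far, but what I want to do now is take every offensive-player combination this lineup has faced and use those combinations to filter df.

Here is a much smaller example of what df2 might look like: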

   offplayer1  offplayer2  offplayer3  offplayer4  offplayer5  defplayer1  defplayer2  defplayer3  defplayer4  defplayer5  possessions  points  
0           1           2           3           4           5          11          12          13          14          15            5       5 
1           1           2           3           4           6          11          12          13          14          15            4       4  
2           2           3           4           5           6          11          12          13          14          15            3       5  

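From here, I want to take all combinations of offplayer1-5 that appear in df2 and use them as a filter on df.

Is there a good way to do this?

Edit: here is code that generates the df2 above, plus a sample df in case you want to demonstrate.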

import numpy as np
import pandas as pd

df = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],[1,2,3,4,6,11,12,13,14,15,4,4],[2,3,4,5,6,11,12,13,14,15,3,5],[1,2,3,4,5,11,12,13,14,16,5,5],[1,2,3,4,5,21,22,23,24,25,10,10],[11,12,13,14,15,21,22,23,24,25,5,5]]),columns=['offplayer1','offplayer2','offplayer3','offplayer4','offplayer5','defplayer1','defplayer2','defplayer3','defplayer4','defplayer5','possessions','points'])

df2 = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],[1,2,3,4,6,11,12,13,14,15,4,4],[2,3,4,5,6,11,12,13,14,15,3,5]]),columns=['offplayer1','offplayer2','offplayer3','offplayer4','offplayer5','defplayer1','defplayer2','defplayer3','defplayer4','defplayer5','possessions','points'])

1 answer
User
#1 · Posted 2024-10-02 04:32:41

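If I understand correctly, you should be able to build a new index for each DataFrame from the offplayer columns with set_index, and then use boolean indexing together with .isin. I modified your sample slightly so you can see the effect.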

import numpy as np
import pandas as pd

# sample data, slightly modified from the question
df = pd.DataFrame(np.array([[1,2,3,4,5,11,12,13,14,15,5,5],
                            [1,2,3,4,6,11,12,13,14,15,4,4],
                            [1,2,3,4,5,11,12,13,14,16,3,5],
                            [2,3,4,5,6,11,12,13,14,15,5,5], 
                            [1,2,3,4,5,11,12,13,14,17,5,5],
                            [1,2,3,4,7,11,12,13,14,17,5,5]]),
                  columns=['offplayer1','offplayer2','offplayer3','offplayer4','offplayer5',
                           'defplayer1','defplayer2','defplayer3','defplayer4','defplayer5',
                           'possessions','points'])

# defensive players you are looking for
defplayers = [11,12,13,14,15]

# create df2 through boolean indexing: every defplayer column must hold one of them
df2 = df[df[df.columns[5:10]].isin(defplayers).all(axis=1)]

# columns that identify the offensive lineup
df_idx = df.columns[:5].tolist()
df2_idx = df2.columns[:5].tolist()

# boolean indexing: keep the rows of df whose offensive lineup also appears in df2
result = df[df.set_index(df_idx).index.isin(df2.set_index(df2_idx).index)]
print(result)
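
One caveat with the set_index/.isin approach above: it compares the offplayer columns position by position, so the same five players listed in a different order would not match. The following is a minimal sketch of an order-insensitive variant, assuming player IDs are integers. The helper name `lineup_index` and the extra seventh row (the lineup 1-5 written in reverse) are illustrative additions, not part of the original question or answer.

```python
import numpy as np
import pandas as pd

OFF_COLS = [f"offplayer{i}" for i in range(1, 6)]
DEF_COLS = [f"defplayer{i}" for i in range(1, 6)]

# Sample data from the answer, plus one hypothetical row (the last one)
# that lists offensive players 1-5 in reverse order.
df = pd.DataFrame(
    np.array([[1, 2, 3, 4, 5, 11, 12, 13, 14, 15, 5, 5],
              [1, 2, 3, 4, 6, 11, 12, 13, 14, 15, 4, 4],
              [1, 2, 3, 4, 5, 11, 12, 13, 14, 16, 3, 5],
              [2, 3, 4, 5, 6, 11, 12, 13, 14, 15, 5, 5],
              [1, 2, 3, 4, 5, 11, 12, 13, 14, 17, 5, 5],
              [1, 2, 3, 4, 7, 11, 12, 13, 14, 17, 5, 5],
              [5, 4, 3, 2, 1, 21, 22, 23, 24, 25, 2, 2]]),
    columns=OFF_COLS + DEF_COLS + ["possessions", "points"],
)


def lineup_index(frame, cols):
    """Hypothetical helper: one MultiIndex entry per row, player IDs sorted."""
    sorted_ids = np.sort(frame[cols].to_numpy(), axis=1)  # sort within each row
    return pd.MultiIndex.from_arrays(list(sorted_ids.T))


# Rows where the chosen defensive unit is on the floor.
defplayers = [11, 12, 13, 14, 15]
df2 = df[df[DEF_COLS].isin(defplayers).all(axis=1)]

# Keep every row of df whose offensive lineup was faced by that unit,
# regardless of the order in which the five players are listed.
mask = lineup_index(df, OFF_COLS).isin(lineup_index(df2, OFF_COLS))
result = df[mask]
print(result)
```

If the play-by-play data always lists players in a consistent order, the sorting step is unnecessary and the plain set_index/.isin filter should give the same rows.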
