Cartesian product of all items in a given group


I start with a dataframe like this:

       id        tof
0    43.0  1999991.0
1    43.0  2095230.0
2    43.0  4123105.0
3    43.0  5560423.0
4    46.0  2098996.0
5    46.0  2114971.0
6    46.0  4130033.0
7    46.0  4355096.0
8    82.0  2055207.0
9    82.0  2093996.0
10   82.0  4193587.0
11   90.0  2059360.0
12   90.0  2083762.0
13   90.0  2648235.0
14   90.0  4212177.0
15  103.0  1993306.0
          .
          .
          .

Ultimately, my goal is to create a very long 2D array containing the combinations of all items that share the same id (shown here for the rows of one id):

[(1993306.0, 2105441.0), (1993306.0, 3972679.0), (1993306.0, 3992558.0), (1993306.0, 4009044.0), (2105441.0, 3972679.0), (2105441.0, 3992558.0), (2105441.0, 4009044.0), (3972679.0, 3992558.0), (3972679.0, 4009044.0), (3992558.0, 4009044.0),...]

Except with all of the tuples changed to arrays, so that after iterating over all the id numbers I can transpose the array.

Naturally itertools came to mind, and my first thought was to do something with df.groupby('id') so that itertools is applied internally to each group sharing the same id, but I suspect that would take forever on the million-row data file I have.
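A minimal sketch of that groupby-plus-itertools idea (my own illustration of the approach described above, not a tested solution):

from itertools import combinations

# for each id, collect all 2-element combinations of its tof values
pairs_per_id = df.groupby('id')['tof'].apply(lambda s: list(combinations(s, 2)))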

Is there a vectorized way to do this?


3 Answers

IIUC (if I understand correctly):

from itertools import combinations

pd.DataFrame([
    [k, c0, c1] for k, tof in df.groupby('id').tof
           for c0, c1 in combinations(tof, 2)
], columns=['id', 'tof0', 'tof1'])

      id       tof0       tof1
0   43.0  1999991.0  2095230.0
1   43.0  1999991.0  4123105.0
2   43.0  1999991.0  5560423.0
3   43.0  2095230.0  4123105.0
4   43.0  2095230.0  5560423.0
5   43.0  4123105.0  5560423.0
6   46.0  2098996.0  2114971.0
7   46.0  2098996.0  4130033.0
8   46.0  2098996.0  4355096.0
9   46.0  2114971.0  4130033.0
10  46.0  2114971.0  4355096.0
11  46.0  4130033.0  4355096.0
12  82.0  2055207.0  2093996.0
13  82.0  2055207.0  4193587.0
14  82.0  2093996.0  4193587.0
15  90.0  2059360.0  2083762.0
16  90.0  2059360.0  2648235.0
17  90.0  2059360.0  4212177.0
18  90.0  2083762.0  2648235.0
19  90.0  2083762.0  4212177.0
20  90.0  2648235.0  4212177.0

Explanation

This is a list comprehension that produces a list of lists, which is then wrapped in the DataFrame constructor. Look up comprehensions to understand this pattern better.

from itertools import combinations

pd.DataFrame([
    #            name   series of tof values
    #               ↓   ↓    
    [k, c0, c1] for k, tof in df.groupby('id').tof
    #    items from combinations
    #      first    second
    #          ↓    ↓
           for c0, c1 in combinations(tof, 2)
], columns=['id', 'tof0', 'tof1'])
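If you then need the plain 2D array from the question rather than a DataFrame, a small follow-up sketch (assuming the frame built above is stored in a variable named pairs, which is my own naming):

import numpy as np

arr = pairs[['tof0', 'tof1']].to_numpy()   # shape (n_pairs, 2)
arr_t = arr.T                              # shape (2, n_pairs), the transposed layout the question asks for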

Groupby does work:

def get_product(x):
    # full Cartesian product of the group's tof values with itself
    # (keeps self-pairs and both orderings, unlike combinations)
    return pd.MultiIndex.from_product((x.tof, x.tof)).values

for i, g in df.groupby('id'):
    print(i, get_product(g))

Output:

43.0 [(1999991.0, 1999991.0) (1999991.0, 2095230.0) (1999991.0, 4123105.0)
 (1999991.0, 5560423.0) (2095230.0, 1999991.0) (2095230.0, 2095230.0)
 (2095230.0, 4123105.0) (2095230.0, 5560423.0) (4123105.0, 1999991.0)
 (4123105.0, 2095230.0) (4123105.0, 4123105.0) (4123105.0, 5560423.0)
 (5560423.0, 1999991.0) (5560423.0, 2095230.0) (5560423.0, 4123105.0)
 (5560423.0, 5560423.0)]
46.0 [(2098996.0, 2098996.0) (2098996.0, 2114971.0) (2098996.0, 4130033.0)
 (2098996.0, 4355096.0) (2114971.0, 2098996.0) (2114971.0, 2114971.0)
 (2114971.0, 4130033.0) (2114971.0, 4355096.0) (4130033.0, 2098996.0)
 (4130033.0, 2114971.0) (4130033.0, 4130033.0) (4130033.0, 4355096.0)
 (4355096.0, 2098996.0) (4355096.0, 2114971.0) (4355096.0, 4130033.0)
 (4355096.0, 4355096.0)]
82.0 [(2055207.0, 2055207.0) (2055207.0, 2093996.0) (2055207.0, 4193587.0)
 (2093996.0, 2055207.0) (2093996.0, 2093996.0) (2093996.0, 4193587.0)
 (4193587.0, 2055207.0) (4193587.0, 2093996.0) (4193587.0, 4193587.0)]
90.0 [(2059360.0, 2059360.0) (2059360.0, 2083762.0) (2059360.0, 2648235.0)
 (2059360.0, 4212177.0) (2083762.0, 2059360.0) (2083762.0, 2083762.0)
 (2083762.0, 2648235.0) (2083762.0, 4212177.0) (2648235.0, 2059360.0)
 (2648235.0, 2083762.0) (2648235.0, 2648235.0) (2648235.0, 4212177.0)
 (4212177.0, 2059360.0) (4212177.0, 2083762.0) (4212177.0, 2648235.0)
 (4212177.0, 4212177.0)]
103.0 [(1993306.0, 1993306.0)]
If you want every ordered pair for a single id (self-pairs and both orderings included), you can use

from itertools import product
x = df[df.id == 13].tof.values.astype(float)
all_combinations = list(product(x,x))

If you don't want repeated elements, you can use

from itertools import combinations
x = df[df.id == 13].tof.values.astype(float)
all_combinations = list(combinations(x,2))
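To build the single long array over every id that the question describes, a possible follow-up sketch (my own, stacking the per-id combinations and transposing at the end):

import numpy as np
from itertools import combinations

all_pairs = np.array([
    pair
    for _, g in df.groupby('id')                              # loop over every id group
    for pair in combinations(g.tof.values.astype(float), 2)  # 2-element combinations within the group
])                          # shape (n_pairs, 2)
all_pairs_t = all_pairs.T   # shape (2, n_pairs)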
