Add a column after groupby to uniquely identify each group

+--------+-------+--------+--------+--------+------------------+------+-------+------+--------------------+-------+---------+
| Id     | s_lat | s_lng  | d_lat  | d_lng  | TT               | T    | Esti2 | Est1 | time_diff          | diff  | Cluster |
+--------+-------+--------+--------+--------+------------------+------+-------+------+--------------------+-------+---------+
| 67607  | 63.42 | 10.387 | 63.425 | 10.441 | 10.2             | 4.33 | 11    | 4.4  | -0.800000000000001 | -0.07 | 0       |
| 70720  | 63.42 | 10.387 | 63.425 | 10.441 | 9.03333333333333 | 4.36 | 11    | 4.4  | -1.96666666666667  | -0.04 | 0       |
| 68394  | 63.42 | 10.387 | 63.436 | 10.399 | 15.1833333333333 | 2.66 | 10    | 2.7  | 5.18333333333333   | -0.04 | 0       |
| 67340  | 63.42 | 10.387 | 63.436 | 10.399 | 8.91666666666667 | 2.44 | 10    | 2.7  | -1.08333333333333  | -0.26 | 0       |
| 72363  | 63.42 | 10.387 | 63.436 | 10.399 | 9.91666666666667 | 2.47 | 10    | 2.7  | -0.083333333333334 | -0.23 | 0       |
| 70401  | 63.42 | 10.387 | 63.436 | 10.399 | 7.85             | 2.67 | 10    | 2.7  | -2.15              | -0.03 | 0       |
| 70695  | 63.42 | 10.387 | 63.436 | 10.399 | 11.6166666666667 | 3.24 | 10    | 2.7  | 1.61666666666667   | 0.54  | 0       |
| 69698  | 63.42 | 10.387 | 63.436 | 10.399 | 8.91666666666667 | 2.47 | 10    | 2.7  | -1.08333333333333  | -0.23 | 0       |
| 70793  | 63.42 | 10.387 | 63.436 | 10.399 | 11.85            | 2.52 | 10    | 2.7  | 1.85               | -0.18 | 0       |
| 67150  | 63.42 | 10.387 | 63.411 | 10.402 | 4.01666666666667 | 1.68 | 6     | 1.7  | -1.98333333333333  | -0.02 | 0       |
| 69934  | 63.42 | 10.387 | 63.411 | 10.402 | 4.56666666666667 | 1.69 | 6     | 1.7  | -1.43333333333333  | -0.01 | 0       |
+--------+-------+--------+--------+--------+------------------+------+-------+------+--------------------+-------+---------+

+--------+-------+--------+--------+--------+------------------+------+-------+------+--------------------+-------+---------+------------------+
| TourId | s_lat | s_lng  | d_lat  | d_lng  | TT               | T    | Esti2 | Est1 | time_diff          | diff  | Cluster | Similarity_index |
+--------+-------+--------+--------+--------+------------------+------+-------+------+--------------------+-------+---------+------------------+
| 67607  | 63.42 | 10.387 | 63.425 | 10.441 | 10.2             | 4.33 | 11    | 4.4  | -0.800000000000001 | -0.07 | 0       | A                |
| 70720  | 63.42 | 10.387 | 63.425 | 10.441 | 9.03333333333333 | 4.36 | 11    | 4.4  | -1.96666666666667  | -0.04 | 0       | A                |
| 68394  | 63.42 | 10.387 | 63.436 | 10.399 | 15.1833333333333 | 2.66 | 10    | 2.7  | 5.18333333333333   | -0.04 | 0       | B                |
| 67340  | 63.42 | 10.387 | 63.436 | 10.399 | 8.91666666666667 | 2.44 | 10    | 2.7  | -1.08333333333333  | -0.26 | 0       | B                |
| 72363  | 63.42 | 10.387 | 63.436 | 10.399 | 9.91666666666667 | 2.47 | 10    | 2.7  | -0.083333333333334 | -0.23 | 0       | B                |
| 70401  | 63.42 | 10.387 | 63.436 | 10.399 | 7.85             | 2.67 | 10    | 2.7  | -2.15              | -0.03 | 0       | B                |
| 70695  | 63.42 | 10.387 | 63.436 | 10.399 | 11.6166666666667 | 3.24 | 10    | 2.7  | 1.61666666666667   | 0.54  | 0       | B                |
| 69698  | 63.42 | 10.387 | 63.436 | 10.399 | 8.91666666666667 | 2.47 | 10    | 2.7  | -1.08333333333333  | -0.23 | 0       | B                |
| 70793  | 63.42 | 10.387 | 63.436 | 10.399 | 11.85            | 2.52 | 10    | 2.7  | 1.85               | -0.18 | 0       | B                |
| 67150  | 63.42 | 10.387 | 63.411 | 10.402 | 4.01666666666667 | 1.68 | 6     | 1.7  | -1.98333333333333  | -0.02 | 0       | C                |
| 69934  | 63.42 | 10.387 | 63.411 | 10.402 | 4.56666666666667 | 1.69 | 6     | 1.7  | -1.43333333333333  | -0.01 | 0       | C                |
+--------+-------+--------+--------+--------+------------------+------+-------+------+--------------------+-------+---------+------------------+

1 answer

User

#1 · Posted on 2024-10-03 15:27:23

This is what you need:

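import string

# Map group numbers 0..25 to 'A'..'Z' (only 26 unique labels are available)
d = dict(enumerate(string.ascii_uppercase))

# Number the groups in order of first appearance, then translate each number into a letter
df['S_I'] = df.groupby(['s_lat', 's_lng', 'd_lat', 'd_lng'], sort=False).ngroup().map(d)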
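
To see what `ngroup()` contributes before the letters are mapped on, here is a minimal self-contained sketch. The toy DataFrame, its values, and the names `keys`, `group_no` and `labels` are made up for illustration and are not from the question; only the four key columns are included.

```python
import string

import pandas as pd

# Toy data (hypothetical values): three distinct destinations, interleaved.
df = pd.DataFrame({
    "s_lat": [63.42] * 5,
    "s_lng": [10.387] * 5,
    "d_lat": [63.425, 63.436, 63.425, 63.411, 63.436],
    "d_lng": [10.441, 10.399, 10.441, 10.402, 10.399],
})

keys = ["s_lat", "s_lng", "d_lat", "d_lng"]

# sort=False keeps groups in order of first appearance, so the first
# distinct key combination gets number 0, the next new one gets 1, etc.
group_no = df.groupby(keys, sort=False).ngroup()

# Translate the integer group numbers into letters.
labels = dict(enumerate(string.ascii_uppercase))
df["S_I"] = group_no.map(labels)

print(df)
```

Rows that share the same four coordinates get the same label even when they are not adjacent, which is the behaviour the question asks for.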

If there can be more than 26 unique groups, you can build the mapping d with the following code instead:

import string
from itertools import product

# Change 'repeat' as needed:
#   repeat=2 gives 676 unique combinations: 'AA', 'AB', 'AC', ...
#   repeat=3 gives 17576 unique combinations: 'AAA', 'AAB', 'AAC', ...
combs = [''.join(i) for i in product(string.ascii_uppercase, repeat=2)]
d = dict(enumerate(combs))
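
With a fixed-size dictionary, any group number beyond the last key maps to NaN, so `repeat` has to be chosen large enough in advance. If the number of groups is not known ahead of time, one alternative (not part of the original answer) is to compute each label on demand, spreadsheet-column style. The helper name `group_label` below is hypothetical; this is only a sketch of that idea.

```python
import string


def group_label(n: int) -> str:
    """Hypothetical helper: 0 -> 'A', 25 -> 'Z', 26 -> 'AA', 27 -> 'AB', ..."""
    letters = string.ascii_uppercase
    label = ""
    n += 1  # switch to 1-based so the bijective base-26 arithmetic works
    while n > 0:
        n, rem = divmod(n - 1, 26)
        label = letters[rem] + label
    return label


print([group_label(i) for i in (0, 25, 26, 27, 701, 702)])
```

Because `Series.map` also accepts a function, this could be plugged in as `.ngroup().map(group_label)` in place of `.map(d)`. Note that the labels differ from the `product` version: they start at 'A' and only grow to two or three letters when needed.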

Output (using the single-letter mapping):

    Id     s_lat  s_lng   d_lat   d_lng   TT         T     Esti2  Est1  time_diff  diff   Cluster  S_I
0   67607  63.42  10.387  63.425  10.441  10.200000  4.33  11     4.4   -0.800000  -0.07  0        A
1   70720  63.42  10.387  63.425  10.441  9.033333   4.36  11     4.4   -1.966667  -0.04  0        A
2   68394  63.42  10.387  63.436  10.399  15.183333  2.66  10     2.7   5.183333   -0.04  0        B
3   67340  63.42  10.387  63.436  10.399  8.916667   2.44  10     2.7   -1.083333  -0.26  0        B
4   72363  63.42  10.387  63.436  10.399  9.916667   2.47  10     2.7   -0.083333  -0.23  0        B
5   70401  63.42  10.387  63.436  10.399  7.850000   2.67  10     2.7   -2.150000  -0.03  0        B
6   70695  63.42  10.387  63.436  10.399  11.616667  3.24  10     2.7   1.616667   0.54   0        B
7   69698  63.42  10.387  63.436  10.399  8.916667   2.47  10     2.7   -1.083333  -0.23  0        B
8   70793  63.42  10.387  63.436  10.399  11.850000  2.52  10     2.7   1.850000   -0.18  0        B
9   67150  63.42  10.387  63.411  10.402  4.016667   1.68  6      1.7   -1.983333  -0.02  0        C
10  69934  63.42  10.387  63.411  10.402  4.566667   1.69  6      1.7   -1.433333  -0.01  0        C
