Create a new column in a DataFrame by checking whether the same rows appear together in different groups, and assign relative values

import numpy as np
import pandas as pd

df = {'group': ['a','a','a','a',
                'b','b','b','b','b','b','b','b','b',
                'c','c','c','c','c',
                'd','d','d','d','d','d','d',
                'e','e','e','e'],
      'date': [np.datetime64('2020-01-01'), np.datetime64('2020-01-01'),
               np.datetime64('2020-01-01'), np.datetime64('2020-01-01'),
               np.datetime64('2019-03-12'), np.datetime64('2019-03-12'),
               np.datetime64('2019-03-12'), np.datetime64('2019-03-12'),
               np.datetime64('2019-03-12'), np.datetime64('2019-03-12'),
               np.datetime64('2019-03-12'), np.datetime64('2019-03-12'),
               np.datetime64('2019-03-12'),
               np.datetime64('2020-01-01'), np.datetime64('2020-01-01'),
               np.datetime64('2020-01-01'), np.datetime64('2020-01-01'),
               np.datetime64('2020-01-01'),
               np.datetime64('2019-01-17'), np.datetime64('2019-01-17'),
               np.datetime64('2019-01-17'), np.datetime64('2019-01-17'),
               np.datetime64('2019-01-17'), np.datetime64('2019-01-17'),
               np.datetime64('2019-01-17'),
               np.datetime64('2018-12-03'), np.datetime64('2018-12-03'),
               np.datetime64('2018-12-03'), np.datetime64('2018-12-03')],
      'id': ['tom','taliha','alyssa','randyl',
             'tom','taliha','edward','aaron','daniel','jean','sigmund','albus','riddle',
             'fellicia','ron','fred','george','alex',
             'taliha','alyssa','locke','jon','jamie','sam','sydney',
             'jon','jamie','sam','arya'],
      'value': [1,2,3,4,
                7,6,4,8,2,3,5,9,1,
                1,2,3,4,5,
                5,7,6,3,4,1,2,
                3,2,1,4]}
df = pd.DataFrame(df)
df

    group        date        id  value
 0      a  2020-01-01       tom      1
 1      a  2020-01-01    taliha      2
 2      a  2020-01-01    alyssa      3
 3      a  2020-01-01    randyl      4
 4      b  2019-03-12       tom      7
 5      b  2019-03-12    taliha      6
 6      b  2019-03-12    edward      4
 7      b  2019-03-12     aaron      8
 8      b  2019-03-12    daniel      2
 9      b  2019-03-12      jean      3
10      b  2019-03-12   sigmund      5
11      b  2019-03-12     albus      9
12      b  2019-03-12    riddle      1
13      c  2020-01-01  fellicia      1
14      c  2020-01-01       ron      2
15      c  2020-01-01      fred      3
16      c  2020-01-01    george      4
17      c  2020-01-01      alex      5
18      d  2019-01-17    taliha      5
19      d  2019-01-17    alyssa      7
20      d  2019-01-17     locke      6
21      d  2019-01-17       jon      3
22      d  2019-01-17     jamie      4
23      d  2019-01-17       sam      1
24      d  2019-01-17    sydney      2
25      e  2018-12-03       jon      3
26      e  2018-12-03     jamie      2
27      e  2018-12-03       sam      1
28      e  2018-12-03      arya      4

Expected output:

    group        date        id  value  together  rel
 0      a  2020-01-01       tom      1         1   -1
 1      a  2020-01-01    taliha      2         1    1
 2      a  2020-01-01    alyssa      3         1   -1
 3      a  2020-01-01    randyl      4         0    0
 4      b  2019-03-12       tom      7         0    0
 5      b  2019-03-12    taliha      6         0    0
 6      b  2019-03-12    edward      4         0    0
 7      b  2019-03-12     aaron      8         0    0
 8      b  2019-03-12    daniel      2         0    0
 9      b  2019-03-12      jean      3         0    0
10      b  2019-03-12   sigmund      5         0    0
11      b  2019-03-12     albus      9         0    0
12      b  2019-03-12    riddle      1         0    0
13      c  2020-01-01  fellicia      1         0    0
14      c  2020-01-01       ron      2         0    0
15      c  2020-01-01      fred      3         0    0
16      c  2020-01-01    george      4         0    0
17      c  2020-01-01      alex      5         0    0
18      d  2019-01-17    taliha      5         0    0
19      d  2019-01-17    alyssa      7         0    0
20      d  2019-01-17     locke      6         0    0
21      d  2019-01-17       jon      3         1   -2
22      d  2019-01-17     jamie      4         1    0
23      d  2019-01-17       sam      1         1    2
24      d  2019-01-17    sydney      2         0    0
25      e  2018-12-03       jon      3         0    0
26      e  2018-12-03     jamie      2         0    0
27      e  2018-12-03       sam      1         0    0
28      e  2018-12-03      arya      4         0    0

1 answer

User

#1 · Posted 2024-09-30 18:13:27

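I'll give it a try. The first task seems easy enough; the second one gave me a headache. My result for the second part differs slightly from your expected output. Maybe you made a mistake, but more likely I misunderstood something.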

from itertools import combinations

import pandas as pd

# Set of ids per (year, group), and value per (year, group, id)
df_grps = df.groupby([df.date.dt.year, 'group']).id.apply(set)
df_vals = df.set_index([df.date.dt.year, 'group', 'id']).value
results = {}
# Skip the earliest year: it has no previous year to compare against
for year in sorted(df.date.dt.year.unique())[1:]:
    groups = {}
    for group in df_grps.loc[year].index:
        ids = df_grps.loc[year, group]
        # ids of this group that shared a previous-year group
        # with at least one other id of this group
        together = set().union(*(
                       i for i in (ids & h for h in df_grps.loc[year-1]) if len(i) > 1
                   ))
        if not together:
            continue
        together = {i: 0 for i in together}
        # Score every pair in each previous-year group that contained both:
        # +1 for the id with the smaller value, -1 for the larger, nothing on ties
        for i, j in combinations(together, 2):
            for group_old in df_grps.loc[year-1].index:
                if not {i, j} <= df_grps.at[year-1, group_old]:
                    continue
                i_val = df_vals.at[year-1, group_old, i]
                j_val = df_vals.at[year-1, group_old, j]
                if i_val < j_val:
                    together[i] += 1
                    together[j] -= 1
                elif i_val > j_val:
                    together[i] -= 1
                    together[j] += 1
        groups[group] = together
    if groups:
        results[year] = groups

# Flatten the nested results into a frame indexed like df below
df_res = pd.DataFrame(
             [
                 [year, group, i, r]
                 for year, groups in results.items()
                 for group, rel in groups.items()
                 for i, r in rel.items()
             ],
             columns=['date', 'group', 'id', 'rel']
         ).set_index(['date', 'group', 'id'])

# Default both new columns to 0, then fill in the rows that were found
df.set_index([df.date.dt.year, 'group', 'id'], inplace=True)
df['together'], df['rel'] = 0, 0
df.loc[df_res.index, 'together'] = 1
df.loc[df_res.index, 'rel'] = df_res.rel

Result for the sample frame:

                          date  value  together  rel
date group id                                       
2020 a     tom      2020-01-01      1         1   -1
           taliha   2020-01-01      2         1    2
           alyssa   2020-01-01      3         1   -1
           randyl   2020-01-01      4         0    0
2019 b     tom      2019-03-12      7         0    0
           taliha   2019-03-12      6         0    0
           edward   2019-03-12      4         0    0
           aaron    2019-03-12      8         0    0
           daniel   2019-03-12      2         0    0
           jean     2019-03-12      3         0    0
           sigmund  2019-03-12      5         0    0
           albus    2019-03-12      9         0    0
           riddle   2019-03-12      1         0    0
2020 c     fellicia 2020-01-01      1         0    0
           ron      2020-01-01      2         0    0
           fred     2020-01-01      3         0    0
           george   2020-01-01      4         0    0
           alex     2020-01-01      5         0    0
2019 d     taliha   2019-01-17      5         0    0
           alyssa   2019-01-17      7         0    0
           locke    2019-01-17      6         0    0
           jon      2019-01-17      3         1   -2
           jamie    2019-01-17      4         1    0
           sam      2019-01-17      1         1    2
           sydney   2019-01-17      2         0    0
2018 e     jon      2018-12-03      3         0    0
           jamie    2018-12-03      2         0    0
           sam      2018-12-03      1         0    0
           arya     2018-12-03      4         0    0
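
The only row that differs from the expected output in the question is taliha in group a (rel 2 here, 1 expected). As a small illustration of where the 2 comes from, here is the same pairwise scoring applied by hand to just that group, using the 2019 values from the sample frame. The name prev_groups is made up for this sketch, and the scoring rule is my reading of the task, not necessarily the intended one:

```python
from itertools import combinations

# 2019 values of the ids from group 'a' (2020), per 2019 group they shared.
# prev_groups is a hypothetical name used only for this illustration.
prev_groups = {
    'b': {'tom': 7, 'taliha': 6},
    'd': {'taliha': 5, 'alyssa': 7},
}

rel = {'tom': 0, 'taliha': 0, 'alyssa': 0}
for i, j in combinations(sorted(rel), 2):
    for vals in prev_groups.values():
        if i not in vals or j not in vals:
            continue  # this pair was not together in that 2019 group
        if vals[i] < vals[j]:
            rel[i] += 1
            rel[j] -= 1
        elif vals[i] > vals[j]:
            rel[i] -= 1
            rel[j] += 1

print(rel)  # {'tom': -1, 'taliha': 2, 'alyssa': -1}
```

taliha has the smaller value in both comparisons (against tom in b and against alyssa in d), so it collects +1 twice. If each id should only be scored once, the expected 1 would make sense, but I could not tell that from the question.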

PS: I also have a version that stays a bit more within pandas, but it is longer. If you're interested, I'll post it.
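
The longer pandas-based version was not posted. Purely as an illustration, and not that version, here is one way the first task (the together flag) might be sketched with merges. It assumes "together" means sharing a previous-year group with at least one other member of the current group, as in the loop-based code. The names members, frame, prev, pairs, shared, hits and out are made up for this sketch, and the dates are reduced to a year column:

```python
import pandas as pd

# Group membership taken from the sample frame: group -> (year, ids)
members = {
    'a': (2020, 'tom taliha alyssa randyl'),
    'b': (2019, 'tom taliha edward aaron daniel jean sigmund albus riddle'),
    'c': (2020, 'fellicia ron fred george alex'),
    'd': (2019, 'taliha alyssa locke jon jamie sam sydney'),
    'e': (2018, 'jon jamie sam arya'),
}
frame = pd.DataFrame(
    [(g, y, i) for g, (y, ids) in members.items() for i in ids.split()],
    columns=['group', 'year', 'id'],
)

# Shift every row forward by one year so that a row's previous-year
# membership lines up with it on (year, id)
prev = frame.rename(columns={'group': 'group_prev'}).assign(
    year=lambda d: d['year'] + 1
)
pairs = frame.merge(prev, on=['year', 'id'])

# How many ids each (current group, previous group) combination shares
shared = pairs.groupby(['year', 'group', 'group_prev'])['id'].transform('size')
hits = pairs.loc[shared > 1, ['year', 'group', 'id']].drop_duplicates()

# Flag the hits; every other row gets 0
out = frame.merge(hits.assign(together=1), on=['year', 'group', 'id'], how='left')
out['together'] = out['together'].fillna(0).astype(int)
print(out[out['together'] == 1])
```

On the sample data this flags tom, taliha and alyssa in group a and jon, jamie and sam in group d, the same rows as the loop above. The rel column would still need the pairwise comparison.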
