Create a probability table based on the timing of events in Python

import numpy as np
import pandas as pd

# Each cell of 'duplicates' is a list holding one tuple:
# (ID, USER1, USER2, TAU1, TAU2, DELAY)
df = pd.DataFrame({
    'duplicates': [
        [('007', 'us1', 'us2', 'time1', 'time2', 4)],
        [('008', 'us1', 'us2', 'time1', 'time2', 5)],
        [('009', 'us1', 'us2', 'time1', 'time2', 6)],
        [('007', 'us2', 'us3', 'time1', 'time2', 4)],
        [('008', 'us2', 'us3', 'time1', 'time2', 7)],
        [('009', 'us2', 'us3', 'time1', 'time2', 11)],
        [('001', 'us5', 'us1', 'time1', 'time2', 0)],
        [('008', 'us5', 'us1', 'time1', 'time2', 19)],
        [('007', 'us3', 'us2', 'time1', 'time2', 2)],
        [('007', 'us3', 'us2', 'time1', 'time2', 34)],
        [('009', 'us3', 'us2', 'time1', 'time2', 67)],
    ],
    'numberOfInteractions': [1, 2, 3, 4, 5, 6, 7, 8, 1, 1, 11],
})

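# Prepend each row's numberOfInteractions to its tuples
df["duplicates"] = df.apply(
    lambda x: [(x["numberOfInteractions"], a, b, c, d, e, f) for a, b, c, d, e, f in x.duplicates],
    axis=1,
)

# Flatten to one row per tuple, sum interactions per (USER1, USER2) pair,
# and pivot USER2 into columns
df = (
    pd.DataFrame(
        df["duplicates"].explode().tolist(),
        columns=["numberOfInteractions", "ID", "USER1", "USER2", "TAU1", "TAU2", "DELAY"],
    )
    .groupby(["USER1", "USER2"])["numberOfInteractions"]
    .sum()
    .to_frame()
    .unstack()
)
df.columns = df.columns.get_level_values(1)

# Give every user a column (Index.union, since | on an Index is not a set union
# in recent pandas), then normalize each column so it sums to 1
combined = df.index.union(df.columns)
for col in combined:
    if col not in df.columns:
        df[col] = np.nan
    df[col] = df[col] / df[col].sum(skipna=True)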
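
For comparison, the pipeline above boils down to summing numberOfInteractions per (USER1, USER2) pair and dividing each USER2 column by its total. A minimal sketch of that same computation with pivot_table, assuming the tuples have already been flattened into one record per interaction (records, table, users and prob are hypothetical names; ID, TAU1 and TAU2 are left out because the calculation does not use them):

```python
import pandas as pd

# Hypothetical flat records built from the question's data:
# (numberOfInteractions, USER1, USER2, DELAY)
records = pd.DataFrame(
    [(1, "us1", "us2", 4), (2, "us1", "us2", 5), (3, "us1", "us2", 6),
     (4, "us2", "us3", 4), (5, "us2", "us3", 7), (6, "us2", "us3", 11),
     (7, "us5", "us1", 0), (8, "us5", "us1", 19),
     (1, "us3", "us2", 2), (1, "us3", "us2", 34), (11, "us3", "us2", 67)],
    columns=["numberOfInteractions", "USER1", "USER2", "DELAY"],
)

# Sum interactions per (USER1, USER2) pair; pairs that never occur stay NaN.
table = records.pivot_table(index="USER1", columns="USER2",
                            values="numberOfInteractions", aggfunc="sum")

# Give every user a column (us5 never appears as USER2, so it is all NaN).
users = table.index.union(table.columns)
table = table.reindex(columns=users)

# Divide each column by its total; an all-NaN column stays NaN.
prob = table / table.sum()
print(prob)
```

The result has the same shape as the question's table: pairs that never occur stay NaN, and us5 ends up as an all-NaN column.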

1 answer

User

Answer #1 · Posted 2024-09-02 21:29:29

I think you have two ways to go about this.

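You can build a new column from the delay and the number of interactions (this is what I would do). Apply it to the exploded DataFrame, the one that still has the DELAY column, i.e. before the groupby in the question: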

def mapToNbOfInteractionsPerDelay(row):
    # Put the row's interaction count into the slot of its delay bucket:
    # (<=5, <=19, <=60, <=80, >80)
    nbOfInteractions = row['numberOfInteractions']
    delay = row['DELAY']

    if delay <= 5:
        return (nbOfInteractions, 0, 0, 0, 0)
    elif delay <= 19:
        return (0, nbOfInteractions, 0, 0, 0)
    elif delay <= 60:
        return (0, 0, nbOfInteractions, 0, 0)
    elif delay <= 80:
        return (0, 0, 0, nbOfInteractions, 0)
    else:
        return (0, 0, 0, 0, nbOfInteractions)


df["nbOfInteractionsPerDelay"] = df[["DELAY", "numberOfInteractions"]].apply(mapToNbOfInteractionsPerDelay, axis=1)
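
The if/elif chain above sends each row to one of five right-closed delay ranges: (-inf, 5], (5, 19], (19, 60], (60, 80] and (80, inf). If you would rather avoid a row-wise apply, pd.cut with the same edges can compute the bucket index in one vectorized call. A minimal sketch, assuming a flat frame like the exploded one (records, DELAY_BINS, BUCKET_LABELS and counts are hypothetical names); it keeps one column per bucket instead of packing the counts into tuples:

```python
import numpy as np
import pandas as pd

# Hypothetical flat records: (numberOfInteractions, USER1, USER2, DELAY)
records = pd.DataFrame(
    [(1, "us1", "us2", 4), (2, "us1", "us2", 5), (3, "us1", "us2", 6),
     (4, "us2", "us3", 4), (5, "us2", "us3", 7), (6, "us2", "us3", 11),
     (7, "us5", "us1", 0), (8, "us5", "us1", 19),
     (1, "us3", "us2", 2), (1, "us3", "us2", 34), (11, "us3", "us2", 67)],
    columns=["numberOfInteractions", "USER1", "USER2", "DELAY"],
)

# Right-closed bins (a, b] reproduce the `delay <= limit` tests in the answer.
DELAY_BINS = [-np.inf, 5, 19, 60, 80, np.inf]
BUCKET_LABELS = ["<=5", "<=19", "<=60", "<=80", ">80"]

# labels=False returns the integer bucket index 0..4 instead of interval labels.
records["bucket"] = pd.cut(records["DELAY"], bins=DELAY_BINS, labels=False)

counts = (
    records.groupby(["USER1", "USER2", "bucket"])["numberOfInteractions"]
    .sum()
    .unstack(fill_value=0)  # observed buckets become columns
    .reindex(columns=range(len(BUCKET_LABELS)), fill_value=0)  # add unused buckets
)
counts.columns = BUCKET_LABELS
print(counts)
```

Each row of counts holds the same five numbers as the matching tuple in the answer's output, e.g. (us3, us2) gives 1, 0, 1, 11, 0.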

Then you can group by user pair and add up the tuples element by element:

# zip(*l) regroups the tuples by bucket, so summing each group gives per-bucket totals
df = (df.groupby(["USER1","USER2"])["nbOfInteractionsPerDelay"]
        .agg(lambda l: tuple(sum(x) for x in zip(*l))).to_frame().unstack())

This gives you:

      nbOfInteractionsPerDelay                                    
USER2                      us1               us2               us3
USER1                                                            
us1                        NaN   (3, 3, 0, 0, 0)               NaN
us2                        NaN               NaN  (4, 11, 0, 0, 0)
us3                        NaN  (1, 0, 1, 11, 0)               NaN
us5            (7, 8, 0, 0, 0)               NaN               NaN

From there, you can easily get what you want.
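
Assuming what you want is one probability table per delay bucket, normalized per USER2 column the same way the question's code does, a minimal sketch could start from the tuples shown above, take the k-th element of each tuple, lay it out as a USER1 x USER2 table and divide each column by its sum (pair_counts, BUCKETS, flat and probability_table are hypothetical names):

```python
import pandas as pd

# Per-pair counts per delay bucket, copied from the answer's output table.
pair_counts = {
    ("us1", "us2"): (3, 3, 0, 0, 0),
    ("us2", "us3"): (4, 11, 0, 0, 0),
    ("us3", "us2"): (1, 0, 1, 11, 0),
    ("us5", "us1"): (7, 8, 0, 0, 0),
}
BUCKETS = ["<=5", "<=19", "<=60", "<=80", ">80"]  # hypothetical bucket names

# Long form: one row per (USER1, USER2, bucket) with its count.
rows = [(u1, u2, BUCKETS[k], n)
        for (u1, u2), counts in pair_counts.items()
        for k, n in enumerate(counts)]
flat = pd.DataFrame(rows, columns=["USER1", "USER2", "bucket", "count"])

users = sorted({u for pair in pair_counts for u in pair})


def probability_table(bucket):
    """USER1 x USER2 table for one bucket, each column divided by its total."""
    sub = flat[flat["bucket"] == bucket]
    table = (sub.pivot(index="USER1", columns="USER2", values="count")
                .reindex(index=users, columns=users))
    return table / table.sum()


tables = {b: probability_table(b) for b in BUCKETS}
print(tables["<=5"])
```

Pairs that exist but have no interactions in a bucket show 0, and any column whose total is 0 comes out as NaN.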

Or you can split the DataFrame into 5 separate DataFrames, each holding the values for one delay range, and then merge them.
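
That second route could look roughly like this: filter the rows into five sub-DataFrames by delay range, sum each one per (USER1, USER2) pair, then combine them side by side. This sketch does the combining with pd.concat(axis=1), which is an outer join on the shared (USER1, USER2) index; records, RANGES, parts and merged are hypothetical names:

```python
import pandas as pd

# Hypothetical flat records: (numberOfInteractions, USER1, USER2, DELAY)
records = pd.DataFrame(
    [(1, "us1", "us2", 4), (2, "us1", "us2", 5), (3, "us1", "us2", 6),
     (4, "us2", "us3", 4), (5, "us2", "us3", 7), (6, "us2", "us3", 11),
     (7, "us5", "us1", 0), (8, "us5", "us1", 19),
     (1, "us3", "us2", 2), (1, "us3", "us2", 34), (11, "us3", "us2", 67)],
    columns=["numberOfInteractions", "USER1", "USER2", "DELAY"],
)

# Delay ranges (low, high] matching the answer's cut-offs.
RANGES = {
    "<=5": (float("-inf"), 5),
    "<=19": (5, 19),
    "<=60": (19, 60),
    "<=80": (60, 80),
    ">80": (80, float("inf")),
}

# One sub-DataFrame per delay range, each reduced to per-pair totals.
parts = {}
for label, (low, high) in RANGES.items():
    subset = records[(records["DELAY"] > low) & (records["DELAY"] <= high)]
    parts[label] = subset.groupby(["USER1", "USER2"])["numberOfInteractions"].sum()

# Outer-join the five results on (USER1, USER2); ranges with no rows become 0.
merged = pd.concat(list(parts.values()), axis=1, keys=list(parts)).fillna(0).astype(int)
print(merged)
```

The result holds the same per-bucket counts as the tuple approach, just as five ordinary integer columns.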
