A cheap way to add up time-series intensities in a Python pandas DataFrame

import pandas as ps
import math
import numpy as np

person1=[3,0,10,10,10,10,10]
person2=[4,0,20,20,25,25,40]
person3=[5,0,5,5,15,15,40]
# list(...) is needed on Python 3, where zip returns an iterator
allPeopleDf=ps.DataFrame(np.array(list(zip(person1,person2,person3))).T)
allPeopleDf.columns=['count','start1', 'end1', 'start2', 'end2', 'start3','end3']
allPeopleDfNoCount=allPeopleDf[['start1', 'end1', 'start2', 'end2', 'start3','end3']]
uniqueTimes=sorted(ps.unique(allPeopleDfNoCount.values.ravel()))
possibleStates=[-1,0,1,2] #extra state 0 for initialization
stateData={}
comboStates={}
#initialize dict to add up all of the stateData
for time in uniqueTimes:
    comboStates[time]=0.0
allPeopleDf['track']=-1
allPeopleDf['status']=-1
numberState=len(possibleStates)
starti=-1
endi=0
startState=0
for i in range(3):
    starti=starti+2
    print(starti)
    endi=endi+2
    for time in uniqueTimes:
        # rows are read by position: 0=count, 1..6=start/end pairs, 7=track, 8=status
        def helper(row):
            start=row.iloc[starti]
            end=row.iloc[endi]
            track=row.iloc[7]
            if start <= time and time < end:
                return possibleStates[i+1]
            else:
                return possibleStates[0]
        def trackHelp(row):
            status=row.iloc[8]
            track=row.iloc[7]
            if track<=status:
                return status
            else:
                return track
        def Multiplier(row):
            x=row.iloc[8]
            if x==0:
                return 0.0*row.iloc[0]
            if x==1:
                return 5.0*row.iloc[0]
            if x==2:
                return 10.0*row.iloc[0]
            if x==-1:#numeric place holder for non-contributing
                return 0.0*row.iloc[0]
        allPeopleDf['status']=allPeopleDf.apply(helper,axis=1)
        allPeopleDf['track']=allPeopleDf.apply(trackHelp,axis=1)
        stateData[time]=allPeopleDf.apply(Multiplier,axis=1).sum()
    for k,v in stateData.items():
        comboStates[k]=comboStates.get(k,0)+v
print(allPeopleDf)
print(stateData)
print(comboStates)

2 Answers

User

#1 · Edited 2024-09-30 04:40:36

This looks like a job for .sum():

In [10]:

allPeopleDf.sum()
Out[10]:
aStart     0
aEnd      35
bStart    35
bEnd      50
cStart    50
cEnd      90
dtype: int32

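User

#2 · Edited 2024-09-30 04:40:36

Take piano keys as an example: suppose you have three keys, each with 30 intensity levels.

I would try to keep the data in this format: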

import pandas as pd
df = pd.DataFrame([[10,'A',5],
                   [10,'B',7],
                   [13,'C',10],
                   [15,'A',15],
                   [20,'A',7],
                   [23,'C',0]], columns=["time", "key", "intensity"])

   time   key  intensity
0    10     A          5
1    10     B          7
2    13     C         10
3    15     A         15
4    20     A          7
5    23     C          0

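There you can record every change of intensity in any of the keys. From here you can already get Cartesian coordinates for each key, as (time, intensity) pairs.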
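One way to pull those pairs out, as a minimal sketch assuming the df defined above (the name pairs_by_key is only illustrative and is not part of the original answer):

```python
import pandas as pd

df = pd.DataFrame([[10, 'A', 5],
                   [10, 'B', 7],
                   [13, 'C', 10],
                   [15, 'A', 15],
                   [20, 'A', 7],
                   [23, 'C', 0]], columns=["time", "key", "intensity"])

# One list of (time, intensity) tuples per key, kept in row order.
pairs_by_key = {
    key: list(zip(group["time"].tolist(), group["intensity"].tolist()))
    for key, group in df.groupby("key")
}

print(pairs_by_key["A"])
```

Each list can be handed to a plotting routine as is, or unzipped into separate x and y sequences.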


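Then you can easily create a new column, increment, that holds the change in intensity that happened for that key at that point in time (intensity only gives the new value of the intensity):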

df["increment"]=df.groupby("key")["intensity"].transform(
                             lambda x: x.sub(x.shift(), fill_value= 0 ))
df

   time key  intensity  increment
0    10   A          5          5
1    10   B          7          7
2    13   C         10         10
3    15   A         15         10
4    20   A          7         -8
5    23   C          0        -10
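The same column can probably also be built with a per-key diff(). This is only a sketch of an alternative, assuming every key starts from intensity 0 before its first row; increment_alt is an illustrative name:

```python
import pandas as pd

df = pd.DataFrame([[10, 'A', 5],
                   [10, 'B', 7],
                   [13, 'C', 10],
                   [15, 'A', 15],
                   [20, 'A', 7],
                   [23, 'C', 0]], columns=["time", "key", "intensity"])

# diff() within each key leaves NaN on that key's first row; there the
# change equals the intensity itself, since the key is assumed to start at 0.
increment_alt = df.groupby("key")["intensity"].diff().fillna(df["intensity"])

print(increment_alt.tolist())
```

The values come back as floats because of the intermediate NaN; cast with astype(int) if integer output matters.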

Then, using this new column, you can generate (time, total_intensity) pairs to use as Cartesian coordinates:

df.groupby("time")["increment"].sum().cumsum()

time
10      12
13      22
15      32
20      24
23      14
dtype: int64
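As a cross-check on those totals, the same numbers should come out of a wide table in which each key's last known intensity is carried forward. A minimal sketch, assuming no key has two rows at the same time (pivot raises an error on such duplicates); wide and total are illustrative names:

```python
import pandas as pd

df = pd.DataFrame([[10, 'A', 5],
                   [10, 'B', 7],
                   [13, 'C', 10],
                   [15, 'A', 15],
                   [20, 'A', 7],
                   [23, 'C', 0]], columns=["time", "key", "intensity"])

# One row per time, one column per key, NaN where a key did not change.
wide = df.pivot(index="time", columns="key", values="intensity")

# Carry each key's last intensity forward; keys not seen yet count as 0.
total = wide.ffill().fillna(0).sum(axis=1)

print(total)
```

This route skips the increment column entirely, at the cost of a table with one column per key.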

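EDIT: applying this to the specific data in the question

Assume the data comes as a list of values that starts with the element id (person/piano key), followed by a factor that multiplies the measured weight/intensity for that element, followed by pairs of time values giving the start and end of a series of known states (weight being carried/intensity being emitted). I am not sure I got the data format right. From your question: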

data1=['person1',3,0.0,10.0,10.0,10.0,10.0,10.0]
data2=['person2',4,0,20,20,25,25,40]
data3=['person3',5,0,5,5,15,15,40]

If we know the weight/intensity of each state, we can define:

known_states = [5, 10, 15]
DF_columns = ["time", "id", "intensity"]

Then the easiest way I can think of to load the data involves the following function:

import pandas as pd

def read_data(data, states, columns):
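    elem_id = data[0]   # person / piano key
    factor = data[1]    # multiplier applied to every state's intensity
    reshaped_data = []
    for i in range(len(states)):
        j = 2 + 2*i     # index of the start time of state i
        if not data[j] == data[j+1]:
            reshaped_data.append([data[j], elem_id, factor*states[i]])
            reshaped_data.append([data[j+1], elem_id, -1*factor*states[i]])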
    return pd.DataFrame(reshaped_data, columns=columns)

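Note that if not data[j] == data[j+1]: avoids loading data into the dataframe when the start and end times of a given state are equal (such entries seem uninformative, and would not show up in a plot anyway). Take it out if you still want those entries.

Then, load the data: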

# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
df = pd.concat([read_data(data1, known_states, DF_columns),
                read_data(data2, known_states, DF_columns),
                read_data(data3, known_states, DF_columns)],
               ignore_index=True)
# and so on...

And then you are back where this answer started (using id in place of key, of course).
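Putting the pieces together on the question's data might look like the sketch below. It rests on one reading that the answer does not spell out: read_data already stores signed changes in the intensity column (positive at a state's start, negative at its end), so the column is summed per time directly rather than converted to increments first. The loop index uses j = 2 + 2*i, and total is an illustrative name.

```python
import pandas as pd

def read_data(data, states, columns):
    elem_id, factor = data[0], data[1]
    rows = []
    for i, state in enumerate(states):
        j = 2 + 2 * i                      # index of this state's start time
        start, end = data[j], data[j + 1]
        if start != end:                   # skip zero-length states
            rows.append([start, elem_id, factor * state])
            rows.append([end, elem_id, -factor * state])
    return pd.DataFrame(rows, columns=columns)

data1 = ['person1', 3, 0.0, 10.0, 10.0, 10.0, 10.0, 10.0]
data2 = ['person2', 4, 0, 20, 20, 25, 25, 40]
data3 = ['person3', 5, 0, 5, 5, 15, 15, 40]
known_states = [5, 10, 15]
DF_columns = ["time", "id", "intensity"]

df = pd.concat(
    [read_data(d, known_states, DF_columns) for d in (data1, data2, data3)],
    ignore_index=True,
)

# Sum the signed changes at each time, then accumulate over time.
total = df.groupby("time")["intensity"].sum().cumsum()
print(total)
```

The running total returns to 0 at the last time point, which is a quick sanity check that every start has a matching end.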
