How to perform calculations on grouped data by accessing specific columns and converting the result back to a DataFrame

Posted 2024-10-02 02:30:05


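I am trying to do some data analysis on vehicle data. I need to group the data by vehicle ID and then, for each ID, take the distance_to_stopbar from that vehicle's first row, subtract it from distance_along_path, and then take a cumulative sum over the data.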

Basically, the steps for a single vehicle ID are:

Code:

df_signal_group = df_broadway[
    df_broadway.trajectory_signal_group == '4']
# .copy() so the column assignments below do not write to a slice of df_broadway
df_signal_group_1 = df_signal_group[
    df_signal_group.temporaryId == 26].copy()
df_signal_group_1['distance_along_path_change'] = (
    df_signal_group_1['distance_along_path'] - 172.78
) # 172.78 is the distance_to_stopbar in this vehicle's first row
df_signal_group_1['groupbydistance'] = (
    df_signal_group_1
    .distance_along_path_change
    .eq(-172.78).cumsum()
)

I have many such vehicles, and I have read up on how to repeat these steps for every vehicle ID:

df_signal_group = df_broadway[
    df_broadway.trajectory_signal_group == '4']
df_grouped = df_signal_group.groupby('temporaryId')

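This is where I am stuck, and I do not know how to go further. I know I can use df_signal_group.groupby('temporaryId').first() to get the first row of each group, but how do I use that data while iterating over each group? Any pointers would be helpful.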

Sample data is below:


This is just a simple sample. In the real data the vehicle IDs are mixed together, which is why grouping is needed.

temporaryId trajectory_signal_group distance_to_stopbar distance_along_path onmap_status
26  4   172.78  0   True
26  4   170.33  2.459140924298365   True
26  4   167.88  4.883816339797585   True
26  4   165.49  7.274043647721051   True
26  4   164.31  8.456244827531695   True
26  4   161.96  10.794833648650943  True
26  4   159.66  13.099019997543072  True
26  4   158.51  14.238218211441483  True
125 4   173.54  0   True
125 4   172.4   1.179344296415053   True
125 4   170.01  3.5609045873593734  True
125 4   167.61  5.95965979143056    True
125 4   165.2   8.362024854827855   True
125 4   162.79  10.76439000598294   True
125 4   160.38  13.166755196815991  True
125 4   157.98  15.56912041000858   True
125 4   156.77  16.77030301927281   True
125 4   155.57  17.971485632809344  True
125 4   154.36  19.172668245991783  True
125 4   151.96  21.57503347794954   True
125 4   150.76  22.776216095592986  True
125 4   148.34  25.17858133262119   True
125 4   147.14  26.37976395246835   True
125 4   144.73  28.783361992012317  True
125 4   143.52  29.989240517622683  True
125 4   141.09  32.41716300616539   True

Thanks, everyone.

Expected output:

temporaryId trajectory_signal_group distance_to_stopbar distance_along_path onmap_status    distance_along_path_change  groupbydistance
260 4   172.6   0   True    -172.6  0
260 4   171.65  0.9526235800176956  True    -171.6473764199823  0
260 4   169.7   2.8877960903921576  True    -169.71220390960784 0
260 4   167.73  4.862869066444613   True    -167.7371309335554  0
260 4   166.72  5.865368230445712   True    -166.73463176955428 0
260 4   164.68  7.899986028468888   True    -164.70001397153112 0
260 4   163.65  8.930637572963427   True    -163.66936242703656 0
260 4   162.61  9.968169381978832   True    -162.63183061802116 0
260 4   161.56  11.011111474828203  True    -161.5888885251718  0
260 4   159.46  13.108045032255115  True    -159.49195496774487 0
26  4   172.78  0   True    -172.78 1
26  4   170.33  2.459140924298365   True    -170.32085907570163 1
26  4   167.88  4.883816339797585   True    -167.8961836602024  1
26  4   165.49  7.274043647721051   True    -165.50595635227896 1
26  4   164.31  8.456244827531695   True    -164.3237551724683  1
26  4   161.96  10.794833648650945  True    -161.98516635134905 1
26  4   159.66  13.099019997543072  True    -159.68098000245692 1
26  4   158.51  14.238218211441483  True    -158.54178178855852 1
26  4   156.26  16.490836950069347  True    -156.28916304993066 1
26  4   154.03  18.70910216437552   True    -154.0708978356245  1
26  4   151.84  20.893034435436896  True    -151.8869655645631  1
26  4   150.76  21.972132321013312  True    -150.8078676789867  1


1 answer
User
#1 · Posted 2024-10-02 02:30:05

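The flow is as follows. Find the first row of each group, grouping by the ID and signal-group columns. Next, take a cumulative sum to number the groups in order. Multiply 'distance_to_stopbar' by -1 for the later calculation. Merge the new DataFrame with the original DataFrame. Forward-fill the NA values that the merge produces. Finally, compute 'distance_along_path_change'.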

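# First row of each (vehicle, signal group); reset_index() turns the keys back into columns
df_groups = df_broadway.groupby(['temporaryId','trajectory_signal_group']).first().reset_index()
# Running count over the first rows numbers the groups 1, 2, ...
df_groups['groupbydistance'] = df_groups['onmap_status'].cumsum()
# Holds the negated first distance_to_stopbar (this column is renamed to tmp below)
df_groups['distance_along_path'] = df_groups['distance_to_stopbar'] * -1
# how='left' preserves the original row order, which the forward fill relies on
df_broadway = df_broadway.merge(df_groups, on=['temporaryId','trajectory_signal_group','distance_to_stopbar'], how='left')
df_broadway.columns = ['temporaryId', 'trajectory_signal_group', 'distance_to_stopbar', 'distance_along_path', 'onmap_status', 'tmp', 'distance_along_path_change', 'groupbydistance']
# Only each group's first row matched in the merge; carry its values down the group
df_broadway = df_broadway.ffill()
df_broadway['distance_along_path_change'] = df_broadway['distance_to_stopbar'] + df_broadway['tmp']
df_broadway.drop('tmp', axis=1, inplace=True)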

df_broadway.head(10)
    temporaryId trajectory_signal_group distance_to_stopbar distance_along_path onmap_status    distance_along_path_change  groupbydistance
0   26  4   172.78  0.000000    True    0.00    1.0
1   26  4   170.33  2.459141    True    -2.45   1.0
2   26  4   167.88  4.883816    True    -4.90   1.0
3   26  4   165.49  7.274044    True    -7.29   1.0
4   26  4   164.31  8.456245    True    -8.47   1.0
5   26  4   161.96  10.794834   True    -10.82  1.0
6   26  4   159.66  13.099020   True    -13.12  1.0
7   26  4   158.51  14.238218   True    -14.27  1.0
8   125 4   173.54  0.000000    True    0.00    2.0
9   125 4   172.40  1.179344    True    -1.14   2.0
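
Note that the output above subtracts each vehicle's first distance_to_stopbar from distance_to_stopbar, while the expected output in the question subtracts it from distance_along_path and numbers the groups from 0 in order of appearance. Below is a minimal sketch of one way to get the latter without a merge, using groupby with transform and ngroup. The small DataFrame is made-up sample data in the shape of the question's table, and the names df, keys, grouped and first_stopbar are only illustrative.

```python
import pandas as pd

# Made-up sample in the same shape as the question's data (assumption)
df = pd.DataFrame({
    "temporaryId": [26, 26, 26, 125, 125],
    "trajectory_signal_group": ["4"] * 5,
    "distance_to_stopbar": [172.78, 170.33, 167.88, 173.54, 172.40],
    "distance_along_path": [0.0, 2.459141, 4.883816, 0.0, 1.179344],
    "onmap_status": [True] * 5,
})

keys = ["temporaryId", "trajectory_signal_group"]
# sort=False keeps groups in order of first appearance instead of sorting by key
grouped = df.groupby(keys, sort=False)

# transform("first") broadcasts each group's first value back onto every row
first_stopbar = grouped["distance_to_stopbar"].transform("first")
df["distance_along_path_change"] = df["distance_along_path"] - first_stopbar

# ngroup() labels each row with its group number, starting at 0
df["groupbydistance"] = grouped.ngroup()

print(df)
```

Because transform aligns on the original index, this should not depend on a vehicle's rows being contiguous, although "first" still means the first row in the DataFrame's current order, so sort by time beforehand if the order matters.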
