Python: summarizing round-trip data in a DataFrame with an additional string column

Input:

   station from  station to  lgot  count
0         20001       20040  stud     22
1         20001       20040   fed     33
0         20040       20001  stud     44
2         20040       20001   reg     55
3         20002       20015  stud     66
3         20015       20002  stud     77

Desired output:

   station from  station to  lgot  count  count_back
0         20001       20040  stud     22          44
1         20001       20040   fed     33           0
2         20040       20001   reg     55           0
3         20002       20015  stud     66          77

df.head()
   station from  station to  lgot  count
0       2030080     2030000  full    464
1       2030000     2030080  full    395
2       2030150     2030000  full    330
3       2030000     2030150  full    285
4       2030240     2030000  full    249

df.loc[df['lgot'] == 'full', 'lgot'] = '11'
df.loc[df['lgot'] == 'rzd', 'lgot'] = '22'
df.loc[df['lgot'] == 'fed', 'lgot'] = '33'
df.loc[df['lgot'] == 'reg', 'lgot'] = '44'
df.loc[df['lgot'] == 'stud', 'lgot'] = '55'
df.loc[df['lgot'] == 'voen', 'lgot'] = '66'

df['station to'] = df['station to'].astype('string') + df['lgot']
df['station from'] = df['station from'].astype('string') + df['lgot']
df['station to'] = df['station to'].astype('int')
df['station from'] = df['station from'].astype('int')
df.drop(['lgot'], axis='columns', inplace=True)

def roundtrip(df):
    a, b, c, d = 'station from', 'station to', 'count', 'count_back'
    idx = df[a] > df[b]
    df = df.assign(**{d: 0})
    df.loc[idx, [a, b, c, d]] = df.loc[idx, [b, a, d, c]].values
    return df.groupby([a, b]).sum()

df = roundtrip(df)
df = df.reset_index()
df['lgot'] = df['station from'].astype('string').str.slice(start=-2)
df['station from'] = df['station from'].astype('string').str.slice(stop=7)
df['station to'] = df['station to'].astype('string').str.slice(stop=7)

df.head()
  station from station to  count  count_back lgot
0      1003704    2030133      0           1   11
1      1003704    2030160      0           1   11
2      1003704    2031321      0           1   11
3      1003704    2030132      0           1   22
4      1003704    2030133      0           1   22

1 answer

User

#1 · Posted on 2024-09-30 16:22:19

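Pierre's solution from the other question no longer works here: with the new data, the df[a] > df[b] test fails because the fifth row is now smaller than the fourth. With this data, a better approach is .shift(), which compares each row with the one directly above it and flags it as a return trip when its two stations are the previous row's stations swapped. You can also pass sort=False to your groupby to improve performance and preserve the original order. Finally, I added .reset_index() and changed the column variables a, b, c, d, e to match the new data.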
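
To make the .shift() test concrete, here is a minimal sketch that rebuilds the six sample rows from the question and prints which rows get flagged as return trips. The names prev_from, prev_to, and is_return are illustrative only and are not part of the original answer.

```python
import pandas as pd

# Sample rows from the question (index values simplified to 0..5).
df = pd.DataFrame({
    "station from": [20001, 20001, 20040, 20040, 20002, 20015],
    "station to":   [20040, 20040, 20001, 20001, 20015, 20002],
    "lgot":         ["stud", "fed", "stud", "reg", "stud", "stud"],
    "count":        [22, 33, 44, 55, 66, 77],
})

# shift() moves each column down by one row, so row i sees row i-1's values.
prev_from = df["station from"].shift()
prev_to = df["station to"].shift()

# A row is flagged when its stations are the previous row's stations, swapped.
is_return = (df["station from"] == prev_to) & (df["station to"] == prev_from)
print(is_return.tolist())  # [False, False, True, False, False, True]
```

Note that row 2 (stud) is flagged by comparison with row 1 (fed): the mask looks only at the stations, and the pairing by lgot happens later in the groupby. The test also looks just one row back, so it assumes every return row directly follows a row going in the opposite direction.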

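def roundtrip(df):
    a, b, c, d, e = 'station from', 'station to', 'lgot', 'count', 'count_back'
    # a row is a return trip if its stations are the previous row's, swapped
    idx = (df[a] == df[b].shift()) & (df[b] == df[a].shift())
    df = df.assign(**{e: 0})
    # flip return rows to the outbound direction and move count into count_back
    df.loc[idx, [a, b, c, d, e]] = df.loc[idx, [b, a, c, e, d]].values
    # sort=False keeps the groups in order of first appearance
    return df.groupby([a, b, c], sort=False).sum().reset_index()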


roundtrip(df)
Out[1]: 
   station from  station to  lgot  count  count_back
0         20001       20040  stud     22          44
1         20001       20040   fed     33           0
2         20040       20001   reg     55           0
3         20002       20015  stud     66          77
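
Because the .shift() test depends on row order, an order-independent variant may be useful when return rows do not sit directly below their outbound rows. The following is a sketch of a different technique (a self-merge), not the method from the answer above. The function name roundtrip_merge and the helper columns pos and pos_back are hypothetical, and the sketch assumes that each (station from, station to, lgot) combination appears at most once and that no row has the same station on both ends.

```python
import pandas as pd

def roundtrip_merge(df):
    keys = ["station from", "station to", "lgot"]
    fwd = df.reset_index(drop=True)
    fwd["pos"] = range(len(fwd))  # hypothetical helper: original row position

    # Reversed copy: swap the two station columns and rename the payload columns.
    back = fwd.rename(columns={
        "station from": "station to",
        "station to": "station from",
        "count": "count_back",
        "pos": "pos_back",
    })

    # Attach to each row the count of the opposite direction (NaN if none exists).
    merged = fwd.merge(back, on=keys, how="left")

    # Keep a row unless its reverse direction appeared earlier in the data.
    keep = merged["pos_back"].isna() | (merged["pos"] < merged["pos_back"])
    out = merged.loc[keep, keys + ["count", "count_back"]]
    out = out.assign(count_back=out["count_back"].fillna(0).astype(int))
    return out.reset_index(drop=True)

# Sample rows from the question (index values simplified to 0..5).
df = pd.DataFrame({
    "station from": [20001, 20001, 20040, 20040, 20002, 20015],
    "station to":   [20040, 20040, 20001, 20001, 20015, 20002],
    "lgot":         ["stud", "fed", "stud", "reg", "stud", "stud"],
    "count":        [22, 33, 44, 55, 66, 77],
})

result = roundtrip_merge(df)
print(result)
```

On the sample data this yields the same four rows as the output above. The direction that appears first in the data is treated as the outbound one, which matches how the 20040 to 20001 reg row keeps its original direction.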
