在Python中优化熔化

2024-10-03 06:25:52 发布

您现在位置:Python中文网/ 问答频道 /正文

输入df:

  ID    Date   Start    End
  ABC   2012   15       17
  ABC   2013   11       20
  XYZ   2012   30       50
  XYZ   2014   50       100

预期输出DF:

  ID    Date   Variable Value
  ABC   2012   Start    15
  ABC   2012   End      17
  ABC   2013   Start    11
  ABC   2013   End      20
  XYZ   2012   Start    30
  XYZ   2012   End      50
  XYZ   2014   Start    50
  XYZ   2014   End      100

代码:

columns=['ID','Date','Variable','Value']
outputdf=pd.DataFrame(index=range(len(df)*2),columns=columns)
tempPos=0
for pos,id in enumerate(df['ID']):
   outputdf.loc[[tempPos,tempPos+1],'ID']=df.loc[pos,'ID']
   outputdf.loc[[tempPos,tempPos+1],'Date']=df.loc[pos,'Date']
   outputdf.loc[tempPos,'Variable']="Start"
   outputdf.loc[tempPos+1,'Variable']="End"
   outputdf.loc[tempPos,'Value']=df.loc[pos,'Start']
   outputdf.loc[tempPos+1,'Value']=df.loc[pos,'End']
   tempPos=tempPos+2

我有超过10万行,这需要很长时间。我如何优化它?感谢您的帮助。谢谢


Tags: columnsposiddfdatevaluevariablestart
1条回答
网友
1楼 · 发布于 2024-10-03 06:25:52

使用^{}

(df.melt(id_vars=['ID','Date'],
              var_name='Variable',
              value_name='Value').sort_values(['ID','Date']))

(df.set_index(['ID','Date'])
   .rename_axis(columns='Variable')
   .stack()
   .reset_index(name='Value'))

输出

    ID  Date Variable  Value
0  ABC  2012    Start     15
4  ABC  2012      End     17
1  ABC  2013    Start     11
5  ABC  2013      End     20
2  XYZ  2012    Start     30
6  XYZ  2012      End     50
3  XYZ  2014    Start     50
7  XYZ  2014      End    100

%%timeit
(df.set_index(['ID','Date'])
   .rename_axis(columns='Variable')
   .stack()
   .reset_index(name='Value'))
5.3 ms ± 220 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
(df.melt(id_vars=['ID','Date'],
          var_name='Variable',
          value_name='Value').sort_values(['ID','Date']))
5.77 ms ± 329 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

%%timeit
columns=['ID','Date','Variable','Value']
outputdf=pd.DataFrame(index=range(len(df)*2),columns=columns)
tempPos=0
for pos,id in enumerate(df['ID']):
   outputdf.loc[[tempPos,tempPos+1],'ID']=df.loc[pos,'ID']
   outputdf.loc[[tempPos,tempPos+1],'Date']=df.loc[pos,'Date']
   outputdf.loc[tempPos,'Variable']="Start"
   outputdf.loc[tempPos+1,'Variable']="End"
   outputdf.loc[tempPos,'Value']=df.loc[pos,'Start']
   outputdf.loc[tempPos+1,'Value']=df.loc[pos,'End']
   tempPos=tempPos+2
9.72 ms ± 1.39 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

相关问题 更多 >