How do I sum by day of the month in pandas?

Posted 2024-09-28 03:21:27

I created the following dictionary:

for k, er in dicio.items():
    #dicio[k]['Return %'] = er.iloc[:, 0].pct_change(-1)*100
    dicio[k]['Day'] = er.index.day
dicio

 {'WDOFUT':             WDOFUT  Day
 Data                   
 2020-09-11  5325.0   11
 2020-09-10  5325.0   10
 2020-09-09  5312.5    9
 2020-09-08  5366.0    8
 2020-09-04  5303.0    4
 ...            ...  ...
 1994-07-08     NaN    8
 1994-07-07     NaN    7
 1994-07-06     NaN    6
 1994-07-05     NaN    5
 1994-07-04     NaN    4
 
 [6482 rows x 2 columns],
 'WEGE3':             WEGE3  Day
 Data                  
 2020-09-11  62.42   11
 2020-09-10  62.42   10
 2020-09-09  64.93    9
 2020-09-08  63.00    8
 2020-09-04  64.49    4
 ...           ...  ...
 1994-07-08    NaN    8
 1994-07-07    NaN    7
 1994-07-06    NaN    6
 1994-07-05    NaN    5
 1994-07-04    NaN    4
 
 [6482 rows x 2 columns],
 'YDUQ3':             YDUQ3  Day
 Data                  
 2020-09-11  27.31   11
 2020-09-10  27.31   10
 2020-09-09  27.99    9
 2020-09-08  28.75    8
 2020-09-04  27.78    4
 ...           ...  ...
 1994-07-08    NaN    8
 1994-07-07    NaN    7
 1994-07-06    NaN    6
 1994-07-05    NaN    5
 1994-07-04    NaN    4
 
 [6482 rows x 2 columns]}

I can group by day, but this only picks up the last item of the dictionary (YDUQ3):

grouped_by_day = dicio[k].groupby('Day')
grouped_by_day.describe()

YDUQ3
count   mean    std min 25% 50% 75% max
Day                             
1   86.0    13.974651   9.391865    2.96    5.4450  11.770  21.2000 39.75
2   95.0    15.022842   10.624683   2.57    5.6900  13.290  21.4050 49.19
3   102.0   15.262549   11.061839   2.44    5.8950  12.800  21.8575 53.85
              ................................................
29  96.0    14.498229   10.321219   2.61    5.4150  12.975  21.0425 50.88
30  92.0    14.914674   10.701043   2.61    5.5125  13.120  21.7150 51.32
31  51.0    15.339608   10.676544   2.96    6.1350  13.420  21.7150 51.73

I can list the per-day groups as shown below, but again only for the last item (I need all of them):

list(grouped_by_day)

[(1,
              YDUQ3  Day
  Data                  
  2020-09-01  27.89    1
  2020-07-01  34.41    1
  2020-06-01  29.82    1
  2020-04-01  21.30    1
  2019-11-01  39.75    1
  ...           ...  ...
  1995-02-01    NaN    1
  1994-12-01    NaN    1
  1994-11-01    NaN    1
  1994-09-01    NaN    1
  1994-08-01    NaN    1      
  [182 rows x 2 columns]),
   ......................
   ......................
  (31,
              YDUQ3  Day
  Data                  
  2020-08-31  26.95   31
  2020-07-31  33.89   31
  2020-03-31  21.76   31
  2020-01-31  51.73   31
  2019-10-31  38.52   31
  ...         ...    ...
  1995-05-31    NaN   31
  1995-03-31    NaN   31
  1995-01-31    NaN   31
  1994-10-31    NaN   31
  1994-08-31    NaN   31
  
  [113 rows x 2 columns])]

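Questions:

  • How can I show all 3 items of the dictionary? (dicio[k] uses only one key, the last one)

  • I want to add up all the returns that fall on the same day of the month

    • Over a span of 10 years there will be ~120 day-01s, ~120 day-02s, and so on

    • Each symbol would then have a 31 x 120 dictionary, from which we can pick the days with the highest and lowest accumulated return

    • Finally, I want to show the highest/lowest returns across the whole stock portfolio and the days on which they occur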


1 Answer
User
#1 · Posted 2024-09-28 03:21:27

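I can't tell for sure from the details of your question, but from the way it is framed it looks like each stock has its own separate DataFrame. If that is the case, you could try combining them all into a single DataFrame. Here is an example of what I mean: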

import numpy as np
import pandas as pd

# Sample data: ticker -> rows of [date, value, day of month]
dicio = {
    'WDOFUT': [
        [pd.Timestamp(year=2020, month=9, day=11), 5325.0, 11],
        [pd.Timestamp(year=2020, month=9, day=10), 5325.0, 10],
        [pd.Timestamp(year=2020, month=9, day=9), 5312.5, 9],
        [pd.Timestamp(year=2020, month=9, day=8), 5366.0, 8],
        [pd.Timestamp(year=2020, month=9, day=4), 5303.0, 4],
        [pd.Timestamp(year=1994, month=7, day=8), np.nan, 8],
        [pd.Timestamp(year=1994, month=7, day=7), np.nan, 7],
        [pd.Timestamp(year=1994, month=7, day=6), np.nan, 6],
        [pd.Timestamp(year=1994, month=7, day=5), np.nan, 5],
        [pd.Timestamp(year=1994, month=7, day=4), np.nan, 4],
    ],
    'WEGE3': [
        [pd.Timestamp(year=2020, month=9, day=11), 62.42, 11],
        [pd.Timestamp(year=2020, month=9, day=10), 62.42, 10],
        [pd.Timestamp(year=2020, month=9, day=9), 64.93, 9],
        [pd.Timestamp(year=2020, month=9, day=8), 63.00, 8],
        [pd.Timestamp(year=2020, month=9, day=4), 64.49, 4],
        [pd.Timestamp(year=1994, month=7, day=8), np.nan, 8],
        [pd.Timestamp(year=1994, month=7, day=7), np.nan, 7],
        [pd.Timestamp(year=1994, month=7, day=6), np.nan, 6],
        [pd.Timestamp(year=1994, month=7, day=5), np.nan, 5],
        [pd.Timestamp(year=1994, month=7, day=4), np.nan, 4],
    ],
    'YDUQ3': [
        [pd.Timestamp(year=2020, month=9, day=11), 27.31, 11],
        [pd.Timestamp(year=2020, month=9, day=10), 27.31, 10],
        [pd.Timestamp(year=2020, month=9, day=9), 27.99, 9],
        [pd.Timestamp(year=2020, month=9, day=8), 28.75, 8],
        [pd.Timestamp(year=2020, month=9, day=4), 27.78, 4],
        [pd.Timestamp(year=1994, month=7, day=8), np.nan, 8],
        [pd.Timestamp(year=1994, month=7, day=7), np.nan, 7],
        [pd.Timestamp(year=1994, month=7, day=6), np.nan, 6],
        [pd.Timestamp(year=1994, month=7, day=5), np.nan, 5],
        [pd.Timestamp(year=1994, month=7, day=4), np.nan, 4],
    ],
}

# Flatten into one row per (ticker, date), with the ticker as the first field
data_list = []
for stk in dicio.keys():
    for itm in dicio[stk]:
        dline = [stk]
        dline.extend(itm)
        data_list.append(dline)
df = pd.DataFrame(data=data_list, columns=['Stock', 'Date', 'Return', 'Day'])

# Select 'Return' explicitly so the Date column is not aggregated as well
grouped_by_day = df.groupby(by=['Day', 'Stock'])[['Return']].mean()


Printing grouped_by_day gives:

             
Day Stock   Return
4   WDOFUT  5303.00
    WEGE3   64.49
    YDUQ3   27.78
5   WDOFUT  NaN
    WEGE3   NaN
    YDUQ3   NaN
6   WDOFUT  NaN
    WEGE3   NaN
    YDUQ3   NaN
7   WDOFUT  NaN
    WEGE3   NaN
    YDUQ3   NaN
8   WDOFUT  5366.00
    WEGE3   63.00
    YDUQ3   28.75
9   WDOFUT  5312.50
    WEGE3   64.93
    YDUQ3   27.99
10  WDOFUT  5325.00
    WEGE3   62.42
    YDUQ3   27.31
11  WDOFUT  5325.00
    WEGE3   62.42
    YDUQ3   27.31
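
To get from a grouping like this to the totals the question asks for, one option is to sum instead of averaging and then look up the best and worst day per stock. This is only a minimal sketch on a small made-up frame with the same column names; the tickers AAA and BBB and all the numbers are illustrative, not real returns:

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (stock, date) with a daily return in %.
df = pd.DataFrame({
    'Stock':  ['AAA', 'AAA', 'AAA', 'AAA', 'BBB', 'BBB', 'BBB', 'BBB'],
    'Date':   pd.to_datetime(['2020-01-02', '2020-02-03',
                              '2020-03-02', '2020-04-03'] * 2),
    'Return': [1.0, -2.0, 0.5, np.nan, -1.0, 3.0, -0.5, 1.0],
})
df['Day'] = df['Date'].dt.day

# Sum the returns of every calendar day, one column per stock.
# min_count=1 keeps a day as NaN when it has no valid observation at all.
totals = df.groupby(['Day', 'Stock'])['Return'].sum(min_count=1).unstack('Stock')

# Day of the month with the highest / lowest accumulated return, per stock.
summary = pd.DataFrame({
    'best_day': totals.idxmax(),
    'best_sum': totals.max(),
    'worst_day': totals.idxmin(),
    'worst_sum': totals.min(),
})
print(summary)

# Portfolio-wide: the stock whose best day has the largest accumulated return.
top_stock = summary['best_sum'].idxmax()
print(top_stock, summary.loc[top_stock, 'best_day'])
```

Adding up percentage returns is the simplest reading of "accumulated"; if compounding is what you are after, you would take the product of (1 + r) per day instead of the sum.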

I think you should be able to get the result you want from this grouping.
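
As for why only YDUQ3 shows up: after the for loop in the question has finished, k still holds the last key, so dicio[k] refers to that one frame only. If dicio really maps each ticker to a DataFrame indexed by date, as the printout in the question suggests, the frames can probably be combined without a nested loop, because pd.concat accepts a dict and turns its keys into an index level. A rough sketch, assuming each frame has a price column named after its ticker (the tickers AAA and BBB and the prices below are made up):

```python
import pandas as pd

# Hypothetical stand-in for the asker's dicio: ticker -> DataFrame indexed by
# date, newest row first, with one price column named after the ticker.
dates = pd.to_datetime(['2020-09-11', '2020-09-10', '2020-09-09'])
dicio = {
    'AAA': pd.DataFrame({'AAA': [110.0, 100.0, 80.0]}, index=dates),
    'BBB': pd.DataFrame({'BBB': [45.0, 50.0, 40.0]}, index=dates),
}

frames = {}
for ticker, frame in dicio.items():
    out = pd.DataFrame(index=frame.index)
    # pct_change(-1) compares each row with the next (older) row,
    # like the commented-out line in the question.
    out['Return'] = frame[ticker].pct_change(-1) * 100
    out['Day'] = frame.index.day
    frames[ticker] = out

# The dict keys become the outer index level, so every ticker is kept.
df = pd.concat(frames, names=['Stock', 'Date']).reset_index()

# Accumulated return per stock and day of the month (NaN if no valid data).
by_day = df.groupby(['Stock', 'Day'])['Return'].sum(min_count=1)
print(by_day)
```

The oldest row of each ticker has no earlier price to compare against, so its return is NaN and that day stays NaN in by_day.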
