Pandas DataFrame无法正确排序值

2024-05-17 08:10:03 发布

您现在位置:Python中文网/ 问答频道 /正文

我有下面的代码,用来对名为“coint_results”的数据帧进行排序

results = np.array([combinations[pair][0], combinations[pair][1], a, b, egtest[0]])
coint_results = coint_results.append(pd.DataFrame(results).T)
coint_results = coint_results.reset_index().dropna().drop("index", axis=1)
coint_results.columns = ['stock_1', 'stock_2', 'a', 'b', 't-stat']
coint_results.sort_values(by=['t-stat'], ascending=True, inplace=True, ignore_index=True)
print(coint_results)

我认为它应该很有效,但结果是这样的

  stock_1 stock_2  ...                      b               t-stat
0    EXO.MI  TIT.MI  ...    0.03767853882823848   -10.03626667966398
1   BAMI.MI  TIT.MI  ...   0.002448683245146175  -10.054806761275424
2    STM.MI  TIT.MI  ...  -0.002184095752726706  -10.068964759967473
3    SRG.MI  TIT.MI  ...     0.4478998680510796  -10.115909021808724
4    SPM.MI  TIT.MI  ...    0.06460752028453558  -10.354583264537016
5   ENEL.MI  TIT.MI  ...    0.11502811550372381  -11.004160730546342
6    SRG.MI  UNI.MI  ...     11.243141736537956   -3.906435255815229
7     IP.MI  STM.MI  ...      -4.73690752582681  -3.9585359373191573
8   STLA.MI  STM.MI  ...      5.160916664191976   -4.121749682409465
9   BAMI.MI  STM.MI  ...     0.5317077958371442   -4.311274260473334
10   TIT.MI  UCG.MI  ...      6.132134120131706   -4.475906614947818
11   ISP.MI  STM.MI  ...     0.6185417904201413   -4.506308325127962
12   BZU.MI  UCG.MI  ...    -4.5712728593642025  -5.0446571567547585
13     G.MI  UCG.MI  ...     5.2911521693982175  -5.1304167685585265
14   SRG.MI  UCG.MI  ...    -17.988827453255993  -5.1784094787560635
15  BMED.MI  UCG.MI  ...      9.924937184752496   -5.406007568494866
16    IP.MI  UCG.MI  ...    -10.742575715380323    -5.54071721843284
17   STM.MI  UCG.MI  ...     2.3788486425865387   -5.620706678454393
18   ISP.MI  UCG.MI  ...      8.092421011705971   -6.016276738559553
19  STLA.MI  UCG.MI  ...       21.6916172567256   -6.176169724335034
20  BAMI.MI  UCG.MI  ...     1.5867053473179111   -6.749604800157616
21     G.MI  TIT.MI  ...     0.2427544870802295   -9.048408747110315
22    IP.MI  TIT.MI  ...    0.03859781538585183    -9.87594779908127
23   ISP.MI  TIT.MI  ...   0.017164281671812998   -9.887616220510456
24  BMED.MI  TIT.MI  ...   -0.04082879951061795     -9.8910162081357
25  STLA.MI  TIT.MI  ...    0.09583782780784517   -9.910261629446035
26    MB.MI  TIT.MI  ...     0.1551557504463611    -9.91316985348034
27   BZU.MI  TIT.MI  ...    0.03270215832874518   -9.944786934999167

显然,Panda按降序对值进行排序,但前6个值应位于数据帧的底部。这对我来说毫无意义

有人能帮我弄清楚吗


Tags: iptrueindexstockresultsstatmistm
1条回答
网友
1楼 · 发布于 2024-05-17 08:10:03

这就是词典排序。出于某种原因,t-stat属于object类型

排序前转换^{}

coint_results['t-stat'] = pd.to_numeric(coint_results['t-stat'])
coint_results.sort_values(by=['t-stat'], ascending=True,
                          inplace=True, ignore_index=True)

默认显示行为将隐藏一些值can^{}显示精度以查看更多信息:

pd.set_option('display.precision', 15)
print(coint_results)

coint_results

                t-stat
0  -11.004160730546342
1  -10.354583264537016
2  -10.115909021808724
3  -10.068964759967473
4  -10.054806761275424
5  -10.036266679663980
6   -9.944786934999167
7   -9.913169853480341
8   -9.910261629446035
9   -9.891016208135699
10  -9.887616220510456
11  -9.875947799081271
12  -9.048408747110315
13  -6.749604800157616
14  -6.176169724335034
15  -6.016276738559553
16  -5.620706678454393
17  -5.540717218432840
18  -5.406007568494866
19  -5.178409478756063
20  -5.130416768558526
21  -5.044657156754758
22  -4.506308325127962
23  -4.475906614947818
24  -4.311274260473334
25  -4.121749682409465
26  -3.958535937319157
27  -3.906435255815229

^{}可用于根据排序列选择索引:

coint_results = (
    coint_results.loc[
        pd.to_numeric(coint_results['t-stat'])
            .sort_values(ascending=True).index
    ].reset_index(drop=True)
)

或者,可以创建一个临时列进行排序,然后从数据帧中^{}

coint_results['sortby'] = pd.to_numeric(coint_results['t-stat'])
coint_results.sort_values(by='sortby', ascending=True, inplace=True,
                          ignore_index=True)
coint_results.drop('sortby', axis=1, inplace=True)
print(coint_results)

这两个选项都会创建一个单独的序列进行排序,因此t-stat列中的值不会发生变化

coint_results

                 t-stat
0   -11.004160730546342
1   -10.354583264537016
2   -10.115909021808724
3   -10.068964759967473
4   -10.054806761275424
5    -10.03626667966398
6    -9.944786934999167
7     -9.91316985348034
8    -9.910261629446035
9      -9.8910162081357
10   -9.887616220510456
11    -9.87594779908127
12   -9.048408747110315
13   -6.749604800157616
14   -6.176169724335034
15   -6.016276738559553
16   -5.620706678454393
17    -5.54071721843284
18   -5.406007568494866
19  -5.1784094787560635
20  -5.1304167685585265
21  -5.0446571567547585
22   -4.506308325127962
23   -4.475906614947818
24   -4.311274260473334
25   -4.121749682409465
26  -3.9585359373191573
27   -3.906435255815229

数据帧构造函数:

coint_results = pd.DataFrame({
    't-stat': ['-10.03626667966398', '-10.054806761275424',
               '-10.068964759967473', '-10.115909021808724',
               '-10.354583264537016', '-11.004160730546342',
               '-3.906435255815229', '-3.9585359373191573',
               '-4.121749682409465', '-4.311274260473334', '-4.475906614947818',
               '-4.506308325127962', '-5.0446571567547585',
               '-5.1304167685585265', '-5.1784094787560635',
               '-5.406007568494866', '-5.54071721843284', '-5.620706678454393',
               '-6.016276738559553', '-6.176169724335034', '-6.749604800157616',
               '-9.048408747110315', '-9.87594779908127', '-9.887616220510456',
               '-9.8910162081357', '-9.910261629446035', '-9.91316985348034',
               '-9.944786934999167']
})

相关问题 更多 >