Getting ValueError: setting an array element with a sequence

This is my first attempt at FinRL: I am preparing my own notebook based on this example notebook. When I run this line:

trained_a2c = agent.train_model(model=model_a2c, 
                                tb_log_name='a2c',
                                total_timesteps=50000)
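
For context, agent and model_a2c are created earlier in my notebook, roughly as in the example notebook (the import paths match the traceback below; train and env_kwargs stand in for the example's preprocessed dataframe and environment settings):

from finrl.env.env_stocktrading import StockTradingEnv
from finrl.model.models import DRLAgent

# train is the preprocessed training dataframe from the example notebook
e_train_gym = StockTradingEnv(df=train, **env_kwargs)
env_train, _ = e_train_gym.get_sb_env()  # wraps the env in a DummyVecEnv

agent = DRLAgent(env=env_train)
model_a2c = agent.get_model("a2c")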

Only on the first attempt does it log to tensorboard correctly; it then fails with a ValueError:

Logging to tensorboard_log/a2c\a2c_1
------------------------------------
| time/                 |          |
|    fps                | 633      |
|    iterations         | 100      |
|    time_elapsed       | 0        |
|    total_timesteps    | 500      |
| train/                |          |
|    entropy_loss       | -1.41    |
|    explained_variance | -0.087   |
|    learning_rate      | 0.0002   |
|    n_updates          | 99       |
|    policy_loss        | 0.003    |
|    std                | 0.994    |
|    value_loss         | 0.0812   |
------------------------------------
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars

The above exception was the direct cause of the following exception:

ValueError                                Traceback (most recent call last)
<ipython-input-10-9b19b057dbd1> in <module>
----> 1 trained_a2c = agent.train_model(model=model_a2c, 
      2                                 tb_log_name='a2c',
      3                                 total_timesteps=50000)

c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\finrl\model\models.py in train_model(self, model, tb_log_name, total_timesteps)
    122 
    123     def train_model(self, model, tb_log_name, total_timesteps=5000):
--> 124         model = model.learn(total_timesteps=total_timesteps, tb_log_name=tb_log_name)
    125         return model
    126 

c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\stable_baselines3\a2c\a2c.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    188     ) -> "A2C":
    189 
--> 190         return super(A2C, self).learn(
    191             total_timesteps=total_timesteps,
    192             callback=callback,

c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
    225         while self.num_timesteps < total_timesteps:
    226 
--> 227             continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
    228 
    229             if continue_training is False:

c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py in collect_rollouts(self, env, callback, rollout_buffer, n_rollout_steps)
    166                 clipped_actions = np.clip(actions, self.action_space.low, self.action_space.high)
    167 
--> 168             new_obs, rewards, dones, infos = env.step(clipped_actions)
    169 
    170             self.num_timesteps += env.num_envs

c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py in step(self, actions)
    161         """
    162         self.step_async(actions)
--> 163         return self.step_wait()
    164 
    165     def get_images(self) -> Sequence[np.ndarray]:

c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py in step_wait(self)
     41     def step_wait(self) -> VecEnvStepReturn:
     42         for env_idx in range(self.num_envs):
---> 43             obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = self.envs[env_idx].step(
     44                 self.actions[env_idx]
     45             )

ValueError: setting an array element with a sequence.

I tried to debug this. When env_stocktrading.step() is called inside dummy_vec_env.step_wait(), it mostly returns the reward as a numpy.float64. But as soon as it returns the reward as a numpy.ndarray, the error above occurs. In my case it returned [0.261415 0.261415], and the exception was then thrown while assigning to self.buf_rews[env_idx]. I added a print statement before this line:

print(str(type(self.reward)) + " - " + str(self.reward) + " - " + str(type(self.reward_scaling)) + " - " + str(self.reward_scaling))

The last four print outputs before the error are:

<class 'numpy.float64'> - -973.2887000000046 - <class 'float'> - 0.0001
<class 'numpy.float64'> - -63.092250000016065 - <class 'float'> - 0.0001
<class 'numpy.float64'> - -1112.7334499999997 - <class 'float'> - 0.0001
<class 'numpy.ndarray'> - [-55.75 -55.75] - <class 'float'> - 0.0001
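
That last array reward is exactly what the traceback complains about: in dummy_vec_env.step_wait(), self.buf_rews has one scalar slot per environment, and numpy cannot store a length-2 array in a single element. A minimal reproduction, independent of FinRL:

import numpy as np

buf_rews = np.zeros((1,), dtype=np.float32)    # one reward slot per env, as in DummyVecEnv

buf_rews[0] = np.float64(-1112.7334499999997)  # a scalar reward assigns fine
buf_rews[0] = np.array([-55.75, -55.75])       # raises the ValueError above, chained
                                               # from the size-1 TypeError in the traceback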

It seems the reward somehow becomes an ndarray. I cannot figure out why this happens (I am just getting started with this 😶). Can someone give me some quick pointers?
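
In the meantime, a guard like the following at the spot of my print statement (my own diagnostic code, not part of FinRL) at least makes the failure surface inside env_stocktrading.step() with some context:

import numpy as np

# Diagnostic only: fail inside the env as soon as the reward stops
# being a scalar; self.day is the row index StockTradingEnv uses to
# walk through the dataframe.
if np.asarray(self.reward).size != 1:
    raise RuntimeError(
        f"non-scalar reward {self.reward!r} on day {self.day}"
    )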

Is this a version incompatibility? But then how does it work for the first tensorboard log and only fail afterwards? (This is my current setup.)
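
For comparing setups, the relevant package versions can be printed with something like:

from importlib.metadata import version  # Python 3.8+

for pkg in ("finrl", "stable-baselines3", "gym", "numpy", "pandas"):
    print(pkg, version(pkg))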

PS: Here is my notebook.

