This is my first attempt at FinRL; I was preparing my own notebook based on this example notebook. When I run this line:
trained_a2c = agent.train_model(model=model_a2c,
                                tb_log_name='a2c',
                                total_timesteps=50000)
it only logs to TensorBoard correctly on the first attempt, and then fails with a ValueError:
Logging to tensorboard_log/a2c\a2c_1
------------------------------------
| time/ | |
| fps | 633 |
| iterations | 100 |
| time_elapsed | 0 |
| total_timesteps | 500 |
| train/ | |
| entropy_loss | -1.41 |
| explained_variance | -0.087 |
| learning_rate | 0.0002 |
| n_updates | 99 |
| policy_loss | 0.003 |
| std | 0.994 |
| value_loss | 0.0812 |
------------------------------------
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
TypeError: only size-1 arrays can be converted to Python scalars
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-10-9b19b057dbd1> in <module>
----> 1 trained_a2c = agent.train_model(model=model_a2c,
2 tb_log_name='a2c',
3 total_timesteps=50000)
c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\finrl\model\models.py in train_model(self, model, tb_log_name, total_timesteps)
122
123 def train_model(self, model, tb_log_name, total_timesteps=5000):
--> 124 model = model.learn(total_timesteps=total_timesteps, tb_log_name=tb_log_name)
125 return model
126
c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\stable_baselines3\a2c\a2c.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
188 ) -> "A2C":
189
--> 190 return super(A2C, self).learn(
191 total_timesteps=total_timesteps,
192 callback=callback,
c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py in learn(self, total_timesteps, callback, log_interval, eval_env, eval_freq, n_eval_episodes, tb_log_name, eval_log_path, reset_num_timesteps)
225 while self.num_timesteps < total_timesteps:
226
--> 227 continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
228
229 if continue_training is False:
c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\stable_baselines3\common\on_policy_algorithm.py in collect_rollouts(self, env, callback, rollout_buffer, n_rollout_steps)
166 clipped_actions = np.clip(actions, self.action_space.low, self.action_space.high)
167
--> 168 new_obs, rewards, dones, infos = env.step(clipped_actions)
169
170 self.num_timesteps += env.num_envs
c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\stable_baselines3\common\vec_env\base_vec_env.py in step(self, actions)
161 """
162 self.step_async(actions)
--> 163 return self.step_wait()
164
165 def get_images(self) -> Sequence[np.ndarray]:
c:\users\crrma\.virtualenvs\stocks-pv8ke_ig\lib\site-packages\stable_baselines3\common\vec_env\dummy_vec_env.py in step_wait(self)
41 def step_wait(self) -> VecEnvStepReturn:
42 for env_idx in range(self.num_envs):
---> 43 obs, self.buf_rews[env_idx], self.buf_dones[env_idx], self.buf_infos[env_idx] = self.envs[env_idx].step(
44 self.actions[env_idx]
45 )
ValueError: setting an array element with a sequence.
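The ValueError itself is just NumPy refusing to put a sequence into a single slot of the pre-allocated reward array (DummyVecEnv keeps one float entry per env in buf_rews). A minimal illustration of that, outside of FinRL (the exact error wording can vary with the NumPy version):

import numpy as np

buf_rews = np.zeros(1, dtype=np.float32)   # one reward slot per env, as DummyVecEnv allocates it
buf_rews[0] = -55.75                       # a scalar reward is fine
buf_rews[0] = np.array([-55.75, -55.75])   # ValueError: setting an array element with a sequence.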
I tried to debug this. When env_stocktrading.step() is called inside dummy_vec_env.step_wait(), it mostly returns a numpy.float64 reward. But as soon as it returns a numpy.ndarray instead, the error above occurs. In my case it returned [0.261415 0.261415], and the exception was then raised while assigning to self.buf_rews[env_idx]. I added a print statement just before this line:
print(str(type(self.reward)) + " - " + str(self.reward) + " - " + str(type(self.reward_scaling)) + " - " + str(self.reward_scaling))
The last four print outputs before the error are:
<class 'numpy.float64'> - -973.2887000000046 - <class 'float'> - 0.0001
<class 'numpy.float64'> - -63.092250000016065 - <class 'float'> - 0.0001
<class 'numpy.float64'> - -1112.7334499999997 - <class 'float'> - 0.0001
<class 'numpy.ndarray'> - [-55.75 -55.75] - <class 'float'> - 0.0001
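To pin down exactly which step produces the array reward, one thing I can try is a small wrapper around the trading env that raises as soon as step() returns a non-scalar reward, before DummyVecEnv tries to store it. This is only a debugging sketch on my side (it assumes the old four-value gym step API that stable-baselines3 uses here):

import numpy as np
import gym

class ScalarRewardCheck(gym.Wrapper):
    # Raise immediately when step() returns a non-scalar reward,
    # so the offending day/action can be inspected.
    def step(self, action):
        obs, reward, done, info = self.env.step(action)
        if np.ndim(reward) != 0:
            raise RuntimeError(f"non-scalar reward {reward!r} for action {action!r}")
        return obs, reward, done, info

Wrapping the environment with this before it goes into DummyVecEnv should stop training at the exact step where the reward becomes an array, so the data for that day can be inspected.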
In short, the reward somehow turns into an ndarray at some point, and I can't figure out why (I'm just getting started with this 😶). Can anyone give me some quick pointers? Is there a version incompatibility? But then, why does it work fine up to the first TensorBoard log and only fail afterwards? (This is my current setup.)
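In case the exact versions matter, they can be dumped from the notebook with something along these lines (importlib.metadata needs Python 3.8+; on older Pythons pkg_resources would do the same):

from importlib.metadata import version

for pkg in ("finrl", "stable-baselines3", "gym", "numpy", "pandas"):
    print(pkg, version(pkg))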
PS: Here is my notebook.