Hello there.
I have been tuning hyperparameters for a while, but the results show no sign of improving.
https://preview.redd.it/ffcwkk2mx9wc1.png?width=930&format=png&auto=webp&s=af5d9e10bf70a9fe95ce5ae3b26730be96990e1a
~DQN~
import torch
from stable_baselines3 import DQN
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecEnv, VecFrameStack

# 4 parallel Pong RAM environments, with the last 4 observations stacked
vec_env: VecEnv = make_vec_env("ALE/Pong-ram-v5", n_envs=4)
vec_env = VecFrameStack(vec_env, n_stack=4)

policy_kwargs = dict(activation_fn=torch.nn.ReLU, net_arch=[64, 32])
model = DQN(
    "MlpPolicy",
    vec_env,
    learning_rate=0.0001,
    target_update_interval=1000,
    train_freq=4,
    buffer_size=10000,
    learning_starts=5000,
    batch_size=32,
    exploration_fraction=0.1,
    exploration_final_eps=0.01,
    gamma=0.99,
    policy_kwargs=policy_kwargs,
)
model.learn(total_timesteps=2000000)
~A2C~
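import torch
from stable_baselines3 import A2C
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecEnv, VecFrameStack

# 4 parallel Pong RAM environments, with the last 4 observations stacked
vec_env: VecEnv = make_vec_env("ALE/Pong-ram-v5", n_envs=4)
vec_env = VecFrameStack(vec_env, n_stack=4)

# Separate policy (pi) and value (vf) networks
policy_kwargs = dict(
    activation_fn=torch.nn.ReLU,
    net_arch=[dict(pi=[32, 16], vf=[32, 16])],
    ortho_init=True,
)
model = A2C(
    "MlpPolicy",
    vec_env,
    learning_rate=1.4e-5,
    n_steps=512,
    gamma=0.983,
    gae_lambda=0.95,
    max_grad_norm=0.36,
    ent_coef=0.01,
    policy_kwargs=policy_kwargs,
)
model.learn(total_timesteps=2000000)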
~PPO~
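import torch
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.vec_env import VecEnv, VecFrameStack

# 4 parallel Pong RAM environments, with the last 4 observations stacked
vec_env: VecEnv = make_vec_env("ALE/Pong-ram-v5", n_envs=4)
vec_env = VecFrameStack(vec_env, n_stack=4)

# Separate policy (pi) and value (vf) networks
policy_kwargs = dict(
    activation_fn=torch.nn.ReLU,
    net_arch=dict(pi=[32, 32], vf=[32, 32]),
)
model = PPO(
    "MlpPolicy",
    vec_env,
    learning_rate=2.5e-4,
    n_steps=128,
    batch_size=256,
    n_epochs=4,
    gamma=0.99,
    gae_lambda=0.95,
    clip_range=0.3,
    ent_coef=0.1,
    vf_coef=0.5,
    policy_kwargs=policy_kwargs,
    device="cpu",
)
model.learn(total_timesteps=2000000)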
With these exact settings I also tried different architectures, e.g. {[64, 32, 16], [64, 32, 16]}, {[32, 16], [32, 16]} and more. I have also tried feeding the game as an image with CnnPolicy.
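For reference, here is a rough back-of-the-envelope sketch of what the numbers above work out to per run. It uses only the hyperparameters listed in the three configs and plain Python, no Stable-Baselines3 calls. The helper `rollout_stats` and the constant names are made up for illustration, and the counts assume that `total_timesteps` is counted across all 4 environments and that each rollout collects `n_steps` transitions per environment, so treat them as approximate.

```python
import math

# Values copied from the configs above
TOTAL_TIMESTEPS = 2_000_000
N_ENVS = 4
RAM_BYTES = 128   # size of one Atari RAM observation
N_STACK = 4


def rollout_stats(n_steps, batch_size=None, n_epochs=1):
    """Hypothetical helper: (transitions per rollout, rollouts, gradient steps)."""
    rollout = n_steps * N_ENVS
    n_rollouts = math.ceil(TOTAL_TIMESTEPS / rollout)
    # Without a batch size (A2C) the whole rollout is used in one gradient step;
    # with one (PPO) the rollout is split into minibatches for each epoch.
    minibatches = 1 if batch_size is None else rollout // batch_size
    return rollout, n_rollouts, n_rollouts * minibatches * n_epochs


# Input size seen by the MLP after stacking 4 RAM observations
obs_dim = RAM_BYTES * N_STACK

# DQN: number of timesteps over which epsilon is annealed down to 0.01
eps_anneal_steps = round(0.1 * TOTAL_TIMESTEPS)

a2c = rollout_stats(n_steps=512)
ppo = rollout_stats(n_steps=128, batch_size=256, n_epochs=4)

print("stacked observation size:", obs_dim)
print("DQN epsilon anneal steps:", eps_anneal_steps)
print("A2C (rollout, rollouts, gradient steps):", a2c)
print("PPO (rollout, rollouts, gradient steps):", ppo)
```

Under those assumptions, A2C gets fewer than a thousand gradient steps over the whole run at a learning rate of 1.4e-5, while PPO gets roughly thirty times as many, which may be worth keeping in mind when comparing the curves.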
submitted by /u/ufoludek3000