Status: Open
Labels: enhancement (New feature or request), help wanted (Help from contributors is welcomed)
Description
🚀 Feature
Independently configurable learning rates for the actor and critic in actor-critic-style algorithms.
Motivation
In the literature, the actor is often configured to learn more slowly than the critic, so that the critic's value estimates are more reliable by the time the actor uses them. At the very least, it would be nice if I could let my hyperparameter optimizer decide which learning rates to use for the actor and the critic.
Pitch
stable-baselines3/stable_baselines3/ddpg/ddpg.py (lines 12 to 26 at 65100a4):

```python
class DDPG(TD3):
    """
    Deep Deterministic Policy Gradient (DDPG).

    Deterministic Policy Gradient: http://proceedings.mlr.press/v32/silver14.pdf
    DDPG Paper: https://arxiv.org/abs/1509.02971
    Introduction to DDPG: https://spinningup.openai.com/en/latest/algorithms/ddpg.html

    Note: we treat DDPG as a special case of its successor TD3.

    :param policy: The policy model to use (MlpPolicy, CnnPolicy, ...)
    :param env: The environment to learn from (if registered in Gym, can be str)
    :param learning_rate: learning rate for adam optimizer,
        the same learning rate will be used for all networks (Q-Values, Actor and Value function)
        it can be a function of the current progress remaining (from 1 to 0)
```
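For illustration, the requested interface could look something like the sketch below. The dict-valued `learning_rate` is a mock-up for this proposal, not current SB3 API:

```python
from stable_baselines3 import DDPG

# Proposed (hypothetical) interface: one learning rate per network,
# instead of the single shared learning_rate that exists today.
model = DDPG(
    "MlpPolicy",
    "Pendulum-v1",  # env id depends on your gym version
    learning_rate={"actor": 1e-4, "critic": 1e-3},
)
```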
Additional context
https://spinningup.openai.com/en/latest/algorithms/ddpg.html#documentation-pytorch-version (the Spinning Up DDPG implementation already exposes separate `pi_lr` and `q_lr` hyperparameters for the policy and Q-function optimizers)
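In the meantime, here is a rough workaround sketch: subclass TD3 and override the `_update_learning_rate` hook so the actor keeps its own learning rate after the shared schedule has been applied. The subclass name, the `actor_lr`/`critic_lr` parameters, and the reliance on SB3 internals (`self.actor.optimizer`, the private `_update_learning_rate` method) are my assumptions, not supported library API:

```python
from typing import List, Union

import torch as th

from stable_baselines3 import TD3


class TwoLRTD3(TD3):
    """Sketch: TD3 with separate constant learning rates for actor and critic."""

    def __init__(self, *args, actor_lr: float = 1e-4, critic_lr: float = 1e-3, **kwargs):
        # Use the critic lr as the shared learning_rate; the actor lr is pinned below.
        super().__init__(*args, learning_rate=critic_lr, **kwargs)
        self.actor_lr = actor_lr

    def _update_learning_rate(
        self, optimizers: Union[List[th.optim.Optimizer], th.optim.Optimizer]
    ) -> None:
        # Let the base class apply the shared schedule to all optimizers first...
        super()._update_learning_rate(optimizers)
        # ...then overwrite the actor optimizer's lr, since train() would
        # otherwise reset it from the shared schedule on every gradient step.
        for param_group in self.actor.optimizer.param_groups:
            param_group["lr"] = self.actor_lr


# Usage sketch:
model = TwoLRTD3("MlpPolicy", "Pendulum-v1", actor_lr=1e-4, critic_lr=1e-3)
```

This keeps the built-in schedule for the critic while the actor rate stays constant; a proper implementation would presumably accept a full schedule per network instead.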