Status: Open
Labels: enhancement (New feature or request), help wanted (Help from contributors is welcomed)
Description
🚀 Feature
Independently configurable learning rates for the actor and critic in actor-critic-style algorithms.
Motivation
In the literature, the actor is often configured to learn more slowly than the critic, so that the critic's value estimates are more reliable by the time the actor uses them. At the very least, it would be nice if I could let my hyperparameter optimizer decide which learning rates to use for the actor and the critic.
Pitch
stable-baselines3/stable_baselines3/ddpg/ddpg.py (lines 12 to 26 at 65100a4):

```python
class DDPG(TD3):
    """
    Deep Deterministic Policy Gradient (DDPG).

    Deterministic Policy Gradient: http://proceedings.mlr.press/v32/silver14.pdf
    DDPG Paper: https://arxiv.org/abs/1509.02971
    Introduction to DDPG: https://spinningup.openai.com/en/latest/algorithms/ddpg.html

    Note: we treat DDPG as a special case of its successor TD3.

    :param policy: The policy model to use (MlpPolicy, CnnPolicy, ...)
    :param env: The environment to learn from (if registered in Gym, can be str)
    :param learning_rate: learning rate for adam optimizer,
        the same learning rate will be used for all networks (Q-Values, Actor and Value function)
        it can be a function of the current progress remaining (from 1 to 0)
```
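For illustration, the requested interface could look something like the sketch below. The dict-valued `learning_rate` is a mock-up for this proposal, not current SB3 API:

```python
from stable_baselines3 import DDPG

# Proposed (hypothetical) interface: one learning rate per network,
# instead of the single shared learning_rate that exists today.
model = DDPG(
    "MlpPolicy",
    "Pendulum-v1",  # env id depends on your gym version
    learning_rate={"actor": 1e-4, "critic": 1e-3},
)
```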
Additional context
https://spinningup.openai.com/en/latest/algorithms/ddpg.html#documentation-pytorch-version (the Spinning Up DDPG implementation already exposes separate `pi_lr` and `q_lr` hyperparameters for the policy and Q-function optimizers)
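In the meantime, here is a rough workaround sketch: subclass TD3 and override the `_update_learning_rate` hook so the actor keeps its own learning rate after the shared schedule has been applied. The subclass name, the `actor_lr`/`critic_lr` parameters, and the reliance on SB3 internals (`self.actor.optimizer`, the private `_update_learning_rate` method) are my assumptions, not supported library API:

```python
from typing import List, Union

import torch as th

from stable_baselines3 import TD3


class TwoLRTD3(TD3):
    """Sketch: TD3 with separate constant learning rates for actor and critic."""

    def __init__(self, *args, actor_lr: float = 1e-4, critic_lr: float = 1e-3, **kwargs):
        # Use the critic lr as the shared learning_rate; the actor lr is pinned below.
        super().__init__(*args, learning_rate=critic_lr, **kwargs)
        self.actor_lr = actor_lr

    def _update_learning_rate(
        self, optimizers: Union[List[th.optim.Optimizer], th.optim.Optimizer]
    ) -> None:
        # Let the base class apply the shared schedule to all optimizers first...
        super()._update_learning_rate(optimizers)
        # ...then overwrite the actor optimizer's lr, since train() would
        # otherwise reset it from the shared schedule on every gradient step.
        for param_group in self.actor.optimizer.param_groups:
            param_group["lr"] = self.actor_lr


# Usage sketch:
model = TwoLRTD3("MlpPolicy", "Pendulum-v1", actor_lr=1e-4, critic_lr=1e-3)
```

This keeps the built-in schedule for the critic while the actor rate stays constant; a proper implementation would presumably accept a full schedule per network instead.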