This repository contains PyTorch implementations of various Deep Reinforcement Learning algorithms and a comparison of their results.
The following algorithms have been implemented so far:
- REINFORCE/Vanilla Policy Gradient (VPG): OpenAI's Spinning Up
- Deep Q Learning (DQN): Mnih et al. 2013
- Trust Region Policy Optimization (TRPO): Spinning Up / (Schulman et al. 2015b) *
- Proximal Policy Optimization (PPO): Spinning Up / (Schulman et al. 2017)
- Soft Actor-Critic (SAC) for discrete environments: Spinning Up (continuous version) / (Christodoulou 2019)
The policy gradient algorithms (VPG, TRPO, and PPO) use Generalized Advantage Estimation (Schulman et al. 2015a); a minimal sketch of the advantage computation is shown below.
* The TRPO implementation occasionally fails during training due to numerical issues; the reported results are from successful runs only.
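For reference, Generalized Advantage Estimation boils down to a backward pass over the TD residuals of a trajectory. The following is a minimal, self-contained sketch; the function name, the default values of gamma and lambda, and the single-trajectory layout are illustrative assumptions and may differ from the code in this repository.

```python
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation (Schulman et al. 2015a).

    rewards:    rewards r_0 ... r_{T-1} of one trajectory
    values:     value estimates V(s_0) ... V(s_{T-1})
    last_value: bootstrap value V(s_T); use 0 if the episode terminated
    """
    values = np.append(values, last_value)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    # Walk backwards through the trajectory, accumulating the discounted sum
    # of TD residuals: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```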
This is the result of the DQN agent on the gymnasium Atari Space Invaders environment. The training setup is similar to the original paper by Mnih et al. (2013b): the agent observes the last four frames, scaled to 84x84 grayscale. The exact hyperparameters can be found in train_DQN_for_Space_Invaders.py and are also similar to those of Mnih et al. (2013b). However, I only trained for 5 million steps as opposed to 50 million in the original paper.
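The observation preprocessing (last four frames, 84x84, grayscale) can be set up with the standard gymnasium wrappers roughly as follows. This is a sketch, not the code from the training script: the environment id and wrapper names are assumptions and depend on the installed gymnasium / ale-py versions.

```python
import gymnasium as gym
from gymnasium.wrappers import AtariPreprocessing, FrameStackObservation

# Depending on your gymnasium / ale-py versions, the ALE environments may need
# to be registered first (import ale_py; gym.register_envs(ale_py)), and the
# frame-stacking wrapper may be called FrameStack instead.
env = gym.make("ALE/SpaceInvaders-v5", frameskip=1)  # frameskip handled by the wrapper below
env = AtariPreprocessing(env, frame_skip=4, screen_size=84, grayscale_obs=True)
env = FrameStackObservation(env, stack_size=4)

obs, info = env.reset()
print(obs.shape)  # (4, 84, 84): the last four grayscale frames
```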
The algorithms were trained on OpenAI Gym's implementation of the Cart Pole environment. Each agent was trained for 400 training steps, with episodes automatically terminating after 200 timesteps. For the exact hyperparameters see the training scripts (train_X_for_cartpole.py). The y-value of each learning curve is the mean score over 5 independent runs, and the shaded area around the curve is the standard deviation across those runs. The curves were smoothed using a moving average with a window size of 4.
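The curves can be reproduced from the per-run score logs roughly as follows. The array shapes and placeholder data here are hypothetical; the actual plotting code may differ.

```python
import numpy as np
import matplotlib.pyplot as plt

def moving_average(x, window=4):
    # Simple moving average used to smooth the learning curves.
    return np.convolve(x, np.ones(window) / window, mode="valid")

# scores: one row per run, one column per training step, e.g. shape (5, 400).
# Placeholder data for illustration only.
scores = np.random.rand(5, 400) * 200

mean = moving_average(scores.mean(axis=0))
std = moving_average(scores.std(axis=0))
steps = np.arange(len(mean))

plt.plot(steps, mean, label="mean over 5 runs")
plt.fill_between(steps, mean - std, mean + std, alpha=0.3, label="standard deviation")
plt.xlabel("training step")
plt.ylabel("score")
plt.legend()
plt.show()
```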
Note that looking at the learning curves alone is not sufficient to compare two algorithms. Firstly, the same number of training steps does not necessarily require the same amount of computing power and training time. For example, DQN can perform a training step after every timestep once an initial exploration period has passed, whereas VPG must complete multiple full episodes for every training step. Furthermore, no hyperparameter tuning was done before running the algorithms; doing so might significantly improve performance. Hence, the learning curves only serve to demonstrate the correct implementation of the algorithms and their learning behaviour.
The implementations of the algorithms in this repository are my own, but it was immensely useful to look at the Spinning Up repository and Deep Reinforcement Learning Algorithms in PyTorch when I was stuck or looking for things to improve.
This Medium article by Rohan Tangri helped me understand Generalized Advantage Estimation.