Deep RL Course documentation
Hands-on: advanced Deep Reinforcement Learning. Using Sample Factory to play Doom from pixels
Unit 8 Part 2: Advanced Deep Reinforcement Learning. Using Sample Factory to play Doom from pixels

In this notebook, we will learn how to train a Deep Neural Network to collect objects in a 3D environment based on the game of Doom. A video of the resulting policy is shown below. We train this policy using Sample Factory, an asynchronous implementation of the PPO algorithm.
Please note the following points:
Sample Factory is an advanced RL framework and only functions on Linux and Mac (not Windows).
The framework performs best on a GPU machine with many CPU cores, where it can achieve speeds of 100k interactions per second. The resources available on a standard Colab notebook limit the performance of this library, so the speed in this setting does not reflect real-world performance.
Benchmarks for Sample Factory are available in a number of settings. Check out the examples if you want to find out more.
from IPython.display import HTML
HTML(
    """<video width="640" height="480" controls>
<source src=""
type="video/mp4">Your browser does not support the video tag.</video>"""
)
To validate this hands-on for the certification process, you need to push one model:
get a result of >= 5.
To find your result, go to the leaderboard and find your model. The result = mean_reward - std of reward.
If you don’t find your model, go to the bottom of the page and click on the refresh button.
For more information about the certification process, check this section 👉
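The leaderboard formula above can be checked locally before pushing a model. The sketch below assumes you already have a list of per-episode rewards. `leaderboard_result` is a hypothetical helper, the reward values are made up, and the choice of the population standard deviation is an assumption, since the text does not say which estimator the leaderboard uses.

```python
from statistics import mean, pstdev


def leaderboard_result(episode_rewards):
    """Return mean reward minus the standard deviation of the reward.

    Assumption: population standard deviation (pstdev). The leaderboard
    may use a different estimator.
    """
    return mean(episode_rewards) - pstdev(episode_rewards)


# Made-up rewards for illustration only, not real training results.
rewards = [8.0, 10.0, 12.0, 10.0]
result = leaderboard_result(rewards)

# Certification threshold quoted in the text: result >= 5.
passes = result >= 5
print(f"result = {result:.2f}, passes = {passes}")
```

Because the standard deviation is subtracted, a policy with a high but erratic reward can score lower than a slightly weaker but more consistent one.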
Set the GPU 💪
- To accelerate the agent’s training, we’ll use a GPU. To do that, go to Runtime > Change Runtime type
- Hardware Accelerator > GPU

Before starting to train our agent, let’s study the library and environments we’re going to use.
Sample Factory
Sample Factory is one of the fastest RL libraries focused on very efficient synchronous and asynchronous implementations of policy gradients (PPO).
Sample Factory is thoroughly tested, used by many researchers and practitioners, and is actively maintained. Our implementation is known to reach SOTA performance in a variety of domains while minimizing RL experiment training time and hardware requirements.

Key features
- Highly optimized algorithm architecture for maximum learning throughput
- Synchronous and asynchronous training regimes
- Serial (single-process) mode for easy debugging
- Optimal performance in both CPU-based and GPU-accelerated environments
- Single- & multi-agent training, self-play, supports training multiple policies at once on one or many GPUs
- Population-Based Training (PBT)
- Discrete, continuous, hybrid action spaces
- Vector-based, image-based, dictionary observation spaces
- Automatically creates a model architecture by parsing action/observation space specification. Supports custom model architectures
- Designed to be imported into other projects, custom environments are first-class citizens
- Detailed WandB and Tensorboard summaries, custom metrics
- HuggingFace 🤗 integration (upload trained models and metrics to the Hub)
- Multiple example environment integrations with tuned parameters and trained models
All of the above policies are available on the 🤗 Hub. Search for the tag sample-factory.
How sample-factory works
Sample-factory is one of the most highly optimized RL implementations available to the community.
It works by spawning multiple processes that run rollout workers, inference workers and a learner worker.
The workers communicate through shared memory, which lowers the communication cost between processes.
The rollout workers interact with the environment and send observations to the inference workers.
The inference workers query a fixed version of the policy and send actions back to the rollout workers.
After k steps, the rollout workers send a trajectory of experience to the learner worker, which uses it to update the agent’s policy network.
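The data flow described above can be sketched in a few lines. This is a toy, single-process illustration, not Sample Factory's implementation: the real library runs these roles in separate processes that communicate through shared memory. All function names, the toy environment, and the integer "policy version" are hypothetical stand-ins.

```python
import random

K = 4  # rollout length: steps collected before a trajectory goes to the learner


def toy_env_step(state, action):
    """Hypothetical 1-D environment: reward 1.0 when the action matches state parity."""
    reward = 1.0 if action == state % 2 else 0.0
    return state + 1, reward


def inference_worker(policy_version, observation):
    """Query a fixed snapshot of the policy and return an action."""
    rng = random.Random(policy_version * 1000 + observation)
    return rng.randint(0, 1)


def learner_worker(policy_version, trajectory):
    """Consume one trajectory and return the updated policy version.

    A real learner would run a PPO gradient step here.
    """
    assert len(trajectory) == K
    return policy_version + 1


def rollout_worker(num_rollouts):
    """Step the environment, asking the inference worker for every action."""
    state, policy_version, trajectories = 0, 0, []
    for _ in range(num_rollouts):
        trajectory = []
        for _ in range(K):
            action = inference_worker(policy_version, state)  # observation -> action
            next_state, reward = toy_env_step(state, action)  # act in the environment
            trajectory.append((state, action, reward))
            state = next_state
        # After K steps, hand the trajectory to the learner.
        policy_version = learner_worker(policy_version, trajectory)
        trajectories.append(trajectory)
    return policy_version, trajectories


final_version, collected = rollout_worker(num_rollouts=3)
```

In this sketch the rollout loop waits for the learner after every trajectory. In the asynchronous regime the rollout workers keep collecting experience while the learner updates the policy.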

Actor Critic models in Sample-factory
Actor Critic models in Sample Factory are composed of three components:
- Encoder - Process input observations (images, vectors) and map them to a vector. This is the part of the model you will most likely want to customize.
- Core - Integrate vectors from one or more encoders. Can optionally include a single- or multi-layer LSTM/GRU in a memory-based agent.
- Decoder - Apply additional layers to the output of the model core before computing the policy and value outputs.
The library has been designed to automatically support any observation and action spaces. Users can easily add their custom models. You can find out more in the documentation.
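To make the three-part structure concrete, here is a minimal pure-Python sketch of how an observation could flow through an encoder, a core, and a decoder before reaching the policy and value outputs. It is not Sample Factory's model code: the layer sizes, the helper names, and the simple tanh recurrent cell (standing in for an LSTM/GRU) are all assumptions, and the weights are random and untrained.

```python
import math
import random

rng = random.Random(0)


def linear(in_dim, out_dim):
    """Return a dense layer (random, untrained weights) as a function."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return lambda x: [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


OBS_DIM, HIDDEN, NUM_ACTIONS = 8, 16, 3  # hypothetical sizes

encoder_layer = linear(OBS_DIM, HIDDEN)
core_layer = linear(2 * HIDDEN, HIDDEN)  # input is [encoded, memory]
decoder_layer = linear(HIDDEN, HIDDEN)
policy_head = linear(HIDDEN, NUM_ACTIONS)
value_head = linear(HIDDEN, 1)


def forward(observation, memory):
    # Encoder: map the raw observation to a feature vector.
    encoded = relu(encoder_layer(observation))
    # Core: combine features with the previous memory (simple tanh recurrence).
    new_memory = [math.tanh(v) for v in core_layer(encoded + memory)]
    # Decoder: extra layer applied to the core output.
    decoded = relu(decoder_layer(new_memory))
    # Policy (action logits) and value outputs.
    logits = policy_head(decoded)
    value = value_head(decoded)[0]
    return logits, value, new_memory


obs = [rng.random() for _ in range(OBS_DIM)]
logits, value, memory = forward(obs, [0.0] * HIDDEN)
```

For image observations such as Doom frames, the encoder would be a convolutional network, which is why it is the component most often customized.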
ViZDoom is an open-source Python interface for the Doom Engine.
The library was created in 2016 by Marek Wydmuch and Michal Kempka at the Institute of Computing Science, Poznan University of Technology, Poland.
The library enables the training of agents directly from the screen pixels in a number of scenarios, including team deathmatch, shown in the video below. Because the ViZDoom environment is based on a game that was created in the 90s, it can be run on modern hardware at accelerated speeds, allowing us to learn complex AI behaviors fairly quickly.
The library includes features such as:
- Multi-platform (Linux, macOS, Windows),
- API for Python and C++,
- OpenAI Gym environment wrappers
- Easy-to-create custom scenarios (visual editors, scripting language, and examples available),
- Async and sync single-player and multiplayer modes,
- Lightweight (few MBs) and fast (up to 7000 fps in sync mode, single-threaded),
- Customizable resolution and rendering parameters,
- Access to the depth buffer (3D vision),
- Automatic labeling of game objects visible in the frame,
- Access to the audio buffer
- Access to the list of actors/objects and map geometry,
- Off-screen rendering and episode recording,
- Time scaling in async mode.
We first need to install some dependencies that are required for the ViZDoom environment
Now that our Colab runtime is set up, we can start by installing the dependencies required to run ViZDoom on Linux.
If you are following along on your own Mac, you will want to follow the installation instructions on the GitHub page.
# Install ViZDoom deps from
apt-get install build-essential zlib1g-dev libsdl2-dev libjpeg-dev \
nasm tar libbz2-dev libgtk2.0-dev cmake git libfluidsynth-dev libgme-dev \
libopenal-dev timidity libwildmidi-dev unzip ffmpeg
# Boost libraries
apt-get install libboost-all-dev
# Lua binding dependencies
apt-get install liblua5.1-dev
Then we can install Sample Factory and ViZDoom
- This can take 7min
pip install sample-factory
pip install vizdoom
Setting up the Doom Environment in sample-factory
import functools
from sample_factory.algo.utils.context import global_model_factory
from sample_factory.cfg.arguments import parse_full_cfg, parse_sf_args
from sample_factory.envs.env_utils import register_env
from sample_factory.train import run_rl
from sf_examples.vizdoom.doom.doom_model import make_vizdoom_encoder
from sf_examples.vizdoom.doom.doom_params import add_doom_env_args, doom_override_defaults
from sf_examples.vizdoom.doom.doom_utils import DOOM_ENVS, make_doom_env_from_spec
# Registers all the ViZDoom environments
def register_vizdoom_envs():
    for env_spec in DOOM_ENVS:
        make_env_func = functools.partial(make_doom_env_from_spec, env_spec)
        register_env(env_spec.name, make_env_func)
# Sample Factory allows the registration of a custom Neural Network architecture
# See for more details
def register_vizdoom_models():
    global_model_factory().register_encoder_factory(make_vizdoom_encoder)

def register_vizdoom_components():
    register_vizdoom_envs()
    register_vizdoom_models()

# parse the command line args and create a config
def parse_vizdoom_cfg(argv=None, evaluation=False):
    parser, _ = parse_sf_args(argv=argv, evaluation=evaluation)
    # parameters specific to Doom envs
    add_doom_env_args(parser)
    # override Doom default values for algo parameters
    doom_override_defaults(parser)
    # second parsing pass yields the final configuration
    final_cfg = parse_full_cfg(parser, argv)
    return final_cfg
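The comments above describe a two-pass pattern: a first parse of the common arguments, then environment-specific arguments and default overrides, then a final parse. The sketch below illustrates that general pattern with the standard library's `argparse` only. It is not Sample Factory's implementation, and `first_pass`, `add_env_args`, `override_defaults`, and `--toy_frame_skip` are hypothetical names.

```python
import argparse


def first_pass(argv):
    """Parse only the common arguments; unknown ones are tolerated for now."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", required=True)
    parser.add_argument("--num_workers", type=int, default=1)
    partial_cfg, _unknown = parser.parse_known_args(argv)
    return parser, partial_cfg


def add_env_args(parser):
    """Register arguments that only make sense for one environment family."""
    parser.add_argument("--toy_frame_skip", type=int, default=4)  # hypothetical option


def override_defaults(parser):
    """Replace generic defaults with values tuned for the environment."""
    parser.set_defaults(num_workers=8)


def parse_cfg(argv):
    parser, _ = first_pass(argv)
    add_env_args(parser)
    override_defaults(parser)
    # Second pass: every argument is now known, so parse strictly.
    return parser.parse_args(argv)


cfg_example = parse_cfg(["--env=toy_env", "--toy_frame_skip=2"])
```

Values passed explicitly on the command line still take precedence over the overridden defaults, so `--num_workers=2` would win over the environment default of 8 in this sketch.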
Now that the setup is complete, we can train the agent. We have chosen here to learn a ViZDoom task called Health Gathering Supreme.
The scenario: Health Gathering Supreme

The objective of this scenario is to teach the agent how to survive without knowing what makes it survive. The agent knows only that life is precious and death is bad, so it must learn what prolongs its existence and that its health is connected with survival.
The map is a rectangle containing walls, with a green, acidic floor that hurts the player periodically. Initially, there are some medkits spread uniformly over the map. A new medkit falls from the skies every now and then. Medkits restore a portion of the player’s health, so to survive, the agent needs to pick them up. The episode finishes after the player’s death or on timeout.
Further configuration:
- Living_reward = 1
- 3 available buttons: turn left, turn right, move forward
- 1 available game variable: HEALTH
- death penalty = 100
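As a worked example of how these two reward settings combine, the sketch below computes the return of a single episode. It assumes the living reward is granted once per step and the death penalty is subtracted once when the agent dies. `episode_return` is a hypothetical helper, and the rewards seen during training may differ if the environment wrappers scale or reshape them.

```python
LIVING_REWARD = 1.0    # "Living_reward = 1" from the scenario configuration
DEATH_PENALTY = 100.0  # "death penalty = 100" from the scenario configuration


def episode_return(steps_survived, died):
    """Sum of living rewards, minus the death penalty if the agent died."""
    total = LIVING_REWARD * steps_survived
    if died:
        total -= DEATH_PENALTY
    return total


# An agent that dies after 300 steps versus one that survives 300 steps to timeout.
died_early = episode_return(steps_survived=300, died=True)
timed_out = episode_return(steps_survived=300, died=False)
print(died_early, timed_out)
```

Under these assumptions, an agent that dies in fewer than 100 steps ends the episode with a negative return, which gives it a strong incentive to find medkits.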
You can find out more about the scenarios available in ViZDoom here.
There are also a number of more complex scenarios that have been created for ViZDoom, such as the ones detailed on this GitHub page.
Training the agent
- We’re going to train the agent for 4,000,000 steps. It will take approximately 15 to 20 minutes.
## Start the training, this should take around 15 to 20 minutes
# Register the ViZDoom environments and encoder before parsing the config
register_vizdoom_components()
# The scenario we train on today is health gathering
# other scenarios include "doom_basic", "doom_two_colors_easy", "doom_dm", "doom_dwango5", "doom_my_way_home", "doom_deadly_corridor", "doom_defend_the_center", "doom_defend_the_line"
env = "doom_health_gathering_supreme"
cfg = parse_vizdoom_cfg(
    argv=[f"--env={env}", "--num_workers=8", "--num_envs_per_worker=4", "--train_for_env_steps=4000000"]
)
status = run_rl(cfg)
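The training arguments above imply some simple arithmetic worth making explicit. The sketch below works out how many environments run in parallel and what throughput a given training time corresponds to. The helper names are hypothetical, and the numbers are estimates derived from the figures quoted in this notebook, not measurements.

```python
num_workers = 8
num_envs_per_worker = 4
train_for_env_steps = 4_000_000

# Each rollout worker steps several environments, so this many run in parallel.
total_envs = num_workers * num_envs_per_worker


def required_throughput(total_steps, minutes):
    """Environment steps per second needed to finish in the given time."""
    return total_steps / (minutes * 60)


def minutes_at(total_steps, steps_per_second):
    """Training time in minutes at a given throughput."""
    return total_steps / steps_per_second / 60


# Finishing in 20 minutes needs roughly 3,333 steps per second.
colab_rate = required_throughput(train_for_env_steps, minutes=20)

# At the 100k interactions per second quoted for a well-equipped machine,
# the same budget would take well under a minute.
fast_minutes = minutes_at(train_for_env_steps, steps_per_second=100_000)
print(total_envs, round(colab_rate), round(fast_minutes * 60))
```

This gap is why the Colab timings in this notebook should not be read as representative of the library's performance.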
Let’s take a look at the performance of the trained policy and output a video of the agent.
from sample_factory.enjoy import enjoy
cfg = parse_vizdoom_cfg(
    argv=[f"--env={env}", "--num_workers=1", "--save_video", "--no_render", "--max_num_episodes=10"], evaluation=True
)
status = enjoy(cfg)
Now let’s visualize the performance of the agent.
from base64 import b64encode
from IPython.display import HTML
mp4 = open("/content/train_dir/default_experiment/replay.mp4", "rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(
    """
<video width=640 controls>
    <source src="%s" type="video/mp4">
</video>
"""
    % data_url
)
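The cell above embeds the video directly in the page by encoding the file as a base64 data URL. The sketch below isolates that step so it can be tested without a real video. `to_data_url` and `video_tag` are hypothetical helpers, and the byte string is a placeholder rather than a valid MP4 file.

```python
from base64 import b64decode, b64encode


def to_data_url(payload, mime_type="video/mp4"):
    """Encode raw bytes as a data URL that a browser can load inline."""
    return f"data:{mime_type};base64," + b64encode(payload).decode("ascii")


def video_tag(data_url, width=640):
    """Build an HTML video element that plays the embedded data URL."""
    return (
        f'<video width="{width}" controls>'
        f'<source src="{data_url}" type="video/mp4">'
        "</video>"
    )


# Placeholder bytes for illustration; a real call would read replay.mp4.
fake_video = b"\x00\x01not a real mp4"
url = to_data_url(fake_video)
html = video_tag(url)

# Round trip: decoding the payload after the comma recovers the original bytes.
recovered = b64decode(url.split(",", 1)[1])
```

Base64 encodes every 3 bytes as 4 characters, so the embedded video is about a third larger than the file on disk. That is fine for short replays but makes the notebook heavy for long ones.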
The agent has learned something, but its performance could be better. We would clearly need to train for longer. But let’s upload this model to the Hub.
Now let’s upload your checkpoint and video to the Hugging Face Hub
To be able to share your model with the community there are three more steps to follow:
1️⃣ (If it’s not already done) create an account to HF ➡
2️⃣ Sign in and get your authentication token from the Hugging Face website.
- Create a new token with write role

- Copy the token
- Run the cell below and paste the token
If you don’t want to use Google Colab or a Jupyter Notebook, you need to use this command instead: huggingface-cli login
from huggingface_hub import notebook_login
notebook_login()
!git config --global credential.helper store
from sample_factory.enjoy import enjoy
hf_username = "ThomasSimonini" # insert your HuggingFace username here
cfg = parse_vizdoom_cfg(
status = enjoy(cfg)
Let’s load another model
This agent’s performance was good, but we can do better! Let’s download and visualize an agent trained for 10B timesteps from the hub.
#download the agent from the hub
python -m sample_factory.huggingface.load_from_hub -r edbeeching/doom_health_gathering_supreme_2222 -d ./train_dir
ls train_dir/doom_health_gathering_supreme_2222
env = "doom_health_gathering_supreme"
cfg = parse_vizdoom_cfg(
status = enjoy(cfg)
mp4 = open("/content/train_dir/doom_health_gathering_supreme_2222/replay.mp4", "rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(
    """
<video width=640 controls>
    <source src="%s" type="video/mp4">
</video>
"""
    % data_url
)
Some additional challenges 🏆: Doom Deathmatch
Training an agent to play a Doom deathmatch takes many hours on a beefier machine than is available in Colab.
Fortunately, we have already trained an agent in this scenario and it is available in the 🤗 Hub! Let’s download the model and visualize the agent’s performance.
# Download the agent from the hub
python -m sample_factory.huggingface.load_from_hub -r edbeeching/doom_deathmatch_bots_2222 -d ./train_dir
Given that the agent plays for a long time, the video generation can take 10 minutes.
from sample_factory.enjoy import enjoy
env = "doom_deathmatch_bots"
cfg = parse_vizdoom_cfg(
status = enjoy(cfg)
mp4 = open("/content/train_dir/doom_deathmatch_bots_2222/replay.mp4", "rb").read()
data_url = "data:video/mp4;base64," + b64encode(mp4).decode()
HTML(
    """
<video width=640 controls>
    <source src="%s" type="video/mp4">
</video>
"""
    % data_url
)
You can try to train your agent in this environment using the code above, but not on Colab. Good luck 🤞
If you prefer an easier scenario, why not try training in another ViZDoom scenario such as doom_deadly_corridor or doom_defend_the_center?
This concludes the last unit. But we are not finished yet! 🤗 The following bonus section includes some of the most interesting, advanced, and cutting-edge work in Deep Reinforcement Learning.