# Lunar Lander Reinforcement Learning Agent

# Introduction

OpenAI Gym provides a set of environments for training
reinforcement learning agents. For this project, I trained an agent for
the *LunarLander-v2* environment. The environment models a lunar lander
that must learn to land on a target landing pad.
The pad is located at the coordinates $(0,0)$. The lander receives a
reward for approaching the target and slowing to a speed of $0$. The lander is
rewarded $100$ points for landing and $-100$ points for crashing. Each
leg that is on the ground is rewarded $10$ points. Each use of the engine
costs $0.3$ points. The lander receives an eight-dimensional
input consisting of
$[ x, y, \theta, \dot x, \dot y, \dot \theta, l, r]$, where $x$ and $y$
are the coordinates relative to the landing pad, $\theta$ is the angle of
rotation, $\dot z$ denotes the time derivative of a quantity $z$, and $l$ and $r$ are
boolean values $(0/1)$ that signal whether each leg is on the
ground. The lander has a choice of four actions:

- Do Nothing
- Fire Left Engine
- Fire Right Engine
- Fire Main Engine
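
To make the reward arithmetic concrete, here is a minimal sketch that adds up the event-based terms described above. It is not the environment's actual reward function: the `EpisodeEvents` class and the `event_reward` function are hypothetical names introduced for illustration, the approach reward is left out, and the sketch assumes the engine cost is charged once per use.

```python
from dataclasses import dataclass

# Reward constants taken from the description above.
LANDING_REWARD = 100.0
CRASH_REWARD = -100.0
LEG_CONTACT_REWARD = 10.0
ENGINE_COST = 0.3


@dataclass
class EpisodeEvents:
    """Hypothetical summary of an episode; not part of the Gym API."""
    landed: bool
    crashed: bool
    legs_on_ground: int  # 0, 1, or 2
    engine_uses: int     # how many times the engine was fired


def event_reward(events: EpisodeEvents) -> float:
    """Sum the event-based reward terms, ignoring the approach reward."""
    total = 0.0
    if events.landed:
        total += LANDING_REWARD
    if events.crashed:
        total += CRASH_REWARD
    total += LEG_CONTACT_REWARD * events.legs_on_ground
    total -= ENGINE_COST * events.engine_uses
    return total


clean_landing = EpisodeEvents(landed=True, crashed=False, legs_on_ground=2, engine_uses=50)
print(event_reward(clean_landing))
```

Under these assumptions, a landing on both legs that fires the engine often still loses a noticeable share of its landing bonus to engine costs, which is why an agent has an incentive to land efficiently.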

An agent is considered successful if it scores an average of 200 points or more over the last 100 episodes. Here, I use a deep Q-learning algorithm, Double DQN, to train a lander that achieves an average score of 272 over 100 episodes.
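
The success criterion can be tracked with a fixed-size window of episode scores. The following is a minimal sketch, assuming scores are recorded once per episode; the `SolvedTracker` class and its method names are hypothetical and not taken from this project's training code.

```python
from collections import deque


class SolvedTracker:
    """Keep the most recent episode scores and test the success criterion."""

    def __init__(self, window: int = 100, threshold: float = 200.0):
        self.scores = deque(maxlen=window)  # old scores drop out automatically
        self.threshold = threshold

    def add(self, score: float) -> None:
        """Record the total score of one finished episode."""
        self.scores.append(score)

    def average(self) -> float:
        """Mean score over the episodes currently in the window."""
        return sum(self.scores) / len(self.scores) if self.scores else 0.0

    def solved(self) -> bool:
        """True once the window is full and its mean reaches the threshold."""
        return len(self.scores) == self.scores.maxlen and self.average() >= self.threshold


tracker = SolvedTracker()
for _ in range(100):
    tracker.add(250.0)  # placeholder scores, not results from this project
print(tracker.solved())
```

Requiring a full window before reporting success avoids declaring the task solved on the basis of a few lucky early episodes.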