Lunar Lander Reinforcement Learning Agent
The OpenAI Gym provides a set of environments for training a reinforcement learning agent. For this project, I trained an agent for the Lunar-Lander-v2 environment. The environment consists of a model of a lunar lander that must learn to land on a target landing pad. The pad is represented by the coordinates $(0,0)$. The lander receives a reward for approaching the target with a speed of $0$. The lander is rewarded $100$ points for landing and $-100$ points for crashing. Each lege that is on the ground is rewarded $10$ points. Using the engine has a cost of $-0.3$ per use. The lander receives an eight-dimensional input, consisting of $[ x, y, \theta, \dot x, \dot y, \dot \theta, l, r]$ where $x$ and $y$ are the coordinates from the landing pad, $\theta$ is the angular rotation, $\dot z$ represents the derivative of $z$ and $l$ and $r$ are boolean values $(0/1)$ that represents signal if each leg is on the ground. The lander has a choicer of four actions:
Fire Left Enginge
Fire Right Engine
Fire Main Engine
An agent is considered successful if it scores an average of 200 points or more over the last 100 episodes. Here, I use a Deep-Q Learning algorithm called Double DQN to train a lander that achieves an average score of 272 over 100 episodes.