Reinforcement Learning (RL) is a branch of Machine Learning in which an “agent” learns to make decisions by performing actions in an environment to maximize a cumulative reward. Unlike in Supervised Learning, there is no “answer key”; the agent learns through trial and error.
To understand RL, you must understand the interaction loop between its core components: the agent (the learner and decision-maker), the environment (the world the agent acts in), the state (the agent’s current situation), the action (what the agent chooses to do), and the reward (the feedback signal the environment sends back).
The goal of the agent is not just to grab the immediate reward, but to maximize the cumulative reward it collects over time.
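One standard way to write this objective down (the notation below is illustrative; the discount factor $\gamma$ is not defined in the text above) is the discounted return:

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}, \qquad 0 \le \gamma \le 1$$

A $\gamma$ close to 0 makes the agent short-sighted, while a $\gamma$ close to 1 makes it value long-term reward almost as much as immediate reward.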
A Markov Decision Process (MDP) is the mathematical framework used to describe the RL environment. It assumes the Markov Property: “The future is independent of the past, given the present.”
Essentially, you don’t need the history of how the agent got to the current state; the current state $s$ contains all the information needed to make the next decision.
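In symbols, a common way to state this (shown here for concreteness rather than taken from the text above) is that the next state depends only on the current state and action, not on the full history:

$$P(s_{t+1} \mid s_t, a_t) = P(s_{t+1} \mid s_1, a_1, s_2, a_2, \dots, s_t, a_t)$$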
Q-Learning is a “Value-Based” algorithm. It uses a Q-Table to store the “Quality” (Q-value) of an action taken in a specific state.
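To make the idea concrete, here is a minimal sketch of tabular Q-learning in Python. The tiny “corridor” environment, the hyperparameters (`alpha`, `gamma`, `epsilon`), and the helper functions are illustrative assumptions, not part of the original text; the heart of the algorithm is the update line inside the loop.

```python
import numpy as np

# A tiny made-up corridor environment (an illustrative assumption, not from the text):
# states 0..4 in a row, action 0 = step left, action 1 = step right,
# and reaching the rightmost state (4) ends the episode with reward 1.
n_states, n_actions = 5, 2
alpha, gamma, epsilon = 0.1, 0.99, 0.1   # learning rate, discount factor, exploration rate

def step(state, action):
    next_state = max(0, state - 1) if action == 0 else min(n_states - 1, state + 1)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done

def greedy(q_row):
    # Break ties randomly so the untrained (all-zero) table doesn't always pick action 0.
    best = np.flatnonzero(q_row == q_row.max())
    return int(np.random.choice(best))

# The Q-table: one "quality" estimate per (state, action) pair.
Q = np.zeros((n_states, n_actions))

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: usually exploit the best known action, occasionally explore.
        if np.random.rand() < epsilon:
            action = np.random.randint(n_actions)
        else:
            action = greedy(Q[state])
        next_state, reward, done = step(state, action)
        # Q-learning update: nudge Q(s, a) toward reward + discounted value of the best next action.
        td_target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (td_target - Q[state, action])
        state = next_state

print(np.round(Q, 2))   # after training, "step right" should score higher in every state
```

After enough episodes, the table itself encodes the policy: in each state, the agent simply picks the action with the highest Q-value.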
In the real world, the number of possible states is often far too large to store in a table (e.g., every distinct arrangement of pixels on a video game screen is a different state, which is an astronomically large number).
A Deep Q-Network (DQN) replaces the Q-Table with a neural network that takes a state as input and estimates a Q-value for every possible action.
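A minimal sketch of this idea, assuming PyTorch and made-up dimensions (the original text does not specify an architecture): the network maps a state vector to one estimated Q-value per action, replacing the table lookup.

```python
import torch
import torch.nn as nn

# Hypothetical dimensions for illustration: a 4-dimensional state, 2 possible actions.
state_dim, n_actions = 4, 2

# The network plays the role of the Q-table: input a state, output one Q-value per action.
q_network = nn.Sequential(
    nn.Linear(state_dim, 64),
    nn.ReLU(),
    nn.Linear(64, 64),
    nn.ReLU(),
    nn.Linear(64, n_actions),
)

state = torch.rand(1, state_dim)           # a dummy state, batched as (batch, state_dim)
with torch.no_grad():
    q_values = q_network(state)            # shape (1, n_actions): estimated Q(s, a) per action
    action = int(torch.argmax(q_values))   # greedy action = highest predicted Q-value

print(q_values, action)
```

Actually training such a network (experience replay, target networks, and so on) is beyond this sketch; the point is only that the table lookup becomes a forward pass through the network.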