Reinforcement Learning Tic Tac Toe with Value Function
A reinforcement learning algorithm for agents to learn tic-tac-toe using a value function
machine-learning reinforcement-learning javascript tutorial article code

At any state except a terminal one (where a win, loss, or draw is recorded), the agent takes an action that leads to the next state. That next state may not yield a reward by itself, but it moves the agent one step closer to receiving one.

The value function determines the value of being in a state: the probability of receiving a future reward from that state.
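As a rough illustration of treating a state's value as a probability of future reward, the sketch below keeps a lookup table from board states to values. Everything in it is an assumption made for illustration: the board encoding (an array of nine cells holding `"X"`, `"O"`, or `" "`), the names `values`, `winner`, `initialValue`, and `getValue`, and the starting values of 1 for a won board, 0 for a lost board, and 0.5 for anything else. The table is assumed to belong to a single agent.

```javascript
// Hypothetical value table for ONE agent: serialized board -> estimated value.
const values = new Map();

// The eight winning lines on a 3x3 board, as cell indices.
const LINES = [
  [0, 1, 2], [3, 4, 5], [6, 7, 8], // rows
  [0, 3, 6], [1, 4, 7], [2, 5, 8], // columns
  [0, 4, 8], [2, 4, 6],            // diagonals
];

// Returns "X" or "O" if that mark fills a line, otherwise null.
function winner(board) {
  for (const [a, b, c] of LINES) {
    if (board[a] !== " " && board[a] === board[b] && board[b] === board[c]) {
      return board[a];
    }
  }
  return null;
}

// Assumed starting estimates: 1 for a win, 0 for a loss,
// 0.5 for every other state (including draws and unfinished games).
function initialValue(board, player) {
  const w = winner(board);
  if (w === player) return 1;
  if (w !== null) return 0;
  return 0.5;
}

// Looks up a state's value, creating the entry on first visit.
function getValue(board, player) {
  const key = board.join("");
  if (!values.has(key)) {
    values.set(key, initialValue(board, player));
  }
  return values.get(key);
}
```

With this layout, an unseen non-terminal board starts at 0.5 and only moves away from that value as training updates the table.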

The value of each state is updated in reverse chronological order through the state history of a game. With enough training using both explore and exploit strategies, the agent will be able to estimate the true value of each state in the game.
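The backward update and the explore/exploit choice can be sketched as follows. This is a minimal sketch under stated assumptions, not necessarily the exact rule used in this article: it assumes the common temporal-difference style update `V(s) <- V(s) + alpha * (target - V(s))`, where the target for the last state is the final reward and the target for each earlier state is the freshly updated value of the state after it. The names `updateValues` and `chooseMove`, the learning rate `ALPHA`, the exploration rate `EPSILON`, the 0.5 default for unseen states, and the `{ move, key }` candidate shape are all hypothetical.

```javascript
// Assumed hyperparameters, chosen for illustration only.
const ALPHA = 0.1;   // learning rate
const EPSILON = 0.1; // probability of exploring a random move
const DEFAULT_VALUE = 0.5; // assumed value for states not seen before

// Walks the state history backwards, pulling each state's value
// towards the (already updated) value of the state that followed it.
// history: array of state keys in the order they were visited.
function updateValues(values, history, finalReward, alpha = ALPHA) {
  let target = finalReward;
  for (let i = history.length - 1; i >= 0; i--) {
    const key = history[i];
    const current = values.has(key) ? values.get(key) : DEFAULT_VALUE;
    const updated = current + alpha * (target - current);
    values.set(key, updated);
    target = updated; // the earlier state learns from this one
  }
  return values;
}

// Epsilon-greedy selection: explore with probability epsilon,
// otherwise exploit the candidate with the highest known value.
// candidates: non-empty array of { move, key }, where key is the
// state that results from playing move.
function chooseMove(candidates, values, epsilon = EPSILON, random = Math.random) {
  if (random() < epsilon) {
    return candidates[Math.floor(random() * candidates.length)];
  }
  const valueOf = (c) => (values.has(c.key) ? values.get(c.key) : DEFAULT_VALUE);
  let best = candidates[0];
  for (const candidate of candidates) {
    if (valueOf(candidate) > valueOf(best)) best = candidate;
  }
  return best;
}
```

Passing `random` as a parameter is a design choice that makes the explore and exploit branches easy to exercise separately; in normal play the default `Math.random` is used.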

