Talks and tutorials on RL
- Introduction to Reinforcement Learning with Function Approximation: a tutorial given at NIPS 2015 by Richard Sutton.
- Policy Search: Methods and Applications: a tutorial given at ICML 2015 by Jan Peters and Gerhard Neumann.
- Representation and Learning Methods for Complex Outputs: a talk given at NIPS 2014 by Richard Sutton.
Value and Q-value recursion
There are two forms in which the expected reward from a given state is encoded:
- v-function: \(V^{\pi}(s) = \mathbb{E}_{\pi} \left\{ \sum\limits_{k=0}^{\infty} \gamma^k r_{t+k+1} \lvert s_t = s \right\}\)
- q-function: \(Q^{\pi}(s,a) = \mathbb{E}_{\pi} \left\{ \sum\limits_{k=0}^{\infty} \gamma^k r_{t+k+1} \lvert s_t = s, a_t = a \right\}\)
The v-function is the expected reward given a state, whilst the q-function is the expected reward given a state and an action. The recursion satisfied by both functions can be derived from first principles, and it can be shown that the v-function is a function of the q-function.
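For reference, the standard forms (for a stochastic policy \(\pi(a \lvert s)\), consistent with the definitions above) are
\[V^{\pi}(s) = \sum\limits_{a} \pi(a \lvert s) Q^{\pi}(s,a)\]
\[Q^{\pi}(s,a) = \mathbb{E} \left\{ r_{t+1} + \gamma V^{\pi}(s_{t+1}) \lvert s_t = s, a_t = a \right\}\]
where the first equation is the link between the two forms, and substituting either equation into the other gives the recursion for a single function.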
See RVQ.pdf for the derivation of the recursion and the link between both functional forms.
See RL_Solutions_Chap3.pdf for the effect of sign and constants in the reward function.
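For reference, the standard result for a constant offset in a discounted continuing task is that adding a constant \(c\) to every reward shifts the value function by a constant,
\[V^{\pi}_{new}(s) = V^{\pi}(s) + \sum\limits_{k=0}^{\infty} \gamma^k c = V^{\pi}(s) + \frac{c}{1-\gamma},\]
so the relative ordering of policies is unchanged; in episodic tasks the added term depends on the remaining episode length, so a constant offset can change the optimal behaviour.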
Policy Gradient Theorem
The policy is parameterised as
\[a = \pi(s;\theta)\]
We want to find an expression for \(\Delta\theta\) that uses an estimator of the expected reward, such as the action-value or advantage function.
The paper Policy Gradient Methods for Reinforcement Learning with Function Approximation proves that the gradient of a policy can be derived when using a function approximator for either the action-value or advantage function.
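For a policy written as a distribution \(\pi(a \lvert s;\theta)\), the theorem takes the standard form
\[\nabla_\theta J(\theta) = \mathbb{E}_{\pi} \left\{ \nabla_\theta \log \pi(a \lvert s;\theta) \, Q^{\pi}(s,a) \right\}\]
where \(J(\theta)\) is the expected reward of the policy, so any unbiased sample of the right-hand side can serve as the update direction \(\Delta\theta\).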
The key is to be able to find an unbiased estimate of the gradient \(\Delta\theta\).
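As an illustration only, here is a minimal sketch (not taken from the paper above) of the REINFORCE score-function estimator \(r \, \nabla_\theta \log \pi(a;\theta)\) on a toy bandit problem, where the sampled update is an unbiased estimate of the gradient of the expected reward:

```python
import numpy as np

# Minimal sketch: REINFORCE (score-function) gradient estimate for a
# softmax policy on a toy 3-armed bandit. The sampled quantity
#   r * grad_theta log pi(a; theta)
# is an unbiased estimate of grad_theta E[r]. The bandit and all names
# here are illustrative, not from the cited paper.

rng = np.random.default_rng(0)
true_means = np.array([1.0, 2.0, 1.5])   # expected reward of each arm
theta = np.zeros(3)                      # policy parameters (softmax logits)
alpha = 0.05                             # step size

def softmax(x):
    z = x - x.max()                      # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(3, p=probs)           # sample action from pi(.; theta)
    r = true_means[a] + rng.normal()     # sample a noisy reward

    # grad_theta log pi(a; theta) for a softmax policy: e_a - probs
    grad_log_pi = -probs
    grad_log_pi[a] += 1.0

    theta += alpha * r * grad_log_pi     # stochastic gradient ascent step

print("learned action probabilities:", softmax(theta))  # should favour arm 1
```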