Many algorithms in RL are presented in a very intuitive way, but they look a bit heuristic. While re-reading Reinforcement Learning in an attempt to get rid of that heuristic feeling, I tried to digest it from an optimization perspective. And, well, I realized I couldn’t make any connection whatsoever between my understanding of optimization and any of the algorithms presented in RL.

So this is an attempt to make things more concrete from a somewhat first-principles view.

The note is currently very unorganized. Link