As Kahneman (2011) points out in his book “Thinking, Fast and Slow”, we have two modes of thinking: fast and slow. For example, we do not need to think much about how to walk or how to eat, but we do need to think slowly about complex tasks such as planning our travel routes.
In reinforcement learning, there are two main categories of methods: model-free and model-based.
- Model-free methods: never learn task T and environment E explicitly. At the end of learning, the agent knows how to act but doesn’t explicitly know anything about the environment. Many deep reinforcement learning algorithms, such as DQN, are model-free.
- Model-based methods: explicitly learn a model of task T and environment E, and use it to plan. (See model-based reasoning to get a sense of it.)
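To make the distinction concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption made for illustration: a five-state chain with a reward at the right end, and the hypothetical helpers `step`, `collect_experience`, `q_learning`, `learn_model`, and `plan`. The model-free learner (tabular Q-learning) only adjusts action values; the model-based learner first records what each action does and then plans on that record with value iteration.

```python
import random

N_STATES, GOAL, GAMMA = 5, 4, 0.9
ACTIONS = (0, 1)  # 0 = step left, 1 = step right

def step(state, action):
    """Toy environment (an assumption for this sketch): reward 1 on reaching GOAL."""
    nxt = max(0, state - 1) if action == 0 else min(GOAL, state + 1)
    return nxt, (1.0 if nxt == GOAL else 0.0)

def collect_experience(repeats=25, seed=0):
    """Try every (state, action) pair several times, in random order."""
    pairs = [(s, a) for s in range(GOAL) for a in ACTIONS] * repeats
    random.Random(seed).shuffle(pairs)
    return [(s, a, *step(s, a)) for s, a in pairs]  # (s, a, next_state, reward)

def q_learning(data, alpha=0.5, sweeps=50):
    """Model-free: adjust action values from experience; no dynamics are stored."""
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(sweeps):
        for s, a, s2, r in data:
            q[s][a] += alpha * (r + GAMMA * max(q[s2]) - q[s][a])
    # Greedy action for each non-goal state.
    return [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]

def learn_model(data):
    """Model-based, step 1: remember where each (state, action) leads."""
    return {(s, a): (s2, r) for s, a, s2, r in data}

def plan(model, sweeps=50):
    """Model-based, step 2: value iteration on the learned model, then act greedily."""
    v = [0.0] * N_STATES

    def backup(s, a):
        s2, r = model[(s, a)]
        return r + GAMMA * v[s2]

    for _ in range(sweeps):
        for s in range(GOAL):
            v[s] = max(backup(s, a) for a in ACTIONS)
    return [max(ACTIONS, key=lambda a: backup(s, a)) for s in range(GOAL)]

data = collect_experience()
print(q_learning(data))         # greedy action per non-goal state
print(plan(learn_model(data)))  # same, but obtained by planning on the model
```

Both learners should end up choosing "right" in every state here. The difference is what they hold at the end: the model-free agent has only action values, while the model-based agent has a dictionary it can query to ask "what happens if I do this?".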
AlphaGo involves both model-free methods (convolutional neural networks (CNNs)) and model-based methods (Monte Carlo Tree Search (MCTS)). In fact, AlphaGo is pretty similar to how we humans think: it involves both fast intuition (i.e., the move and value estimates produced by the CNNs) and careful, slow thinking (i.e., MCTS).
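The same "intuition plus search" pattern can be sketched on a much smaller game. This is not AlphaGo's algorithm: it swaps MCTS for a plain depth-limited negamax search and swaps the CNN for a hand-written heuristic, and the game (take 1 to 3 stones, whoever takes the last stone wins) and the names `intuition`, `search`, and `best_move` are assumptions chosen for illustration. The search uses the rules of the game as its model and falls back on intuition wherever it stops looking ahead.

```python
def legal_moves(stones):
    """A move removes 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]

def intuition(stones):
    """Fast guess of the position's value for the player to move.

    Hypothetical stand-in for a learned network: it only recognises
    positions that can be won immediately and has no opinion otherwise.
    """
    return 1.0 if stones <= 3 else 0.0

def search(stones, depth):
    """Slow thinking: look `depth` moves ahead using the rules as the model."""
    if stones == 0:
        return -1.0  # the opponent just took the last stone, so we lost
    if depth == 0:
        return intuition(stones)  # out of thinking time: trust the fast guess
    # Negamax: our value is the best of the negated opponent values.
    return max(-search(stones - m, depth - 1) for m in legal_moves(stones))

def best_move(stones, depth):
    """Pick the move whose resulting position is worst for the opponent."""
    return max(legal_moves(stones), key=lambda m: -search(stones - m, depth - 1))

print(best_move(10, depth=1))  # intuition only
print(best_move(10, depth=4))  # intuition plus lookahead
```

With 10 stones, the depth-1 player sees no difference between its options and simply takes the first one (1 stone), while the depth-4 player finds the winning move (take 2, leaving a multiple of 4). The heuristic is the same in both cases; only the amount of slow thinking on top of it changes.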
Combining model-free and model-based methods is probably the way to go for many real-world problems (fast intuition + careful planning).
References:
Kahneman, Daniel. Thinking, Fast and Slow. Macmillan, 2011.