What is the bottleneck and weakness of AlphaGo?

Posted 2020-06-22 Programming.log - a place to k


Is self-play a theoretical bottleneck to AlphaGo's improvement? In my view, no. The real problem for AlphaGo (and for any other AI, or any human) is that the state space of Go is far larger than anything its neural network can represent, so no matter how we train it, it still underfits. This means the value network and policy network will always have blind spots: once some cases are trained very well, other cases pop up.
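
To get a feel for the size gap, here is a rough back-of-the-envelope sketch. It counts every way to mark each intersection as empty, black, or white, which is only a loose upper bound (many of those boards are illegal). The function names and the network size `ASSUMED_PARAMS` are hypothetical, chosen for illustration only; they are not AlphaGo's actual figures.

```python
def config_upper_bound(board_size: int) -> int:
    """Loose upper bound on board configurations: 3 choices per intersection."""
    return 3 ** (board_size * board_size)


def order_of_magnitude(n: int) -> int:
    """Power of ten of a positive integer, e.g. 1234 -> 3."""
    return len(str(n)) - 1


# Hypothetical parameter count, used only for comparison (an assumption).
ASSUMED_PARAMS = 10_000_000

for size in (9, 19):
    bound = config_upper_bound(size)
    print(f"{size}x{size}: about 10^{order_of_magnitude(bound)} configurations")

print(f"assumed network: about 10^{order_of_magnitude(ASSUMED_PARAMS)} parameters")
```

Even with a generous guess for the network size, the 19x19 bound is larger by well over a hundred orders of magnitude, while the 9x9 bound is far smaller.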

As for supervised learning on human games versus self-play, the two differ only in their training data, and that data biases AlphaGo's neural network toward a certain style and makes it handle certain cases very well. Self-play can in principle provide all the information that supervised learning can: imagine a 9×9 board, where self-play alone is clearly enough to produce a good training set. So self-play is not really a bottleneck in theory, but in practice supervised learning biases AlphaGo toward a certain style and gives it a better chance against a certain kind of opponent. As the neural network grows large enough to accommodate more states, the value of supervised learning decreases.

Because of this underfitting, the value network may misjudge who is winning in some positions that look simple to a human. This is why AlphaGo uses MCTS to play rollouts many moves ahead as validation: only if the game still favors AlphaGo after playing some moves further is the original position considered truly good. So AlphaGo is really a mixture of "intuition + logic", which is very similar to how humans play.
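
The "intuition + logic" mix can be sketched as a weighted average of the value network's estimate and the average rollout result. The published AlphaGo description blends the two with a mixing weight; the function name `mixed_leaf_value`, the default `lam=0.5`, and the numbers below are illustrative assumptions, not AlphaGo's actual code.

```python
def mixed_leaf_value(value_estimate, rollout_outcomes, lam=0.5):
    """Blend a value-network estimate with rollout results.

    value_estimate: the network's guess in [-1, 1] (the "intuition").
    rollout_outcomes: list of +1 (win) / -1 (loss) results from playing
        the position out (the "logic").
    lam: weight on the rollouts; 0 trusts only the network, 1 only rollouts.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be between 0 and 1")
    if not rollout_outcomes:
        # No rollouts available: fall back to the network alone.
        return value_estimate
    rollout_mean = sum(rollout_outcomes) / len(rollout_outcomes)
    return (1.0 - lam) * value_estimate + lam * rollout_mean


# Made-up example: the network is overconfident (+0.8),
# but 6 of 8 rollouts end in a loss.
overconfident = 0.8
outcomes = [-1, -1, 1, -1, -1, -1, 1, -1]
mixed = mixed_leaf_value(overconfident, outcomes)
print(f"network alone: {overconfident:+.2f}, after rollouts: {mixed:+.2f}")
```

In this made-up case the rollouts pull an overconfident estimate down to roughly 0.15, which is the kind of correction described above. The correction only works if the rollouts themselves are not misled along the same line of play.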

This design makes AlphaGo's weakness very hard to catch, but it does exist. Based on the analysis above, the weakness is clear to me now: it shows when the value network is wrong not just about one state but also about many of the moves that follow it. The probability is very low, but it did happen in Game 4. Brilliant, Lee Sedol!
