In 2017, Libratus, an AI system developed at Carnegie Mellon University, beat four of the world's best poker players at a Pittsburgh casino, winning $1.5 million in chips.

Libratus took on four professional poker players in January 2017, decisively defeating them and winning nearly all the chips in play. Later that year, Noam Brown and Tuomas Sandholm's paper, "Safe and Nested Subgame Solving for Imperfect-Information Games," won the NIPS 2017 Best Paper Award.

Overview

Although Libratus was built from scratch, it is the nominal successor to Claudico. Like its predecessor's, its name is Latin; it means 'balanced.' Libratus, however, demanded far more computation: over 15 million core hours of processing, compared with Claudico's 2-3 million. Researchers ran the computations on the new Bridges supercomputer at the Pittsburgh Supercomputing Center.

According to Professor Tuomas Sandholm, Libratus has no fixed built-in strategy; instead, it runs an algorithm that computes the strategy. The method is a novel variant of counterfactual regret minimization (CFR) known as CFR+, a technique invented by Oskari Tammelin in 2014. In addition to CFR+, Libratus used a new technique created by Sandholm and his PhD student, Noam Brown, for the problem of endgame solving. Their new method replaces the previous de facto standard in poker programming, known as "action mapping."
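The core idea behind counterfactual regret minimization can be illustrated with regret matching, the update rule at its heart. The sketch below is a toy illustration only, not Libratus' actual CFR+ implementation: a single learner plays rock-paper-scissors against a fixed, slightly rock-heavy opponent (the opponent's mix is an invented example), and its average strategy converges to the best response, paper. Full CFR applies this same update at every decision point of the game tree.

```python
# Toy regret-matching sketch (illustration only, not Libratus' CFR+ code).
NUM_ACTIONS = 3  # 0 = rock, 1 = paper, 2 = scissors

# PAYOFF[a][b]: learner's payoff for playing a against the opponent's b.
PAYOFF = [
    [0, -1, 1],   # rock loses to paper, beats scissors
    [1, 0, -1],   # paper beats rock, loses to scissors
    [-1, 1, 0],   # scissors loses to rock, beats paper
]

OPPONENT = [0.4, 0.3, 0.3]  # fixed, rock-heavy mix (an invented assumption)


def strategy_from_regrets(regrets):
    """Mix over actions in proportion to positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / NUM_ACTIONS] * NUM_ACTIONS


def train(iterations=1000):
    regrets = [0.0] * NUM_ACTIONS
    strategy_sum = [0.0] * NUM_ACTIONS
    for _ in range(iterations):
        strategy = strategy_from_regrets(regrets)
        for a in range(NUM_ACTIONS):
            strategy_sum[a] += strategy[a]
        # Expected value of each pure action against the opponent's mix.
        action_values = [
            sum(PAYOFF[a][b] * OPPONENT[b] for b in range(NUM_ACTIONS))
            for a in range(NUM_ACTIONS)
        ]
        expected = sum(strategy[a] * action_values[a] for a in range(NUM_ACTIONS))
        # Regret: how much better each action would have done than our mix.
        for a in range(NUM_ACTIONS):
            regrets[a] += action_values[a] - expected
    # CFR's guarantee is about the *average* strategy, not the final one.
    return [s / iterations for s in strategy_sum]


avg = train()
print(avg)  # paper's weight dominates
```

Note that it is the time-averaged strategy, not the last iterate, that carries CFR's convergence guarantee; Libratus' CFR+ refines this basic scheme with, among other things, a modified regret update.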

Methodology 

In terms of technique, the Libratus algorithm is a hybrid of game theory and operations research. The theoretical foundation of reinforcement learning, by contrast, consists of the Markov decision process and dynamic programming. Although the two lines of work have distinct origins, they increasingly converge.
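To make the dynamic-programming foundation concrete, here is a minimal value-iteration sketch on a tiny two-state Markov decision process. The MDP itself (states, actions, rewards) is invented purely for illustration and has nothing to do with poker.

```python
# Minimal value iteration on an invented two-state MDP (illustration only).
GAMMA = 0.9  # discount factor

# TRANSITIONS[state][action] = list of (probability, next_state, reward)
TRANSITIONS = {
    "cold": {
        "wait": [(1.0, "cold", 0.0)],
        "start": [(0.8, "hot", 1.0), (0.2, "cold", -1.0)],
    },
    "hot": {
        "wait": [(1.0, "hot", 1.0)],
        "start": [(1.0, "hot", 0.5)],
    },
}


def value_iteration(theta=1e-8):
    """Repeatedly apply the Bellman optimality update until convergence."""
    values = {s: 0.0 for s in TRANSITIONS}
    while True:
        delta = 0.0
        for state, actions in TRANSITIONS.items():
            # Best achievable expected return over all actions.
            best = max(
                sum(p * (r + GAMMA * values[nxt]) for p, nxt, r in outcomes)
                for outcomes in actions.values()
            )
            delta = max(delta, abs(best - values[state]))
            values[state] = best
        if delta < theta:
            return values


print(value_iteration())
```

The Bellman update is a dynamic-programming recursion: each state's value is computed from the values of its successor states, which is exactly the structure that reinforcement-learning methods approximate when the transition model is unknown.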

By the 16th day of the competition, Libratus' cumulative winnings had passed $1,000,000. At the end of that day, it led the human team by $1,194,402 in chips.

How is it trained?

Libratus relies on three interdependent systems, illustrating that modern AI is powered not by a single technology but by several. Deep neural networks garner the most attention today, and for good reason: they power image recognition, translation, and search at some of the world's largest technology companies. However, the success of neural networks has also given new life to many other AI techniques that enable machines to imitate, and even surpass, human abilities.

Libratus, for one, did not use neural networks. Instead, it relied heavily on a form of artificial intelligence known as reinforcement learning, a trial-and-error technique: in essence, it repeatedly competed against itself. Google's DeepMind lab also employed reinforcement learning to create AlphaGo, the machine that mastered the ancient game of Go roughly a decade ahead of expectations. There is, however, a significant difference between the two systems: AlphaGo learned the game by analyzing 30 million moves made by human players and then honed its skills by playing against itself, whereas Libratus was entirely self-taught.

Conclusion

Developing an AI system such as Libratus is one way AI is transforming our understanding of human intelligence. While Libratus' first application was playing poker, its creators have much bigger plans. The researchers designed the AI to handle any game or situation with imperfect information, in which "opponents" may withhold facts or engage in deception. Sandholm and his colleagues therefore propose applying the approach to other real-world problems such as cybersecurity, commercial negotiations, and medical planning.
