Gerald Tesauro created the computer backgammon program TD-Gammon in 1992 at IBM's Thomas J. Watson Research Center. It takes its name from the fact that it is an artificial neural network trained with a form of temporal-difference learning known as TD(lambda).

TD-Gammon is a reinforcement learning system: a neural network that teaches itself how to play backgammon by playing against itself and learning from the results.

The creation of TD-Gammon was partly accidental. Tesauro experimented with "knowledge-free" networks, which he trained with the TD(lambda) method through self-play and which included no backgammon knowledge beyond a raw representation of the board position. The networks' first games against themselves were essentially random. After tens of thousands of games, they eventually reached a strength comparable to Neurogammon, with its hand-crafted features and expert training examples.

Play-and-learn algorithm

During a game, TD-Gammon examines each legal move and how the opponent could answer it (a two-ply lookahead). It feeds each resulting board position into its evaluation function and picks the move that leads to the position with the highest score. In this respect, TD-Gammon works like almost every other board-game program; what was new was how it learned its evaluation function.
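
To make the selection step concrete, here is a minimal sketch in Python. Everything in it is hypothetical scaffolding rather than Tesauro's actual code: evaluate stands in for the trained network, while legal_moves and apply_move stand in for the game rules. It shows only the inner one-ply step; a full two-ply search would also consider the opponent's possible dice rolls and replies.

    # Minimal sketch of TD-Gammon-style greedy move selection.
    # `evaluate`, `legal_moves`, and `apply_move` are hypothetical helpers:
    # evaluate(position) returns the network's estimated value of a position.

    def choose_move(position, dice, evaluate, legal_moves, apply_move):
        best_move, best_score = None, float("-inf")
        for move in legal_moves(position, dice):
            next_position = apply_move(position, move)
            score = evaluate(next_position)  # feed the board into the network
            if score > best_score:
                best_move, best_score = move, score
        return best_move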


TD-Gammon learns by updating the weights in its neural net after each turn. This approach is called "temporal-difference learning" because it reduces the difference between how it rated board positions on previous turns and how it rates them on the current turn. A board position's score is a set of four numbers giving the program's estimated probability of each possible game result:

  • White wins.
  • Black wins.
  • White wins a gammon.
  • Black wins a gammon.

On the final turn, instead of relying on its own evaluation of the next position, the algorithm compares its prediction for the final board position to the actual result of the game.
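
The following is a rough sketch of that TD(lambda) update, again with hypothetical helpers: predict stands for the network's output and grad for its gradient with respect to the weights. For simplicity it treats the output as a single number rather than TD-Gammon's four; the final step shows the actual game result replacing the next-position estimate.

    import numpy as np

    # Sketch of a TD(lambda) weight update over one game, assuming
    # predict(w, x) returns the network's output at input x and
    # grad(w, x) returns its gradient with respect to the weights w.

    def td_lambda_game(w, positions, outcome, predict, grad,
                       alpha=0.1, lam=0.7):
        trace = np.zeros_like(w)              # eligibility trace per weight
        for t in range(len(positions) - 1):
            v_t = predict(w, positions[t])
            v_next = predict(w, positions[t + 1])
            trace = lam * trace + grad(w, positions[t])
            w = w + alpha * (v_next - v_t) * trace   # temporal difference
        # Final step: the target is the actual game result, not an estimate.
        v_last = predict(w, positions[-1])
        trace = lam * trace + grad(w, positions[-1])
        w = w + alpha * (outcome - v_last) * trace
        return w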

Training experiments

TD-Gammon was initially programmed without any backgammon "knowledge," unlike previous neural-net backgammon programs such as Neurogammon, which Tesauro also wrote.

When researchers first tested TD-Gammon, using only a raw board encoding and no human-designed features, it reached a level of play comparable to an intermediate-level human backgammon player.
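
A sketch of what such a raw encoding looks like, modeled on the commonly described 198-unit input scheme from Tesauro's papers (the exact unit values here are illustrative):

    # Sketch of a raw, "knowledge-free" board encoding in the style of
    # TD-Gammon's 198-unit input: four units per point per player, plus a
    # few units for the bar, borne-off checkers, and whose turn it is.

    def encode_point(n):
        # n = number of one player's checkers on a point
        return [float(n >= 1), float(n >= 2), float(n >= 3),
                max(0.0, (n - 3) / 2.0)]

    def encode_board(white_points, black_points,
                     white_bar, black_bar,
                     white_off, black_off, white_to_move):
        units = []
        for n in white_points:            # 24 points x 4 units per player
            units += encode_point(n)
        for n in black_points:
            units += encode_point(n)
        units += [white_bar / 2.0, black_bar / 2.0]    # checkers on the bar
        units += [white_off / 15.0, black_off / 15.0]  # checkers borne off
        units += [float(white_to_move), float(not white_to_move)]
        return units                      # 24*4*2 + 6 = 198 inputs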

Even though TD-Gammon discovered useful features on its own, Tesauro wondered whether its play could be improved further by adding hand-crafted features like Neurogammon's. The self-trained TD-Gammon with expert-designed features quickly surpassed all other computer backgammon programs. After about 1,500,000 self-play games with 80 hidden units, its performance plateaued.
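
Putting the pieces together, the training procedure is essentially a loop: play a complete game against yourself with greedy move selection, then run the TD update over the recorded positions. An outline, reusing the hypothetical helpers from the sketches above:

    # Outline of the self-play training loop, reusing the hypothetical
    # choose_move and td_lambda_game sketches from earlier sections.

    def train(w, num_games, new_game, roll_dice, game_over, result,
              predict, grad, legal_moves, apply_move):
        for _ in range(num_games):        # e.g. on the order of 1,500,000
            position, positions = new_game(), []
            while not game_over(position):
                positions.append(position)
                dice = roll_dice()
                evaluate = lambda p: predict(w, p)
                move = choose_move(position, dice, evaluate,
                                   legal_moves, apply_move)
                position = apply_move(position, move)
            positions.append(position)    # terminal position
            w = td_lambda_game(w, positions, result(position),
                               predict, grad)
        return w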

Advancements

TD-Gammon learned entirely through self-play, not by being taught. This freedom allowed it to explore strategies that humans had never considered or had wrongly ruled out, and its success with unconventional plays made a real impact on the backgammon world.

On the opening roll, for example, conventional play with 2-1, 4-1, or 5-1 was to move one checker from the 6-point to the 5-point. This play, called "slotting," trades the risk of being hit for the chance to build an aggressive position. TD-Gammon found that the safer split, 24-23, was better. Tournament players began trying TD-Gammon's move and found that it worked. Within a few years slotting had disappeared from tournament play, though in 2006 it made a comeback for the 2-1 roll.

Conclusion

TD-Gammon played slightly below the level of the best human backgammon players of its time. Even so, it explored strategies humans had not considered before and led to lasting improvements in how backgammon is played.

Furthermore, TD-Gammon's strong positional play was sometimes offset by weak endgame play. The endgame demands exact analysis and can require looking a long way ahead; because TD-Gammon could only look two moves ahead, its play in this phase of the game was limited. Its weaknesses were the opposite of those of symbolic artificial intelligence programs and most conventional software: it was good at tasks that require a "feel" for the position but poor at systematic analysis.

