
Playing TicTacToe with Reinforcement Learning and OpenAI Gym

Learn how to build and train an agent that never loses at TicTacToe, using OpenAI Gym and a Reinforcement Learning algorithm called Temporal Difference Learning.



Intermediate Guided Project


  • English


  • Data Science


  • IBM

Estimated Effort

  • 45 minutes
About This Guided Project
Reinforcement Learning is a style of machine learning distinct from supervised and unsupervised learning: an agent learns what to do through trial and error. It is also an excellent method for training robots to play games. OpenAI Gym is a Python library that standardizes the interaction between an agent (a user or a robot) and its environment, so you can interact with a variety of Gym environments in the same way. Together, Reinforcement Learning and OpenAI Gym are a great combination for building games with an AI component.
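To make the idea of a standardized agent/environment interaction concrete, here is a minimal sketch in plain Python. It does not import OpenAI Gym and it is not the custom environment used in this project: `TicTacToeSketchEnv`, `available_actions`, and `play_random_episode` are hypothetical names invented for illustration. It only assumes the classic Gym convention in which `reset()` returns an observation and `step(action)` returns four values (observation, reward, done, info); newer Gym releases return slightly different tuples.

```python
import random


class TicTacToeSketchEnv:
    """Hypothetical stand-in that mimics the classic Gym reset/step interface."""

    WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
                 (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
                 (0, 4, 8), (2, 4, 6)]              # diagonals

    def reset(self):
        # 0 = empty square, 1 = player X, -1 = player O
        self.board = [0] * 9
        self.player = 1
        return tuple(self.board)

    def available_actions(self):
        return [i for i, value in enumerate(self.board) if value == 0]

    def step(self, action):
        if self.board[action] != 0:
            raise ValueError("square already taken")
        self.board[action] = self.player
        won = any(all(self.board[i] == self.player for i in line)
                  for line in self.WIN_LINES)
        done = won or not self.available_actions()
        reward = 1 if won else 0            # reward for the player who just moved
        info = {"mover": self.player}
        self.player = -self.player          # hand the turn to the other player
        return tuple(self.board), reward, done, info


def play_random_episode(env, seed=0):
    """Run one game with uniformly random moves, using only reset() and step()."""
    rng = random.Random(seed)
    observation = env.reset()
    done = False
    moves = 0
    while not done:
        action = rng.choice(env.available_actions())
        observation, reward, done, info = env.step(action)
        moves += 1
    return observation, reward, info["mover"], moves
```

Because the loop in `play_random_episode` touches only `reset()` and `step()`, the same loop would work with any environment that follows this convention, which is the point of the standardized interface.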

In this Guided Project, you will learn how to interact with the OpenAI Gym environment. You will work with a custom environment created to play TicTacToe, so you will also learn how to install custom environments. Additionally, you will learn about Reinforcement Learning and the Temporal Difference Learning algorithm, and how to implement an agent that uses Temporal Difference Learning to play TicTacToe. Finally, you will play TicTacToe against the trained agent in the environment and see an example of a TicTacToe game with a graphical user interface.
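As a rough preview of Temporal Difference Learning, the sketch below shows the standard tabular TD(0) value update: the value of a state is nudged toward the reward plus the discounted value of the next state. This is a generic illustration, not necessarily the exact agent built in the project; the function name `td_update`, the string state labels, and the values of `alpha` (learning rate) and `gamma` (discount factor) are assumptions chosen for the example.

```python
from collections import defaultdict


def td_update(values, state, next_state, reward,
              alpha=0.1, gamma=1.0, terminal=False):
    """Apply one TD(0) update: V(s) <- V(s) + alpha * (target - V(s))."""
    # At the end of a game there is no next state to bootstrap from.
    target = reward if terminal else reward + gamma * values[next_state]
    values[state] += alpha * (target - values[state])
    return values[state]


# Unseen states default to a value of 0.0.
values = defaultdict(float)

# A winning final move: the value of "s1" moves halfway toward the reward 1.0.
td_update(values, "s1", None, reward=1.0, alpha=0.5, terminal=True)   # -> 0.5

# One move earlier: "s0" moves halfway toward the current value of "s1".
td_update(values, "s0", "s1", reward=0.0, alpha=0.5)                  # -> 0.25
```

Repeating such updates over many games lets the value of a winning outcome propagate back to earlier board positions, which the agent can then use to choose its moves.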

Learn by Doing

A guided project is a hands-on tutorial designed to help you learn a particular technology by doing a real project. It includes step-by-step instructions with explanations, examples, and exercises that can be followed and practiced in a lab environment.

Hands-on skills are highly sought after by employers when determining job readiness. Guided projects are interactive and on-demand, and will equip you with practical abilities that can be applied on the job!

A Look at the Project Ahead

Once you have completed this project, you'll be able to:
  • Install a custom OpenAI Gym environment
  • Work with an OpenAI Gym environment and the TicTacToe environment
  • Explain what Reinforcement Learning is
  • Explain what Temporal Difference Learning is
  • Create an agent that uses Temporal Difference Learning to play TicTacToe
  • Train and test the agents using the TicTacToe environment
  • Play some games against the trained agent

What You’ll Need  

Just a web browser and Python programming knowledge are required!
Everything else is provided through the IBM Skills Network Labs environment, which includes OpenAI Gym. This platform works best with current versions of Chrome, Edge, Firefox, Internet Explorer or Safari.

Your Instructor

Azim Hirjani, IBM
Azim Hirjani, Author
Arnav Shah, Author
15 y/o SWD Intern @ IBM | ML researcher.