March 13 Technical Journal
Tic Tac Toe Data:
We first planned to spend these two weeks looking into datasets for tic tac toe and processing them as a pre-trial for our dots and boxes work. What we found and did instead, I thought, was far more insightful. Initially, I looked around for datasets and found one from UVA containing intermediate tic tac toe game states. We thought this could be useful because the UCI dataset we had first come across only gave endgame states. An image of the UVA dataset is posted below. We inferred that -1 and 1 were X and O while 0 was blank, but even after discussing it with Mr. Lee we could not decipher what the remaining numbers represented. It was a sort of dead end, so we decided to keep looking. We later found another way to gather data (see Tic Tac Toe Q Learning below).
![]()
Part of Tic Tac Toe intermediate game states data
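To make our guess about the encoding concrete, here is a minimal sketch of how a row could be decoded. It assumes each row holds nine cells and that -1 is X and 1 is O (we only inferred that -1 and 1 are the two players, so which is which is a guess). The function name `decode_board` and the example row are made up for illustration and are not taken from the UVA dataset.

```python
# Assumed encoding (our inference, not confirmed by the dataset):
# -1 -> "X", 1 -> "O", 0 -> blank. Which of -1/1 is X is a guess.
SYMBOLS = {-1: "X", 1: "O", 0: " "}


def decode_board(cells):
    """Turn nine numeric cells into three rows of text."""
    if len(cells) != 9:
        raise ValueError("expected 9 cells")
    # "?" marks any value we could not interpret
    marks = [SYMBOLS.get(c, "?") for c in cells]
    return ["".join(marks[r * 3:r * 3 + 3]) for r in range(3)]


example = [-1, 0, 1, 0, -1, 0, 1, 0, 0]  # made-up row for illustration
for row in decode_board(example):
    print(row)
```

Any value outside -1, 0, and 1 shows up as "?", which is roughly where we got stuck with the real data.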
Tic Tac Toe Monte Carlo:
While we had been looking into datasets, it turned out that Will was experimenting with Monte Carlo tree search for tic tac toe (an alternative to Q learning). An image of his code, partway through a game, is attached below. We spent a period discussing Monte Carlo tree search, as it seemed quite simple in Will's code. This, however, contradicted what the grad students had told us, that Monte Carlo tree search was too complicated. I was also quite surprised to learn that it had many similarities with the genetic algorithms that I had completed a concept map for the prior week. As I understood it, the algorithm went through generations and tested strategies by comparing generations.
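To show the basic idea of judging moves by random play, here is a rough sketch. It is not Will's code and it is not full Monte Carlo tree search: it is the simpler "flat" Monte Carlo version, which tries every legal move, plays many random games from each, and keeps the move with the best average result. The names `random_playout` and `best_move`, the scoring (win 1, tie 0.5, loss 0), and the default of 200 playouts are all assumptions for illustration.

```python
import random

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" for a completed line, "tie" for a full board, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return "tie" if " " not in board else None


def random_playout(board, to_move, rng):
    """Play random legal moves from this position until the game ends."""
    board = board[:]  # copy so the caller's board is untouched
    result = winner(board)
    while result is None:
        move = rng.choice([i for i, c in enumerate(board) if c == " "])
        board[move] = to_move
        to_move = "O" if to_move == "X" else "X"
        result = winner(board)
    return result


def best_move(board, player, playouts=200, seed=0):
    """Score each legal move by the average result of random playouts."""
    rng = random.Random(seed)
    other = "O" if player == "X" else "X"
    scores = {}
    for move in [i for i, c in enumerate(board) if c == " "]:
        trial = board[:]
        trial[move] = player
        total = 0.0
        for _ in range(playouts):
            result = random_playout(trial, other, rng)
            total += 1.0 if result == player else 0.5 if result == "tie" else 0.0
        scores[move] = total / playouts
    return max(scores, key=scores.get)


board = list("XX OO    ")  # X can win immediately at cell 2
print(best_move(board, "X"))
```

Full Monte Carlo tree search builds on this by keeping statistics in a tree and spending more playouts on promising branches, which this sketch leaves out.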
Dots and Boxes and Tic Tac Toe Q Learning:
After getting stuck on the tic tac toe data search, we turned to another discovery we had made along the way: a Dots and Boxes Q Learning program. We cleaned it up and got it to run on our computers, though my computer had some issues with imports. The Dots and Boxes Q Learning program is on the left hand side of the picture below. We plan on using this as a model for training our own dots and boxes game. After getting the dots and boxes game to work, we looked to see if there was a tic tac toe variant and indeed found one. It is pictured on the right hand side of the picture below. While this program did not let us actually play the game, it seemed to be a perfect conceptual match for the dots and boxes program. The dots and boxes program gave us a bot already trained by Q learning to play against (no learning happens during play), while the tic tac toe program was the computer playing against itself (the learning part).
The output of the tic tac toe program was the percentage of wins by X, the percentage of wins by O, and the percentage of ties. The computer played thousands of games against itself in around two to three minutes and approached 100% ties. This suggested that the computer was converging on optimal play, since neither its X side nor its O side could win. We also created an Excel sheet and linked the tic tac toe training to it, producing a CSV file with sequences like XXOOX. I also worked a bit with Edmond, Will, and Puja on the code for our dots and boxes game, helping to debug the game and fix crash situations. Since the game is near completion and the learning algorithms seem to be coming together, I think it would be better for the group to be reunited.
![]()
Dots and Boxes game (left) trained by Q learning bot; Tic Tac Toe Q learning process (right) after 10,000 games
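As a rough picture of what the self-play training does, here is a small tabular Q learning sketch. It is not the program we found: the function name `self_play`, the shared Q table keyed by board string, the rewards (+1 win, -1 loss, 0 tie), and the values of `alpha`, `gamma`, and `epsilon` are all assumptions for illustration. It plays games against itself, updates the table after each game, and reports the same three percentages.

```python
import random
from collections import defaultdict

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
             (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return "X" or "O" for a completed line, "tie" for a full board, else None."""
    for a, b, c in WIN_LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return "tie" if " " not in board else None


def self_play(games=2000, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = defaultdict(float)  # (board string, move) -> value for the player to move
    tally = {"X": 0, "O": 0, "tie": 0}
    for _ in range(games):
        board, player, history = [" "] * 9, "X", []
        result = None
        while result is None:
            state = "".join(board)
            moves = [i for i, c in enumerate(board) if c == " "]
            if rng.random() < epsilon:      # explore: random move
                move = rng.choice(moves)
            else:                           # exploit: best known move
                move = max(moves, key=lambda m: q[(state, m)])
            board[move] = player
            history.append((state, move, player))
            result = winner(board)
            player = "O" if player == "X" else "X"
        tally[result] += 1
        # Q learning update, walking the game backwards. Each player's
        # "next state" is the board two plies later, when they move again.
        for i in reversed(range(len(history))):
            state, move, mover = history[i]
            if i + 2 < len(history):
                nxt = history[i + 2][0]
                nxt_moves = [j for j, c in enumerate(nxt) if c == " "]
                target = gamma * max(q[(nxt, m)] for m in nxt_moves)
            elif result == "tie":
                target = 0.0
            else:
                target = 1.0 if result == mover else -1.0
            q[(state, move)] += alpha * (target - q[(state, move)])
    return q, tally


q_table, tally = self_play()
total = sum(tally.values())
for outcome, count in tally.items():
    print(f"{outcome}: {100 * count / total:.1f}%")
```

The percentages here are over the whole run, and the random exploration moves never stop, so this sketch should not be expected to show exactly 100% ties.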
- explanations of the data you are collecting
- We hit a dead end with established datasets and are instead using the Tic Tac Toe Q Learning program to create a CSV file of game states. As the algorithm trains itself, the game states are written to the file. (See Dots and Boxes and Tic Tac Toe Q Learning)
- what the data is/how it was collected
- The data is a set of tic tac toe game states. It was collected by running the Tic Tac Toe Q Learning algorithm and having it write to a CSV file that we open in Excel.
- why is it useful
- The data shows the different strategies the computer tries over time, and so shows the learning process. It helps us by showing what we should aim for with a dots and boxes Q learning game.
- how did you need to change it/clean it up for your project
- We needed to create the CSV file for the data to be collected, and we cleaned up the code for both the tic tac toe Q learning algorithm and the dots and boxes Q learning algorithm (imports, game runs, etc.).
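For reference, the CSV logging described in the answers above could look something like the sketch below. The column names (`game`, `ply`, `player`, `state`), the use of "-" for a blank cell, and the helper names `game_to_rows` and `write_states` are assumptions for illustration, not the layout of our actual file. It writes to an in-memory buffer here; a real run would open a file with `open(path, "w", newline="")` instead.

```python
import csv
import io


def game_to_rows(game_id, moves):
    """Replay a list of (player, cell) moves and build one row per move."""
    board = ["-"] * 9  # "-" stands for a blank cell so rows have no spaces
    rows = []
    for ply, (player, cell) in enumerate(moves, start=1):
        board[cell] = player
        rows.append([game_id, ply, player, "".join(board)])
    return rows


def write_states(rows, handle):
    """Write a header line plus one line per game state."""
    writer = csv.writer(handle)
    writer.writerow(["game", "ply", "player", "state"])
    writer.writerows(rows)


buffer = io.StringIO()
write_states(game_to_rows(1, [("X", 4), ("O", 0), ("X", 8)]), buffer)
print(buffer.getvalue())
```

Keeping one row per move means the file can be sorted or filtered by game in Excel to follow how play changes as training goes on.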