
Q-Learning: Making a Game Play Itself

  • Writer: Ian Vicino
  • Jul 9, 2023
  • 5 min read

For my first foray into showcasing my code, I figured I would upload a fun piece. I wrote this code to test the Q-Learning algorithm and see if I could get a game to play itself. I recorded the game playing itself on my Instagram, so if you want to check out that video you can find it here: https://www.instagram.com/neurodude64/. If you want to learn more about how the Q-Learning algorithm works, check out this site: https://towardsdatascience.com/teaching-a-computer-how-to-play-snake-with-q-learning-93d0a316ddc0, or search for it online; there are plenty of tutorials out there.
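In a nutshell, Q-Learning keeps a table with one value per (state, action) pair, and after every move it nudges the value of the move it just made toward the reward it received plus the discounted value of the best move available from the new state. Here is a minimal sketch of that single update with made-up toy numbers (the names here are mine for illustration, not from the game code below):

import numpy as np

# toy setup: 5 states, 4 actions, all values start at zero
q_table = np.zeros((5, 4))
learning_rate = 0.9
discount_factor = 0.9

# one example step: from state 0 we took action 2, got reward -2, landed in state 1
state, action_taken, reward, next_state = 0, 2, -2.0, 1
td_target = reward + discount_factor * np.max(q_table[next_state])
q_table[state, action_taken] += learning_rate * (td_target - q_table[state, action_taken])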

To make the code below, I followed a tutorial and modified its code; actually, I may have combined code from multiple sources, I don't remember fully. Either way, if you are just getting started, don't be afraid of taking someone else's code and manipulating it to solve your problem or teach you how it works. In fact, you can run my code and use it to learn how Q-Learning works. That is how a developer grows: by challenging themselves. Don't reinvent the wheel; instead, modify the wheel to work for you. Change the colors, or make a bigger play area (a couple of example tweaks follow below). Have fun with it. I will make a tutorial for Python beginners later, but for now I just wanted to put some fun code out there for you all to play with. Enjoy!!!
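For example, here are a couple of one-line tweaks to the constants in the code below that you could start with (these exact values are just suggestions, not anything from the original):

WINDOWWIDTH = 400   # doubles the play area width (keep it a multiple of BLOCKSIZE)
WINDOWHEIGHT = 400  # doubles the play area height
BGCOLOR = DARKGRAY  # swap the black background for dark gray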

(NOTE: The code may have been changed while pasting it on the page. If you want the unaltered code, check out my GitHub: https://github.com/ivicino/Block_game/blob/main/Block_game_AI.py)

import pygame
from pygame.locals import *
from collections import namedtuple
import numpy as np
from Block_game_AI import blocks

# Bugs:
# The starting location of the block is constantly on the game window... Need to figure out how to get that removed


#%% Constants

FPS = 7
WINDOWWIDTH = 200
WINDOWHEIGHT = 200
BLOCKSIZE = 20
assert WINDOWWIDTH % BLOCKSIZE == 0, "Window width must be a multiple of block size."
assert WINDOWHEIGHT % BLOCKSIZE == 0, "Window height must be a multiple of block size."
CELLWIDTH = int(WINDOWWIDTH / BLOCKSIZE)
CELLHEIGHT = int(WINDOWHEIGHT / BLOCKSIZE)

#             R    G    B
WHITE     = (255, 255, 255)
BLACK     = (  0,   0,   0)
RED       = (255,   0,   0)
GREEN     = (  0, 255,   0)
DARKGREEN = (  0, 155,   0)
DARKGRAY  = ( 40,  40,  40)

BGCOLOR = BLACK

# start the Block close to the top of the game window
# Note: if I change the starting location, I will also have to change it in Block_game_AI

startx = 2*BLOCKSIZE
starty = 2*BLOCKSIZE

# if this gets changed, you also need to change tx and ty in Block_game_AI.py
# These tx and ty are for the physical location of the target; those in Block_game_AI.py are for drawing it
# tx and ty are the x and y coordinates of the target
tx = (BLOCKSIZE * 8)
ty = (BLOCKSIZE * 8)

# states: one per pixel coordinate in the window
environment_rows = WINDOWWIDTH
environment_columns = WINDOWHEIGHT
# 200 * 200 = 40,000 pixel states total in the environment

# actions
actions = ['up', 'right', 'down', 'left']

UP = 'up'
RIGHT = 'right'
DOWN = 'down'
LEFT = 'left'

#Q-table
Q_table = np.zeros((environment_rows, environment_columns, len(actions)))     # environment_rows = 200, environment_columns = 200, len(actions) = 4
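# Each state is a (row, column) pixel coordinate of the block, and each of the 4
# slots holds the learned value of taking that action from that state. Since the
# block only ever lands on multiples of BLOCKSIZE, most entries stay at zero.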

# rewards: -2 for every ordinary location in the game environment
rewards = np.full((environment_rows, environment_columns), -2.)
# reward for touching the target = +1000
rewards[tx, ty] = 1000

RowandColumn = []

n_iterations = 5000
# LS stands for last stretch and it is our testing phase
LS = 100


#%% functions

# define an epsilon-greedy policy that chooses which action to take next (i.e., where to move next)
def action(current_row_index, current_column_index, epsilon):
    # if a randomly chosen value between 0 and 1 is less than epsilon,
    # then choose the most promising action from the Q-table for this state
    if np.random.random() < epsilon:    # np.random.random() returns a float drawn uniformly from [0.0, 1.0)
        return np.argmax(Q_table[current_row_index, current_column_index])
    else:
        return np.random.randint(4)     # otherwise take a random action: 0 = up, 1 = right, 2 = down, 3 = left
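# Note: epsilon is used here as the exploitation rate (0.9 means take the greedy
# action 90% of the time), which is the reverse of the textbook epsilon-greedy
# convention, where epsilon is the probability of a random move.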


def next_move(current_row_index, current_column_index, action_index):
    # defines different state if the AI hits the walls
    new_row_index = current_row_index
    new_column_index = current_column_index
    if actions[action_index] == UP and current_row_index > 0:
        new_row_index -= BLOCKSIZE

    elif actions[action_index] == RIGHT and current_column_index < environment_columns - BLOCKSIZE:
        new_column_index += BLOCKSIZE

    elif actions[action_index] == DOWN and current_row_index < environment_rows - BLOCKSIZE:
        new_row_index += BLOCKSIZE

    elif actions[action_index] == LEFT and current_column_index > 0:
        new_column_index -= BLOCKSIZE

    # Attempt to prevent the agent from going off grid: none of the movement
    # branches above matched, so the chosen move would have left the window;
    # nudge the returned position back toward the middle instead
    elif new_row_index >= environment_rows - BLOCKSIZE or new_column_index >= environment_columns - BLOCKSIZE:
        print('WALL danger')
        new_row_index -= 3*BLOCKSIZE
        new_column_index -= 3*BLOCKSIZE

    elif new_row_index <= BLOCKSIZE or new_column_index <= BLOCKSIZE:
        print('WALL danger')
        new_row_index += 3*BLOCKSIZE
        new_column_index += 3*BLOCKSIZE


    return new_row_index, new_column_index

def get_starting_location():
    if episode == 0:    # only true for the very first episode of the game
        # use the fixed starting coordinates
        current_row_index = startx
        current_column_index = starty
    else:
        # otherwise, continue from the most recent position: the first two entries
        # of RowandColumn, which train() prepends below after every move
        current_row_index, current_column_index = RowandColumn[0], RowandColumn[1]
    return current_row_index, current_column_index

def reset():
    current_row_index = startx 
    current_column_index = starty 
    return current_row_index, current_column_index

#%% Agent

Point = namedtuple('Point', 'x, y') 

pygame.init()

class Agent:
    def __init__(self):
        global FPSCLOCK, DISPLAYSURF, BASICFONT

        
        FPSCLOCK = pygame.time.Clock()
        DISPLAYSURF = pygame.display.set_mode((WINDOWWIDTH, WINDOWHEIGHT))
        BASICFONT = pygame.font.Font('freesansbold.ttf', 18)

        DISPLAYSURF.fill(BGCOLOR)

        self.n_games = 0   
        
        #define training parameters
        self.epsilon = 0.9 #the percentage of time when we should take the best action (instead of a random action)
        self.discount_factor = 0.9 #discount factor for future rewards
        self.learning_rate = 0.9 #the rate at which the agent should learn
    

def train():
    # note: this builds a fresh Agent and blocks game every single episode
    agent = Agent()
    game = blocks()
    
    #get the starting location for this episode
    row_index, column_index = get_starting_location()

    # Get the block to reset if it hit the target
    if (row_index, column_index) == (tx, ty):
        print(f'\n hit target at {row_index, column_index}')
        row_index, column_index = reset()
    # Testing AI after training
    # elif n_iterations - LS == episode:
    #     print(f'\n starting from scratch \n')
    #     row_index, column_index = reset()
        
    
    #choose which action to take (i.e., where to move next)
    action_index = action(row_index, column_index, agent.epsilon)

    #perform the chosen action, and transition to the next state (i.e., move to the next location)
    old_row_index, old_column_index = row_index, column_index #store the old row and column indexes
    row_index, column_index = next_move(row_index, column_index, action_index)

    # insert the row index as the first element of the list and the column index as
    # the second; get_starting_location() reads these back as the block's starting
    # coordinate on the following episode
    RowandColumn.insert(0, row_index)
    RowandColumn.insert(1, column_index)
    
    #receive the reward for moving to the new state, and calculate the temporal difference
    reward = rewards[row_index, column_index]
    old_q_value = Q_table[old_row_index, old_column_index, action_index]
    temporal_difference = reward + (agent.discount_factor * np.max(Q_table[row_index, column_index])) - old_q_value

    #update the Q-value for the previous state and action pair
    new_q_value = old_q_value + (agent.learning_rate * temporal_difference)
    Q_table[old_row_index, old_column_index, action_index] = new_q_value
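    # The update above is the standard Q-Learning rule written out in steps:
    #   Q(s, a) <- Q(s, a) + alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    # with alpha = agent.learning_rate and gamma = agent.discount_factor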

    # idx is the block's current location (the starting location on the first episode)
    idx = Point(row_index, column_index)
    game.blocks.insert(0, idx)
    game.drawGrid()
    game.draw_Block()
    game.draw_target()
    pygame.display.flip()
    FPSCLOCK.tick(FPS)

if __name__ == '__main__':
    game = blocks()
    player = Agent()

    N_EPISODES = 0

    for episode in range(n_iterations):
        print(N_EPISODES)
        N_EPISODES += 1
        # if n_iterations - LS == episode:
        #     player.epsilon = 0
        #     print('\n Last episodes \n')
        train()


        game_over = game.play_step()

        if game_over:
            game.reset()
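
If you want to run this yourself, you will need pygame and numpy installed (pip install pygame numpy should cover it), along with the Block_game_AI module from the GitHub repo linked above, since that is what provides the blocks class and the drawGrid, draw_Block, draw_target, and play_step routines used here.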
    

