2048 Python game and AI

27 Sep 2015

2048 is a great game, and it’s pretty easy to write a desktop clone. Several AI algorithms also exist to play the game automatically, and I recently wondered how difficult it would be to develop something similar.

Full code and where we’re going

By the end of this blog you’ll have developed an algorithm to play a desktop clone of the game programatically.

>>>    0       2       0       0
       8       8       4       0
    1024       2       4       2
    2048       4      16       4

For reference, the final project can be found here.

Representing the board

Let’s start with representation. My functions for advancing the state of the game are in the Game class and I represent the current configuration as a list of lists.

class Game:
    def __init__(self):
        """ Initialize a 2048 game """
        b = [[0]*4 for i in range(4)]
        self.b = Game.spawn(b, 2)

    def spawn(b, k=1):
        """ 
        Add k random tiles to the board.
        Chance of 2 is 90%; chance of 4 is 10% 
        """
        rows, cols = list(range(4)), list(range(4))
        random.shuffle(rows)
        random.shuffle(cols)
        
        copy  = [[x for x in row] for row in b]
        dist  = [2]*9 + [4]
        count = 0
        for i,j in itertools.product(rows, rows):
            if copy[i][j] != 0: continue
            
            copy[i][j] = random.sample(dist, 1)[0]
            count += 1
            if count == k  : return copy
        raise Exception("shouldn't get here")

You’ll notice that the spawn function is immutable; we don’t change the initial b variable defined in the constructor. Stateless design is appealing given that our AI would otherwise be constantly mutating b to consider the result of making a particular move (more on that later). The Game class isn’t really OOP as it requires that any client manage the board through functions like spawn that return new boards.

We can check whether or not the game is over in the “left-right” direction by looping over the cells of the board, and checking that no cells have neighbors with the same cell value, and further that all cells are nonempty.

def over(b):
    """ Returns true if the board is playable """
    def inner(b):
        for row in b:
            for x, y in zip(row[:-1], row[1:]):
                if x == y or x == 0 or y == 0:
                    return True
        return False
    return not inner(b) and not inner(zip(*b)) # "*" is called the "splat" operator

We also need to check the “up-down” direction. We can actually reuse the existing inner function by making the “columns” of the board “rows” in a new board. This new board is called the transpose, and can be computed concisely by combining the zip function with Python’s * operator.

Also referred to as the “splat” operator, * unpacks the board and extracts the four rows as lists. Since zip can take more than two iterables, the result of zipping the unpacked rows is a collection of all the first elements in each row, followed by a collection of all the second elements in each row, etc. It’s easy to see that the result is the transpose.

The merge algorithm

Now for the merge. Let’s start by writing a merge function that takes a board and shifts/merges everything right to left. Rather than do this in one fell swoop, I wrote a recursive inner function to do this row by row, and build up the return board sequentially.

Why recursive? Rather than iteration, I think recursion provides a cleaner solution. If you think about it, we really only need to look at the first two elements in a row at a time. If they match, merge the result, advance past both of them, and recursively examine the rest of the row. If they don’t, only skip the first element in the list, since the third element might match with the second element, and perform the same recursive process.

Merging the entire board left is easy. For each row, we strip out the zeros and do a left merge on that row. We pad the merged row with the appropriate number of zeros, and append the merged row to our result board.

def merge(b):
    """ Returns a left merged board """
    
    def inner(row, a):
        """
        Helper for merge. If we're finished with the list,
        nothing to do; return the accumulator. Otherwise
        if we have more than one element, combine results of first
        with right if they match; skip over right and continue merge
        """
        
        if not row:
            return a
        x = row[0]
        if len(row) == 1:
            return inner(row[1:], a + [x])
        return inner(row[2:], a + [2*x]) if x == row[1] else inner(row[1:], a + [x])

    ret = []
    for row in b:
        merged = inner([x for x in row if x != 0], [])
        merged = merged + [0]*(len(row)-len(merged))
        ret.append(merged)
    return ret

With merge left taken care of, let’s take a look at merge right. What we want to do is express a right merge in terms of what we’ve already done. This is easier than it sounds – all we have to do is reverse each row, merge the reversed row, and take the reverse again to get back to the original representation.

def right(b):
    """ Returns a right merged board

    >>> Game.right(test)
    [[0, 0, 2, 8], [0, 2, 4, 8], [0, 0, 0, 4], [0, 0, 4, 4]]
    """

    def reverse(x):
        return list(reversed(x)) # want the actual list, not a generator object
    
    t = map(reverse, iter(b))
    return [reverse(x) for x in Game.merge(t)]

Merging up and down can now be expressed by again computing the transpose of the board, and calling left or right on the transposed board.

def up(b):
    """ Returns an upward merged board
        NOTE: zip(*t) is transpose
        
    >>> Game.up(test) 
    [[4, 8, 4, 8], [4, 2, 0, 2], [0, 0, 0, 4], [0, 0, 0, 0]]
    """

    t = Game.left(zip(*b))
    return [list(x) for x in zip(*t)]

def down(b):
    """ Returns an downward merged board
        NOTE: zip(*t) is transpose

    >>> Game.down(test)
    [[0, 0, 0, 0], [0, 0, 0, 8], [4, 8, 0, 2], [4, 2, 4, 4]]
    """
    
    t = Game.right(zip(*b))
    return [list(x) for x in zip(*t)]

Developing an AI

Barring a pretty-print function, we’re essentially done with the Game class. Time to work on the AI.

Adversarial search

Games like Tic-Tac-Toe can be played “perfectly” by an artificial “agent” because everything about the game is known to both players. To decide what move to play, the agent will typically play all possible moves, play all opponent response moves, and then score the resulting configurations with a position fitness function (also called a “heuristic” or “evaluation function”). This process can be repeated recursively to deeper levels, resulting in a tree structure where each level alternates between boards resulting from the agent’s moves and boards resulting from the adversary’s response moves.

In what’s known as minimax search, the “max player” attempts to maximize the evaluation function while the opponent (or “min player”) seeks to minimize this function. For the max player, the best position will be given by the move that leaves the opponent capable of doing the least damage (and hence why the algorithm is called minimax).¹ This strategy works great for what are called zero sum games, or games where one player’s gains are the other’s losses (and hence the sum of the board’s evaluation values from both players’ perspectives is always zero).

As good as this algorithm is², it’s actually not that great for 2048. This is because, in contrast to chess or Tic-Tac-Toe, knowledge of the game is not perfect; the agent doesn’t know the result of making a move as the subsequent tile spawns are random, and don’t necessarily target the worst open position. Rather, there are “agent nodes”, or levels in the game tree where the agent is required to move, and “chance nodes”, levels where the game is supposed to give us a new board configuration according to some probability distribution. The original source code specifies that the chance of a 2 is ninety-percent; chance of a 4 is ten-percent.

Rather than the raw fitness value, what we want to consider is the maximum “expected fitness value.” In probability and statistics, expected value is a kind of average value, the result you’d expect after conducting a random experiment many times. Consider the sum of rolling two six-sided dice. If you list out all the possible outcomes, there are more ways to get a sum of seven than any other sum. If we repeated the experiment an infinite number of times, seven is the outcome we’d expect to be observed most frequently. In general, expected value can be thought of as a weighted average. Given a set of values and a set of “observation probabilities”, the value one would expect is not the average of the set of values, but rather the average of each value weighted by its chance of being observed.

Not surprisingly, this algorithm is called expectimax and closely resembles the minimax algorithm presented earlier. As we said before, we will evaluate each candidate move by simulating the game forward until some termination condition is met (the game is over or there are too many configurations to consider and we cut off the search prematurely). The choice that maximizes our fitness function wins.

def search(b, d, move=False):
    """
    Performs expectimax search on a given configuration to
    specified depth (d).

    Algorithm details:
       - if the AI needs to move, make each child move,
         recurse, return the maximum fitness value
       - if it is not the AI's turn, form all
         possible child spawns, and return their weighted average 
         as that node's evaluation
    """
    if d == 0 or (move and Game.over(b)):
        return fitness(b)

    alpha = fitness(b)
    if move:
        for _, child in Game.actions(b):
            alpha = max(alpha, search(child, d-1))
    else:
        alpha = 0
        zeros = [(i,j) for i,j in itertools.product(range(4), range(4)) if b[i][j] == 0]
        for i, j in zeros:
            c1 = [[x for x in row] for row in b]
            c2 = [[x for x in row] for row in b]
            c1[i][j] = 2
            c2[i][j] = 4
            alpha += .9*search(c1, d-1, True)/len(zeros) + \
                     .1*search(c2, d-1, True)/len(zeros)
    return alpha

# search and score the children of a given "root configuration" 
scores = [(action, search(child, 4)) for action ,child in Game.actions(b)]

Evaluation function

We’ve discussed a general algorithm for determining what move to play. We simulate the game forward by playing every possible move on every possible board, and pick the move that gives the most favorable future configuration. All that’s left to do is figure out a scoring heuristic for our search to maximize.

A common human strategy is to keep the big tiles in a corner, and order smaller tiles in descending order. In this sense, the ideal board might look like this.

>>>  128       64       0       0
     256       32       2       0
     512       16       2       0
    1024        8       4       0

In my evaluation function, I flatten the board to an array representing the ideal tile sequence. Each term in this sequence is then weighted by a term in a geometric series. This ensures that tiles with high values in desirable positions are rewarded. I return the sum of the weighted sequence, minus a penalty for boards that fail to keep the tile with the largest tile value in the lower left corner.

def fitness(b):
    """
    Returns the heuristic value of b
    Snake refers to the "snake line pattern" (http://tinyurl.com/l9bstk6)
    Here we only evaluate one direction; we award more points if high valued tiles
    occur along this path. We penalize the board for not having
    the highest valued tile in the lower left corner
    """
    if Game.over(b):
        return -float("inf")
    
    snake = []
    for i, col in enumerate(zip(*b)):
        snake.extend(reversed(col) if i % 2 == 0 else col)

    m = max(snake)
    return sum(x/10**n for n, x in enumerate(snake)) - \
           math.pow((b[3][0] != m)*abs(b[3][0] - m), 2)

Final thoughts

My algorithm only searches to four levels deep, but has achieved the 4096 tile. If the fitness function seems like magic, that’s because it is. I tinkered and experimented with a couple different heuristics and settled on the one that gave me the best results. You can read more about AI algorithms and heuristics for 2048 here and here. The second post presents an interesting method for determining the fitness function more intelligently than what I have done.

You can read more about the algorithm here ↩
IBM’s Deep Blue chess program used this strategy in a search to a depth of six to eight moves, sometimes even reaching a depth of twenty or more moves. ↩