2048 Python game and AI
2048 is a great game, and it’s pretty easy to write a desktop clone. Several AI algorithms also exist to play the game automatically, and I recently wondered how difficult it would be to develop something similar.
Full code and where we’re going
By the end of this blog you’ll have developed an algorithm to play a desktop clone of the game programatically.
>>> 0 2 0 0
8 8 4 0
1024 2 4 2
2048 4 16 4
For reference, the final project can be found here.
Representing the board
Let’s start with representation. My functions for advancing the state of the game are in the Game
class and I represent the current configuration as a list of lists.
class Game:
def __init__(self):
""" Initialize a 2048 game """
b = [[0]*4 for i in range(4)]
self.b = Game.spawn(b, 2)
def spawn(b, k=1):
"""
Add k random tiles to the board.
Chance of 2 is 90%; chance of 4 is 10%
"""
rows, cols = list(range(4)), list(range(4))
random.shuffle(rows)
random.shuffle(cols)
copy = [[x for x in row] for row in b]
dist = [2]*9 + [4]
count = 0
for i,j in itertools.product(rows, rows):
if copy[i][j] != 0: continue
copy[i][j] = random.sample(dist, 1)[0]
count += 1
if count == k : return copy
raise Exception("shouldn't get here")
You’ll notice that the spawn
function is immutable; we don’t change the initial b
variable defined in the constructor. Stateless design is appealing given that our AI would otherwise be constantly mutating b
to consider the result of making a particular move (more on that later). The Game
class isn’t really OOP as it requires that any client manage the board through functions like spawn
that return new boards.
We can check whether or not the game is over in the “left-right” direction by looping over the cells of the board, and checking that no cells have neighbors with the same cell value, and further that all cells are nonempty.
def over(b):
""" Returns true if the board is playable """
def inner(b):
for row in b:
for x, y in zip(row[:-1], row[1:]):
if x == y or x == 0 or y == 0:
return True
return False
return not inner(b) and not inner(zip(*b)) # "*" is called the "splat" operator
We also need to check the “up-down” direction. We can actually reuse the existing inner
function by making the “columns” of the board “rows” in a new board. This new board is called the transpose, and can be computed concisely by combining the zip
function with Python’s *
operator.
Also referred to as the “splat” operator, *
unpacks the board and extracts the four rows as lists. Since zip
can take more than two iterables, the result of zip
ping the unpacked rows is a collection of all the first elements in each row, followed by a collection of all the second elements in each row, etc. It’s easy to see that the result is the transpose.
The merge algorithm
Now for the merge. Let’s start by writing a merge function that takes a board and shifts/merges everything right to left. Rather than do this in one fell swoop, I wrote a recursive inner function to do this row by row, and build up the return board sequentially.
Why recursive? Rather than iteration, I think recursion provides a cleaner solution. If you think about it, we really only need to look at the first two elements in a row at a time. If they match, merge the result, advance past both of them, and recursively examine the rest of the row. If they don’t, only skip the first element in the list, since the third element might match with the second element, and perform the same recursive process.
Merging the entire board left is easy. For each row, we strip out the zeros and do a left merge on that row. We pad the merged row with the appropriate number of zeros, and append the merged row to our result board.
def merge(b):
""" Returns a left merged board """
def inner(row, a):
"""
Helper for merge. If we're finished with the list,
nothing to do; return the accumulator. Otherwise
if we have more than one element, combine results of first
with right if they match; skip over right and continue merge
"""
if not row:
return a
x = row[0]
if len(row) == 1:
return inner(row[1:], a + [x])
return inner(row[2:], a + [2*x]) if x == row[1] else inner(row[1:], a + [x])
ret = []
for row in b:
merged = inner([x for x in row if x != 0], [])
merged = merged + [0]*(len(row)-len(merged))
ret.append(merged)
return ret
With merge left taken care of, let’s take a look at merge right. What we want to do is express a right merge in terms of what we’ve already done. This is easier than it sounds – all we have to do is reverse each row, merge the reversed row, and take the reverse again to get back to the original representation.
def right(b):
""" Returns a right merged board
>>> Game.right(test)
[[0, 0, 2, 8], [0, 2, 4, 8], [0, 0, 0, 4], [0, 0, 4, 4]]
"""
def reverse(x):
return list(reversed(x)) # want the actual list, not a generator object
t = map(reverse, iter(b))
return [reverse(x) for x in Game.merge(t)]
Merging up and down can now be expressed by again computing the transpose of the board, and calling left
or right
on the transposed board.
def up(b):
""" Returns an upward merged board
NOTE: zip(*t) is transpose
>>> Game.up(test)
[[4, 8, 4, 8], [4, 2, 0, 2], [0, 0, 0, 4], [0, 0, 0, 0]]
"""
t = Game.left(zip(*b))
return [list(x) for x in zip(*t)]
def down(b):
""" Returns an downward merged board
NOTE: zip(*t) is transpose
>>> Game.down(test)
[[0, 0, 0, 0], [0, 0, 0, 8], [4, 8, 0, 2], [4, 2, 4, 4]]
"""
t = Game.right(zip(*b))
return [list(x) for x in zip(*t)]
Developing an AI
Barring a pretty-print function, we’re essentially done with the Game
class. Time to work on the AI.
Adversarial search
Games like Tic-Tac-Toe can be played “perfectly” by an artificial “agent” because everything about the game is known to both players. To decide what move to play, the agent will typically play all possible moves, play all opponent response moves, and then score the resulting configurations with a position fitness function (also called a “heuristic” or “evaluation function”). This process can be repeated recursively to deeper levels, resulting in a tree structure where each level alternates between boards resulting from the agent’s moves and boards resulting from the adversary’s response moves.
In what’s known as minimax search, the “max player” attempts to maximize the evaluation function while the opponent (or “min player”) seeks to minimize this function. For the max player, the best position will be given by the move that leaves the opponent capable of doing the least damage (and hence why the algorithm is called minimax).1 This strategy works great for what are called zero sum games, or games where one player’s gains are the other’s losses (and hence the sum of the board’s evaluation values from both players’ perspectives is always zero).
As good as this algorithm is2, it’s actually not that great for 2048. This is because, in contrast to chess or Tic-Tac-Toe, knowledge of the game is not perfect; the agent doesn’t know the result of making a move as the subsequent tile spawns are random, and don’t necessarily target the worst open position. Rather, there are “agent nodes”, or levels in the game tree where the agent is required to move, and “chance nodes”, levels where the game is supposed to give us a new board configuration according to some probability distribution. The original source code specifies that the chance of a 2
is ninety-percent; chance of a 4
is ten-percent.
Rather than the raw fitness value, what we want to consider is the maximum “expected fitness value.” In probability and statistics, expected value is a kind of average value, the result you’d expect after conducting a random experiment many times. Consider the sum of rolling two six-sided dice. If you list out all the possible outcomes, there are more ways to get a sum of seven than any other sum. If we repeated the experiment an infinite number of times, seven is the outcome we’d expect to be observed most frequently. In general, expected value can be thought of as a weighted average. Given a set of values and a set of “observation probabilities”, the value one would expect is not the average of the set of values, but rather the average of each value weighted by its chance of being observed.
Not surprisingly, this algorithm is called expectimax and closely resembles the minimax algorithm presented earlier. As we said before, we will evaluate each candidate move by simulating the game forward until some termination condition is met (the game is over or there are too many configurations to consider and we cut off the search prematurely). The choice that maximizes our fitness function wins.
def search(b, d, move=False):
"""
Performs expectimax search on a given configuration to
specified depth (d).
Algorithm details:
- if the AI needs to move, make each child move,
recurse, return the maximum fitness value
- if it is not the AI's turn, form all
possible child spawns, and return their weighted average
as that node's evaluation
"""
if d == 0 or (move and Game.over(b)):
return fitness(b)
alpha = fitness(b)
if move:
for _, child in Game.actions(b):
alpha = max(alpha, search(child, d-1))
else:
alpha = 0
zeros = [(i,j) for i,j in itertools.product(range(4), range(4)) if b[i][j] == 0]
for i, j in zeros:
c1 = [[x for x in row] for row in b]
c2 = [[x for x in row] for row in b]
c1[i][j] = 2
c2[i][j] = 4
alpha += .9*search(c1, d-1, True)/len(zeros) + \
.1*search(c2, d-1, True)/len(zeros)
return alpha
# search and score the children of a given "root configuration"
scores = [(action, search(child, 4)) for action ,child in Game.actions(b)]
Evaluation function
We’ve discussed a general algorithm for determining what move to play. We simulate the game forward by playing every possible move on every possible board, and pick the move that gives the most favorable future configuration. All that’s left to do is figure out a scoring heuristic for our search to maximize.
A common human strategy is to keep the big tiles in a corner, and order smaller tiles in descending order. In this sense, the ideal board might look like this.
>>> 128 64 0 0
256 32 2 0
512 16 2 0
1024 8 4 0
In my evaluation function, I flatten the board to an array representing the ideal tile sequence. Each term in this sequence is then weighted by a term in a geometric series. This ensures that tiles with high values in desirable positions are rewarded. I return the sum of the weighted sequence, minus a penalty for boards that fail to keep the tile with the largest tile value in the lower left corner.
def fitness(b):
"""
Returns the heuristic value of b
Snake refers to the "snake line pattern" (http://tinyurl.com/l9bstk6)
Here we only evaluate one direction; we award more points if high valued tiles
occur along this path. We penalize the board for not having
the highest valued tile in the lower left corner
"""
if Game.over(b):
return -float("inf")
snake = []
for i, col in enumerate(zip(*b)):
snake.extend(reversed(col) if i % 2 == 0 else col)
m = max(snake)
return sum(x/10**n for n, x in enumerate(snake)) - \
math.pow((b[3][0] != m)*abs(b[3][0] - m), 2)
Final thoughts
My algorithm only searches to four levels deep, but has achieved the 4096 tile. If the fitness function seems like magic, that’s because it is. I tinkered and experimented with a couple different heuristics and settled on the one that gave me the best results. You can read more about AI algorithms and heuristics for 2048 here and here. The second post presents an interesting method for determining the fitness function more intelligently than what I have done.
-
IBM’s Deep Blue chess program used this strategy in a search to a depth of six to eight moves, sometimes even reaching a depth of twenty or more moves. ↩