The Pacman Project

AI approaches

6.1 Creating an AI for Pacman

Pac-Man is a real-time computer game that resembles a simplified version of many modern first-person environment computer games. That is, the game is centered around navigating the player character around a semi-structured world, accumulating points, avoiding and (when appropriate) attacking. This kind of game requires dynamic control of the game agent by the human player and involves task prioritization, planning, and risk assessment. While it is relatively easy for a human to learn a basic strategy for Pac-Man, the game has complex aspects that allow the possibility of developing more intelligent strategies (in conjunction with other skills such as hand-eye coordination). It must be noted that for the original version of Pac-Man it was possible to determine the behaviour of the ghosts, and thus achieve a perfect score. In later revisions of the game, the ghost behaviour was altered to avoid this, but new tactics were quickly discovered. However, for other versions of the game, the behaviour of the ghosts was no longer strictly deterministic. As it is a challenge for a person to describe precisely their Pacman-playing strategy, or to represent such a strategy formally (e.g. as a set of rules), it is also challenging to make an AI for this game. In the Pac-Maas project, we will design an AI for both Pacman as the ghosts.

Next, a description will be given of several AI approaches that have been used in previous works about Pac-man:

 

6.2 Rule-based AI

In a basic formulation, a rule is a sentence of the form "if [Condition] holds, then do [Action]". A rule-based policy is a set of rules with some mechanism for breaking ties, i.e., to decide which rule is executed if there are multiple rules with satisfied conditions. Rule-based policies are human-readable, it is easy to include domain knowledge, and they are able to represent complex behaviours. For these reasons, they are often used in many areas of artificial intelligence. In order to apply rule-based policies to Pac-Man, we need to specify four things:

  1. what are the possible actions
  2. what are the possible conditions and how are they constructed from observations
  3. how to make rules from conditions and actions
  4. how to combine the rules into policies

The mechanism of decision making is depicted in Figure 6.1. In short, the (hidden) state space is the world of the Pacman and the ghosts. The dynamics of this (hidden) state space determines the vector of observations, which can be checked by the conditions. If the conditions of a rule are satisfied, the corresponding action module is switched on or off. As a consequence, multiple actions may be in effect at once. For example, the decision depicted in Figure 6.1 sets two actions to work together.

 

Figure 6.1
Figure 6.1 - Mechanism for decision making in a Rule based AI

 

6.3 Evolutionary algorithms

Genetic programming is the most common evolutionary algorithm used, with reinforcement learning systems also quite common. Other examples include using genetic algorithms, especially on the weightings of rules, and in conjunction with neural networks. Genetic algorithms have also been used on problems similar to Pac-Man both individually, as well as with memory-based reasoning systems.
The solving of Pac-Man mazes using genetic programming has a fairly lengthy pedigree, but it tends to produce maze-specific solutions rather than a generic agent capable of playing the game itself. This can be seen primarily in Koza, who acknowledges that "the S-expression that resulted in this problem may or may not possess any generality" as the process generated a list of movements for Pac-Man to make. Rosca has also extensively used genetic programming with Pac-Man, though largely as examples facilitating other studies on genetic programming itself. Despite the lesser focus on the applicability of genetic programming as a technique to produce an agent capable of playing the game, Rosca reaches the conclusion that "although the Pac-Man is a typical reinforcement learning task it was successfully approached using GP". However, again this process formed a specific solution to a specific maze rather than the production of a game-playing agent.
Sheppard and Salzberg used genetic algorithms, memory-based reasoning and a combination of both (with the genetic algorithm acting as a boot-strapping method for the memory-based reasoning) in getting the 'evader' to avoid up to two pursuers. This is similar to Ryan where the agent was merely trying to avoid the ghost, albeit in the restricted area of a maze. Sheppard and Salzbergs implementation of these systems proved successful, with the agent eluding a single pursuer about 90% of the time after 5,000 games in the slowest learning situation; a straight genetic algorithm implementation. The genetic algorithm however did achieved a higher success rate over time. The scores for the system using memory-based reasoning dropped a lot with multiple pursuers, but only resulted in slower learning for the genetic algorithm.

 

6.4 Neural networks

In a recent research about evolve neural networks in Pac-Man project, the neural networks were implemented in object-oriented style with the basic class being a Layer, consisting of a weight matrix mapping the inputs (plus bias unit) to the output units, after which a non-linearity is applied to the net activation of each output. A single layer perception then consisted of just one such layer, a multi-layer perception consisted of two or more such layers, though experiments were limited to a maximum of two layers (hence one hidden layer in standard terminology).
For initial members of the population, all weights were drawn from a Gaussian distribution with zero mean and standard deviation equal to the reciprocal of the square root of the fan-in of the corresponding node. A mutated copy of a network was made by first creating an identical copy of the network and then perturbing the weights with noise from the same distribution. Weights to be mutated were selected in one of four ways as follows (probabilities in parentheses):

  1. all the weights in the network (0:1)
  2. all the weights in a randomly selected layer (0:27)
  3. all the weights going into a randomly selected node (0:27)
  4. a single randomly selected weight (0:36)

While these parameters could have been adapted (or indeed, the noise distribution for each weight could have been adapted), these adaptations can also have detrimental effects on the convergence of the algorithm when high levels of noise are present.
Evolving strategies against deterministic ghosts is easier than when the ghosts exhibit a small amount of nondeterminism. Effective ghosts chase aggressively most of the time, while occasionally making a surprise move in order to disrupt any patterns. In the non-deterministic game, the aim is to learn general behaviours, but the noise makes progress difficult. Increasing the number of games per fitness evaluation improves the reliability of the fitness estimate, but means that fewer fitness evaluations can be made within the same time.

 

<<< Previous | Index | Next >>>