The Pacman Project
AI in Pac-Maas
7.1 Heuristics
In order to allow an implementation where computer agents could give advices to real time players, a virtual game module was needed. This module connects the web service and the simulation with the virtual AI players. Such a virtual game needs to be aware of its environment, since it is important to know how to play (e.g. what directions can the ghosts take?), whether pacman or the ghosts won, and how much time it took to end the game. For this reason, a nodemap has been implemented, representing a virtual map of the city of Maastricht. The virtual players can then move on this map by directions of the web service (cloned movement of the real players) or a simulated game (heuristic movement). These movement updates then allow to calculate an advice for the next direction to take (preferably at crossroads).
Important features for pacman are:
- The distance to the closest ghost player
- The number of nearby ghosts
- The number of nearby food pills
- The total number of food pills left
- How much he is surrounded by ghosts
- Whether he is invulnerable or not
Implementation of these features means that weights measuring the range which defines close food pills, power pills and ghosts need to be trained. Also, in order to avoid pacman to get stuck in semi-infinite loops, heuristics need to penalize game situations where pacman revisited locations too many times. This can be implemented by counting the pacman visits for each node and penalizing the heuristic value by looking at the node with the highest count. Another solution to this challenge could try to minimize the variance of these counts.
Similar to pacman, the ghosts consider the following features:
- Distance to the closest ghost player
- How much the last eaten pill is surrounded by ghosts
- The total number of food pills left
- The amount of ghosts on the same node
- How much the ghosts are connected to each other
The ratio which expresses how much a certain player or node is surrounded by ghosts can be calculated by translating the player's location to a specific node and computing the number of connected nodes occupied by ghosts.
7.2 Training the Evolutionary Algorithm
In order to train the several heuristic weights, an evolutionary algorithm was developed. This algorithm starts with a certain population of trainee players using default or random created heuristics. Each player then joins a certain number of simulation games and the amount of wins defines its fitness. One might take into consideration that a real game could end prematurely due to time constraints (draw). However, since a computer AI in training might progress slowly to an optimal solution, it is really important to keep the simulation game running until the end so small improvements can be detected in an early stage. Next, a roulette wheel decides what candidates are selected for selection, mutation or crossover and therefore form part of the next generation. This process is repeated a certain number of times, after which the chromosome with the best fitness is selected as optimal heuristics.
Mutation in this process means a new random value is generated for the selected mutation parameter, since all weights are theoretically unbounded. However, to increase the evolution speed, an upper and lower bound were defined for these genomes, so a too large variance in parameter weights might not overheat the search for optimal values.