The term metaheuristic was first introduced by Glover (1977). It derives from the composition of two Greek words: heuristic derives from the verb heuriskein, which means “to find”, while the prefix meta means “beyond, in an upper level”. Before this term was widely adopted, metaheuristics were often called modern heuristics (Reeves, 1993). Examples of metaheuristics include, but are not limited to, Ant Colony Optimisation, Evolutionary Computation (including the Genetic Algorithm), Iterated Local Search, Simulated Annealing and Tabu Search. Nowadays metaheuristics are widely used to solve important practical combinatorial optimisation problems. However, due to the variety of techniques and concepts comprised by metaheuristics, there is still no commonly agreed definition of the term.
A metaheuristic is formally defined as an iterative generation process which guides a subordinate heuristic by intelligently combining different concepts for exploring and exploiting the search space; learning strategies are used to structure information in order to find near-optimal solutions efficiently (Osman and Laporte, 1996). Stützle (1999) defined metaheuristics as typically high-level strategies which guide an underlying, more problem-specific heuristic in order to increase its performance. The main goal is to avoid the disadvantages of iterative improvement and, in particular, of multiple descents, by allowing the local search to escape from local optima. This is achieved either by allowing worsening moves or by generating new starting solutions for the local search in a more ‘intelligent’ way than just providing random initial solutions. Many of the methods can be interpreted as introducing a bias such that high-quality solutions are produced quickly. This bias can take various forms and can be cast as descent bias (based on the objective function), memory bias (based on previously made decisions) or experience bias (based on prior performance). Many metaheuristic approaches rely on probabilistic decisions made during the search. The main difference from pure random search, however, is that in metaheuristic algorithms randomness is not used blindly but in an intelligent, biased form.
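As a rough illustration of this definition, the generic pattern — a high-level loop that lets a subordinate move operator propose candidates while an acceptance rule supplies the bias — can be sketched in Python. All names and the toy problem below are illustrative, not taken from any of the cited works:

```python
import random

def metaheuristic(initial, neighbor, cost, accept, iterations=1000, seed=0):
    """Generic skeleton: a high-level loop guiding a subordinate
    move operator (`neighbor`), with the acceptance rule (`accept`)
    supplying the search bias."""
    rng = random.Random(seed)
    current = best = initial
    for _ in range(iterations):
        candidate = neighbor(current, rng)            # subordinate heuristic
        if accept(cost(candidate), cost(current), rng):
            current = candidate
        if cost(current) < cost(best):
            best = current
    return best

# Toy usage: minimise x^2 over the integers with a purely greedy
# acceptance rule (descent bias only).
best = metaheuristic(
    initial=37,
    neighbor=lambda x, rng: x + rng.choice([-1, 1]),
    cost=lambda x: x * x,
    accept=lambda new, cur, rng: new <= cur,
)
# best is 0
```

Swapping in a different `accept` rule (memory-based or probabilistic) yields different metaheuristics on the same skeleton.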
The definition used in the Metaheuristics Network (2000) is as follows: a metaheuristic is a set of concepts that can be used to define heuristic methods that can be applied to a wide range of problems. In other words, a metaheuristic can be seen as a general algorithmic framework which can be applied to different optimisation problems with relatively few modifications to adapt it to a specific problem. Blum and Roli (2003) summarised the fundamental properties which characterise metaheuristics:

Metaheuristics are strategies that guide the search process.

The goal is to efficiently explore the search space in order to find (near) optimal solutions.

Techniques which constitute metaheuristic algorithms range from simple local search procedures to complex learning processes.

Metaheuristic algorithms are approximate and usually nondeterministic.

They may incorporate mechanisms to avoid getting trapped in confined areas of the search space.

Metaheuristics are not problem-specific.

Today’s more advanced metaheuristics use search experience (embodied in some form of memory) to guide the search.
According to Blum and Roli (2003), metaheuristics are high-level strategies for exploring search spaces by using different methods. Of great importance here is a dynamic balance between diversification and intensification. The term diversification generally refers to the exploration of the search space, whereas the term intensification refers to the exploitation of the accumulated search experience. These terms stem from the tabu search field, and it is important to clarify that the terms exploration and exploitation are sometimes used instead, for example in the Evolutionary Computation field, with a more restricted meaning. In fact, the notions of exploration and exploitation often refer to rather short-term strategies tied to randomness, whereas intensification and diversification also refer to medium- and long-term strategies based on the usage of memory. The use of the terms diversification and intensification in their original meaning is becoming more and more accepted across the whole field of metaheuristics.
According to Glover and Kochenberger (2003) in Raidl (2006), “… these methods have over time also come to include any procedure for problem solving that employs a strategy for overcoming the trap of local optimality in complex solution spaces, especially those procedures that utilize one or more neighbourhood structures as a means of defining admissible moves to transition from one solution to another, or to build or destroy solutions in constructive and destructive processes.”
Classification of Metaheuristics
There are several ways of classifying and describing metaheuristic algorithms. Blum and Roli (2003) summarised the classification of metaheuristics as follows.

Nature-inspired vs. non-nature-inspired.
Perhaps the most intuitive way of classifying metaheuristics is based on the origins of the algorithm. There are nature-inspired algorithms, like the Genetic Algorithm (GA) and Ant Colony Optimisation (ACO), and non-nature-inspired ones such as Tabu Search (TS) and Iterated Local Search (ILS). In our opinion this classification is not very meaningful, for the following two reasons. Firstly, many recent hybrid algorithms do not fit either class (or, in a sense, they fit both at the same time). Secondly, it is sometimes difficult to clearly attribute an algorithm to one of the two classes; for example, one might ask whether the use of memory in Tabu Search is not nature-inspired as well.

Population-based vs. single-point search.
Another characteristic that can be used for the classification of metaheuristics is the number of solutions used at the same time: does the algorithm work on a population or on a single solution at any time? Algorithms that work on single solutions are known as trajectory methods and encompass local search-based metaheuristics. They all share the property of describing a trajectory in the search space during the search process. Population-based metaheuristics, on the contrary, perform search processes which describe the evolution of a set of points in the search space.

Dynamic vs. static objective function.
Metaheuristics can also be classified according to the way they make use of the objective function. While some algorithms keep the objective function given in the problem representation “as it is”, some others, like Guided Local Search (GLS), modify it during the search. The idea behind this approach is to escape from local minima by modifying the search landscape. Accordingly, during the search the objective function is altered by trying to incorporate information collected during the search process.

One vs. various neighborhood structures.
Most metaheuristic algorithms work with a single neighborhood structure; in other words, the fitness landscape topology does not change in the course of the algorithm. Other metaheuristics, such as Variable Neighborhood Search (VNS), use a set of neighborhood structures, which makes it possible to diversify the search by switching between different fitness landscapes.

Memory usage vs. memoryless methods.
A very important criterion for the classification of metaheuristics is their use of the search history, that is, whether they use memory or not. Memoryless algorithms perform a Markov process, as the only information used to determine the next action is the current state of the search process. There are several ways of making use of memory; usually we differentiate between short-term and long-term memory. The first usually keeps track of recently performed moves or visited solutions, while the second is usually an accumulation of synthetic parameters about the search. The use of memory is nowadays recognised as one of the fundamental elements of a powerful metaheuristic.
In the rest of this study, we classify metaheuristics according to the single-point versus population-based search criterion, which divides metaheuristics into trajectory methods and population-based methods. This choice is motivated by the fact that this categorisation permits a clearer description of the algorithms. Moreover, a current trend is the hybridisation of methods, in particular the integration of single-point search algorithms into population-based ones.

Trajectory Methods

Basic Local Search: Iterative Improvement
The basic local search is usually called iterative improvement, since each move is only performed if the resulting solution is better than the current solution. The algorithm stops as soon as it finds a local minimum.
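This iterative improvement loop can be sketched as follows (the function names and the toy one-dimensional problem are illustrative):

```python
def iterative_improvement(solution, neighbors, cost):
    """Basic local search: repeatedly move to the best improving
    neighbour; stop as soon as no neighbour is better, i.e. at a
    local minimum."""
    while True:
        best_neighbor = min(neighbors(solution), key=cost)
        if cost(best_neighbor) >= cost(solution):
            return solution                  # local minimum reached
        solution = best_neighbor

# Toy usage: minimise (x - 4)^2 over the integers, starting from 10.
result = iterative_improvement(10, lambda x: [x - 1, x + 1],
                               lambda x: (x - 4) ** 2)
# result is 4
```

On a multimodal landscape the same loop would stop at whichever local minimum is nearest to the starting point, which is exactly the weakness the metaheuristics below try to overcome.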

Simulated Annealing
Simulated Annealing (SA) is one of the oldest metaheuristics and surely among the first algorithms with an explicit strategy to escape from local minima. The origins of the algorithm are in statistical mechanics (the Metropolis algorithm), and it was first presented as a search algorithm for combinatorial optimisation problems in Kirkpatrick et al. (1983); given some assumptions on the cooling schedule of the temperature, it can be shown to converge to an optimal solution. The fundamental idea is to allow moves to solutions of worse quality than the current solution (uphill moves) in order to escape from local minima. The probability of making such a move decreases during the search.
The name comes from the annealing process of solids: a solid is heated until it melts, and then its temperature is slowly decreased (according to an annealing schedule) until the solid reaches the lowest energy state, the ground state. If the temperature is decreased too rapidly, the solid at the ground state will have many defects or imperfections. The algorithm is easy to implement, which makes it straightforward to turn a local search method (e.g. best-improvement local search) into a simulated annealing algorithm, usually yielding much better results. However, although the algorithm is proven to converge to the optimum, it does so only in infinite time. For this reason, and also because the temperature must be decreased slowly, the algorithm is usually not faster than its competitors.
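A minimal SA sketch in Python, assuming a geometric cooling schedule and the standard Metropolis acceptance criterion (the parameter values and the toy problem are illustrative choices, not prescribed by the sources above):

```python
import math
import random

def simulated_annealing(initial, neighbor, cost,
                        t0=10.0, alpha=0.95, steps=2000, seed=1):
    """SA sketch: uphill moves are accepted with the Metropolis
    probability exp(-delta / T); the geometric cooling schedule
    T <- alpha * T makes such moves increasingly unlikely."""
    rng = random.Random(seed)
    current, best, t = initial, initial, t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        delta = cost(candidate) - cost(current)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current = candidate              # Metropolis acceptance
        if cost(current) < cost(best):
            best = current
        t *= alpha                           # cooling schedule
    return best

# Toy usage: minimise (x - 3)^2 over the integers.
best = simulated_annealing(
    initial=50,
    neighbor=lambda x, rng: x + rng.choice([-1, 1]),
    cost=lambda x: (x - 3) ** 2,
)
# best is 3
```

In practice the initial temperature and the cooling rate must be tuned to the scale of the cost differences in the problem at hand.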

Tabu Search
Tabu Search (TS) was proposed by Glover in 1986; a description of the method and its concepts can be found in Glover and Laguna (1997). The basic principle of TS is to pursue a best-improvement local search beyond a local minimum by allowing non-improving moves; cycling back to previously visited solutions is prevented by the use of memories called tabu lists, which record the recent history of the search. According to Glover, TS is not seen as a proper heuristic but rather as a metaheuristic, i.e., a general strategy for guiding and controlling “inner” heuristics specifically tailored to the problem at hand (Gendreau, 2002). TS is among the most cited and used metaheuristics for combinatorial problems. Many computational experiments have shown that TS has become an established optimisation technique which can compete with almost all known techniques and which, by its flexibility, can beat many classical procedures.
The word tabu (or taboo) comes from Tongan, a language of Polynesia, where it was used by the aborigines of Tonga Island to indicate things that cannot be touched because they are sacred. According to Webster’s Dictionary, the word tabu now also means “a prohibition imposed by social custom as a protective measure” or something “banned as constituting a risk”. These current, more pragmatic senses of the word accord well with the theme of tabu search. The risk to be avoided in this case is that of following a counterproductive course, including one which may lead to entrapment without hope of escape. On the other hand, as in the broader social context where “protective prohibitions” are capable of being superseded when the occasion demands, the “tabus” of tabu search are to be overruled when evidence of a preferred alternative becomes compelling.
The most important association with traditional usage, however, stems from the fact that tabus as normally conceived are transmitted by means of a social memory which is subject to modification over time. This creates the fundamental link to the meaning of “tabu” in TS. The forbidden elements of TS receive their status by reliance on an evolving memory, which allows this status to shift according to time and circumstance. More particularly, TS is based on the premise that problem solving, in order to qualify as intelligent, must incorporate adaptive memory and responsive exploration. The adaptive memory feature of TS allows the implementation of procedures that carry out a form of sampling (Glover and Laguna, 1997).
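The core TS mechanism — best admissible move plus a short-term tabu memory — can be sketched as follows. Here the tabu list stores recently visited solutions themselves; real implementations usually store move attributes instead, and add aspiration criteria to override tabu status. The landscape below is an invented toy example:

```python
from collections import deque

def tabu_search(initial, neighbors, cost, tenure=5, iterations=100):
    """TS sketch: always move to the best non-tabu neighbour, even if
    it is worse than the current solution; a bounded short-term memory
    (the tabu list) forbids recently visited solutions."""
    current, best = initial, initial
    tabu = deque([initial], maxlen=tenure)   # short-term memory
    for _ in range(iterations):
        candidates = [n for n in neighbors(current) if n not in tabu]
        if not candidates:
            break                            # no admissible move left
        current = min(candidates, key=cost)  # best admissible move
        tabu.append(current)
        if cost(current) < cost(best):
            best = current
    return best

# Toy usage: positions 0..9 with the costs below; plain descent from 0
# gets stuck at index 2 (cost 2), while TS walks on to index 8 (cost 0).
landscape = [4, 3, 2, 3, 4, 3, 2, 1, 0, 1]
result = tabu_search(
    initial=0,
    neighbors=lambda i: [j for j in (i - 1, i + 1) if 0 <= j < len(landscape)],
    cost=lambda i: landscape[i],
)
# result is 8
```

The tabu tenure (here 5) controls how long a solution stays forbidden and is the main tuning parameter of the method.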

Explorative Local Search Methods
These are the Greedy Randomised Adaptive Search Procedure (GRASP), Variable Neighbourhood Search (VNS), Guided Local Search (GLS) and Iterated Local Search (ILS).
Population-based Methods
Population-based methods deal in every iteration of the algorithm with a set (i.e., a population) of solutions rather than with a single solution. In this way, population-based algorithms provide a natural, intrinsic way to explore the search space. Yet the final performance depends strongly on the way the population is manipulated.

Evolutionary Computation
Evolutionary Computation (EC) algorithms are inspired by nature’s capability to evolve living beings well adapted to their environment. EC algorithms can be succinctly characterised as computational models of evolutionary processes. At each iteration a number of operators is applied to the individuals of the current population to generate the individuals of the population of the next generation (iteration). Usually, EC algorithms use an operator called recombination or crossover to recombine two or more individuals to produce new individuals, as well as mutation or modification operators which cause a self-adaptation of individuals. The driving force in evolutionary algorithms is the selection of individuals based on their fitness (this can be the value of an objective function, the result of a simulation experiment, or some other kind of quality measure). Individuals with higher fitness have a higher probability of being chosen as members of the population of the next iteration (or as parents for the generation of new individuals). This corresponds to the principle of survival of the fittest in natural evolution: it is nature’s capability to adapt itself to a changing environment which gave the inspiration for EC algorithms.
A variety of slightly different EC algorithms have been proposed over the years. Basically they fall into three categories which were developed independently of each other: Evolutionary Programming (EP), Evolution Strategies (ES) and GAs. EP arose from the desire to generate machine intelligence. While EP was originally proposed to operate on discrete representations of finite state machines, most present variants are used for continuous optimisation problems. The latter also holds for most present variants of ES, whereas GAs are mainly applied to combinatorial optimisation problems.
GAs are an effective search and optimisation method that simulates the process of natural selection, or survival of the fittest. Holland in 1974 developed the ideas and concepts behind the GA, and many authors have refined his initial approach (Ortiz et al., 2004). The purpose of GAs is to provide satisfactory results for optimisation problems that are hard to solve using exhaustive techniques. Researchers who have used GAs to solve complex real-world problems report good results from these techniques.
Genetic Algorithms differ from traditional search and optimisation algorithms in four ways:

GAs work with a coding of the solution set, not the solutions themselves.

GAs search from a population of solutions, not a single solution.

GAs use payoff information (fitness function), not derivatives or other auxiliary knowledge.

GAs use probabilistic transition rules, not deterministic rules. (Goldberg, 1989)
Jung (2000) notes that there are two characteristics of GAs that distinguish them from general optimisation techniques. The first is that GAs start with an initial set of random feasible solutions, not a single solution. The GA generates many possible solutions to a given problem and then lets them compete and mate to generate improved offspring. In conventional optimisation techniques, a point-to-point approach is applied with a stepwise procedure to reach the optimum solution, but this approach risks falling into local optima (Gen and Cheng, 1997). GAs, by contrast, take a population-to-population approach, can escape from local optima, and are very effective at global search thanks to their multidirectional search. The second characteristic is that GAs do not impose any specific functional requirement on the mathematical relationships that express a given problem; therefore GAs can handle any kind of objective function and constraints. It is difficult to apply simple GAs to complex optimisation problems; however, with some modification, GAs can solve some particular problems (Jung, 2000).
There are three major advantages in applying GAs to optimisation problems (Gen and Cheng, 1997):


Due to their evolutionary nature, GAs will search for solutions without regard to the specific inner workings of the problem. GAs can handle any kind of objective function and any kind of constraint (linear or nonlinear) defined on discrete, continuous or mixed search spaces.

The ergodicity of the evolution operators makes GAs very effective at performing global search (in probability). Traditional approaches perform local search by a convergent stepwise procedure, which compares the values of nearby points and moves to the relatively optimal points; with such procedures, global optima can be found only if the problem possesses certain convexity properties that essentially guarantee that any local optimum is a global optimum.

GAs provide great flexibility to hybridise with domain-dependent heuristics to produce an efficient implementation for a specific problem.
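The points above can be made concrete with a minimal GA sketch. It uses binary tournament selection, one-point crossover and bit-flip mutation on bit strings; the operator choices, parameter values and the OneMax toy problem are illustrative assumptions, not prescriptions from the cited works:

```python
import random

def genetic_algorithm(fitness, length=20, pop_size=30,
                      generations=60, mutation_rate=0.02, seed=2):
    """GA sketch: a population of bit strings evolves by tournament
    selection, one-point crossover and bit-flip mutation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]

    def tournament():
        a, b = rng.sample(pop, 2)            # fitter of two random picks
        return a if fitness(a) >= fitness(b) else b

    for _ in range(generations):
        next_pop = []
        for _ in range(pop_size):
            p1, p2 = tournament(), tournament()
            cut = rng.randrange(1, length)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]         # bit-flip mutation
            next_pop.append(child)
        pop = next_pop
    return max(pop, key=fitness)

# Toy usage: OneMax — maximise the number of 1-bits in the string.
best = genetic_algorithm(fitness=sum)
```

Note that the fitness function is used only as a black box, which is exactly the first advantage listed above: nothing in the loop depends on the inner workings of the problem.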
Ant Colony Optimisation
Ant Colony Optimisation (ACO) is a metaheuristic approach proposed in Dorigo (1992) and Dorigo et al. (1999). The inspiring source of ACO is the foraging behaviour of real ants, which enables them to find shortest paths between food sources and their nest. While walking between food sources and the nest, ants deposit a substance called pheromone on the ground. They decide which direction to take by choosing, with higher probability, paths marked by stronger pheromone concentrations. This basic behaviour is the basis for a cooperative interaction which leads to the emergence of shortest paths.
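The pheromone feedback loop can be illustrated in its simplest setting, the classic choice among a few fixed paths between nest and food source (a deliberately reduced sketch; full ACO algorithms build solutions component by component and combine pheromone with heuristic information):

```python
import random

def aco_paths(path_lengths, ants=10, iterations=50, rho=0.5, seed=3):
    """ACO sketch: artificial ants repeatedly choose one of several
    paths.  Choice probability is proportional to pheromone, shorter
    paths receive larger deposits (1 / length), and evaporation at
    rate rho weakens old trails, so the colony converges on the
    shortest path."""
    rng = random.Random(seed)
    pheromone = [1.0] * len(path_lengths)
    for _ in range(iterations):
        deposits = [0.0] * len(path_lengths)
        for _ in range(ants):
            # roulette-wheel choice biased by pheromone concentration
            i = rng.choices(range(len(path_lengths)), weights=pheromone)[0]
            deposits[i] += 1.0 / path_lengths[i]
        pheromone = [(1 - rho) * p + d
                     for p, d in zip(pheromone, deposits)]
    return max(range(len(path_lengths)), key=lambda i: pheromone[i])

# Toy usage: three paths of lengths 4, 2 and 8; the strongest trail
# ends up on the shortest path (index 1).
best_path = aco_paths([4.0, 2.0, 8.0])
# best_path is 1
```

Evaporation (the `rho` term) is what keeps the positive feedback from freezing on an early, possibly suboptimal choice.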
Source: lecture material for the Data Mining course of the Computational Statistics master’s programme (S2) at ITS, by Dr. Irhamah, MSi
About ACO
http://eswarann.wordpress.com/2012/02/20/antcolonyoptimization/
A simpler overview here at
http://eswarann.wordpress.com/2012/02/12/introductiontoalgorithmsheuristicsandmetaheuristics/