Learning classifier system

[2] Learning classifier systems seek to identify a set of context-dependent rules that collectively store and apply knowledge in a piecewise manner in order to make predictions (e.g. behavior modeling,[3] classification,[4][5] data mining,[5][6][7] regression,[8] function approximation,[9] or game strategy).

This approach allows complex solution spaces to be broken up into smaller, simpler parts, an idea rooted in the reinforcement learning branch of artificial intelligence research.

As a result, the LCS paradigm can be flexibly applied to many problem domains that call for machine learning.

Pittsburgh-style systems perform batch learning, where rule sets are evaluated in each iteration over much or all of the training data.

In Michigan-style LCS, the entire trained (and optionally, compacted) classifier population forms the prediction model.

At this point in the learning cycle, if no classifiers made it into either [M] or [C] (as would be the case when the population starts off empty), the covering mechanism is applied (fifth step).
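To make covering concrete, here is a minimal sketch for a ternary-alphabet (0/1/#) rule representation; the function names and the dictionary-based rule structure are illustrative assumptions, not part of any specific LCS implementation:

```python
import random

def cover(instance, action, wildcard_prob=0.5):
    """Create a new rule whose condition is guaranteed to match `instance`.

    Each condition position either copies the instance's bit or is
    generalized to the wildcard '#' with probability wildcard_prob.
    (Illustrative sketch; real systems also initialize rule statistics.)
    """
    condition = ['#' if random.random() < wildcard_prob else bit
                 for bit in instance]
    return {'condition': condition, 'action': action}

def matches(condition, instance):
    """A condition matches when every non-wildcard position agrees."""
    return all(c == '#' or c == b for c, b in zip(condition, instance))

rule = cover("0110", action=1)
assert matches(rule['condition'], "0110")  # covering guarantees a match
```

Because each condition position either copies the instance or becomes a wildcard, a covered rule always matches the instance that triggered it, which ensures the match set is never left empty.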

Subsumption is an explicit generalization mechanism that merges classifiers that cover redundant parts of the problem space.
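A sketch of the subsumption test, again for a ternary-alphabet representation; the `accuracy` field and threshold are assumed placeholders for whatever accuracy statistic a given system maintains:

```python
def more_general(cond_a, cond_b):
    """cond_a is at least as general as cond_b: wherever cond_a is
    specific, cond_b must specify the same value."""
    return all(a == '#' or a == b for a, b in zip(cond_a, cond_b))

def subsumes(rule_a, rule_b, accuracy_threshold=0.99):
    """rule_a may absorb rule_b when it advocates the same action, is
    sufficiently accurate, and covers everything rule_b covers.
    (Illustrative sketch of the usual subsumption conditions.)"""
    return (rule_a['action'] == rule_b['action']
            and rule_a['accuracy'] >= accuracy_threshold
            and more_general(rule_a['condition'], rule_b['condition']))
```

For example, an accurate rule with condition `0#1#` subsumes a rule `011#` proposing the same action, since every instance matched by the latter is also matched by the former; deleting the more specific rule removes redundancy without losing coverage.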

In the eighth step, LCS adopts a highly elitist genetic algorithm (GA) that selects two parent classifiers based on fitness (survival of the fittest).

The LCS genetic algorithm is highly elitist because, in each learning iteration, the vast majority of the population is preserved.

Rule discovery may alternatively be performed by some other method, such as an estimation of distribution algorithm, but a GA is by far the most common approach.
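Parent selection is commonly implemented with tournament selection; the following is a minimal sketch under the assumption that each rule carries a `fitness` value (the function name and rule structure are illustrative):

```python
import random

def tournament_select(population, tournament_size=3):
    """Return the fittest rule from a small random sample of the
    population (or of the match/action set, in niche-GA variants)."""
    contenders = random.sample(population,
                               min(tournament_size, len(population)))
    return max(contenders, key=lambda rule: rule['fitness'])

# Two parents are chosen by independent tournaments; crossover and
# mutation of their conditions would then produce offspring rules.
```

Because only two parents are drawn per iteration and offspring replace at most a couple of existing rules, the bulk of the population survives each cycle, which is the elitism described above.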

In a simple voting scheme, the action with the strongest supporting 'votes' from matching rules wins, and becomes the selected prediction.
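Such a voting scheme can be sketched as a fitness-weighted vote over the match set; the dictionary-based rule structure is an illustrative assumption:

```python
from collections import defaultdict

def select_action(match_set):
    """Each matching rule votes for its action with weight equal to its
    fitness; the action with the largest vote total is the prediction."""
    votes = defaultdict(float)
    for rule in match_set:
        votes[rule['action']] += rule['fitness']
    return max(votes, key=votes.get)

match_set = [
    {'action': 0, 'fitness': 0.4},
    {'action': 1, 'fitness': 0.9},
    {'action': 0, 'fitness': 0.3},
]
# action 0 totals 0.7, action 1 totals 0.9, so action 1 is predicted
```

Variants weight votes by other rule statistics (e.g. accuracy or numerosity), but the aggregation principle is the same.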

[21] Other important concepts that emerged in the early days of LCS research included (1) the formalization of a bucket brigade algorithm (BBA) for credit assignment/learning,[22] (2) selection of parent rules from a common 'environmental niche' (i.e. the match set [M]) rather than from the whole population [P],[23] (3) covering, first introduced as a create operator,[24] (4) the formalization of an action set [A],[24] (5) a simplified algorithm architecture,[24] (6) strength-based fitness,[21] (7) consideration of single-step, or supervised learning problems[25] and the introduction of the correct set [C],[26] (8) accuracy-based fitness[27] (9) the combination of fuzzy logic with LCS[28] (which later spawned a lineage of fuzzy LCS algorithms), (10) encouraging long action chains and default hierarchies for improving performance on multi-step problems,[29][30][31] (11) examining latent learning (which later inspired a new branch of anticipatory classifier systems (ACS)[32]), and (12) the introduction of the first Q-learning-like credit assignment technique.

[11][35] Wilson's Zeroth-level Classifier System (ZCS)[35] focused on increasing algorithmic understandability based on Holland's standard LCS implementation.

[21] This was done, in part, by removing rule-bidding and the internal message list (both essential to the original BBA credit assignment) and replacing them with a hybrid BBA/Q-learning strategy.

[11] XCS took the simplified architecture of ZCS and added an accuracy-based fitness, a niche GA (acting in the action set [A]), an explicit generalization mechanism called subsumption, and an adaptation of the Q-Learning credit assignment.

Following the success of XCS, LCS were later described as reinforcement learning systems endowed with a generalization capability.

Similarly, the design of XCS drives it to form an all-inclusive and accurate representation of the problem space (i.e. a complete map) rather than focusing on high payoff niches in the environment (as was the case with strength-based LCS).

These early works inspired later interest in applying LCS algorithms to complex and large-scale data mining tasks epitomized by bioinformatics applications.

In 1998, Stolzmann introduced anticipatory classifier systems (ACS), which included rules in the form of 'condition-action-effect', rather than the classic 'condition-action' representation.

In other words, the system evolves a model that specifies not only what to do in a given situation, but also what will happen after a specific action is executed.

This family of LCS algorithms is best suited to multi-step problems, planning, speeding up learning, or disambiguating perceptual aliasing (i.e. where the same observation is obtained in distinct states but requires different actions).

Bacardit introduced GAssist[47] and BioHEL,[48] Pittsburgh-style LCSs designed for data mining and scalability to large datasets in bioinformatics applications.

In 2008, Drugowitsch published the book "Design and Analysis of Learning Classifier Systems", which includes a theoretical examination of LCS algorithms.

[49] Butz introduced the first online rule-learning visualization within a GUI for XCSF[1] (see the image at the top of this page).

Urbanowicz extended the UCS framework and introduced ExSTraCS, explicitly designed for supervised learning in noisy problem domains (e.g. epidemiology and bioinformatics).

[50] ExSTraCS integrated (1) expert knowledge to drive covering and genetic algorithm towards important features in the data,[51] (2) a form of long-term memory referred to as attribute tracking,[52] allowing for more efficient learning and the characterization of heterogeneous data patterns, and (3) a flexible rule representation similar to Bacardit's mixed discrete-continuous attribute list representation.

[53] Both Bacardit and Urbanowicz explored statistical and visualization strategies to interpret LCS rules and perform knowledge discovery for data mining.

[54] ExSTraCS 2.0 was later introduced to improve Michigan-style LCS scalability, directly solving the 135-bit multiplexer benchmark problem for the first time.

[5] The n-bit multiplexer problem is highly epistatic and heterogeneous, making it a very challenging machine learning task.
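To make the benchmark concrete, the multiplexer function itself is easy to state: the first k bits (where k + 2^k = n, so k = 7 for the 135-bit problem) form a binary address that selects one of the remaining 2^k data bits as the output. A small sketch (the function name is illustrative):

```python
def multiplexer(bits):
    """n-bit multiplexer: the first k bits (k + 2**k == n) address one
    of the remaining 2**k data bits, which becomes the output."""
    k = 0
    while k + 2 ** k < len(bits):
        k += 1
    if k + 2 ** k != len(bits):
        raise ValueError("invalid multiplexer length")
    address = int(''.join(map(str, bits[:k])), 2)
    return bits[k + address]

# 6-bit example: address bits '10' (i.e. 2) select data bit index 2
assert multiplexer([1, 0, 0, 0, 1, 0]) == 1
```

The difficulty for learners is that which data bit matters depends entirely on the address bits (epistasis), and different address values define distinct underlying patterns within one dataset (heterogeneity).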

Michigan-style systems have the advantage of being applicable to a greater number of problem domains, and the unique benefits of incremental learning.

2D visualization of LCS rules learning to approximate a 3D function. Each blue ellipse represents an individual rule covering part of the solution space. (Adapted from images taken from XCSF[1] with permission from Martin Butz)
A step-wise schematic illustrating a generic Michigan-style learning classifier system learning cycle performing supervised learning