Pattern recognition

Pattern recognition has its origins in statistics and engineering; some modern approaches to pattern recognition include the use of machine learning, due to the increased availability of big data and a new abundance of processing power.

When no labeled data are available, other algorithms can be used to discover previously unknown patterns.

KDD and data mining have a larger focus on unsupervised methods and a stronger connection to business use.

In machine learning, pattern recognition is the assignment of a label to a given input value.

A common example of a pattern-matching algorithm is regular expression matching, which looks for patterns of a given sort in textual data and is included in the search capabilities of many text editors and word processors.
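As a minimal sketch in Python (the date pattern and sample text here are invented for illustration), a regular expression can find every occurrence of a pattern of a given sort in a string:

```python
import re

# Look for a simple ISO-style date pattern (e.g. "2024-01-31") in free text.
text = "Report generated on 2024-01-31, revised 2024-02-02."
pattern = r"\b\d{4}-\d{2}-\d{2}\b"

matches = re.findall(pattern, text)
print(matches)  # ['2024-01-31', '2024-02-02']
```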

A modern definition of pattern recognition is: "The field of pattern recognition is concerned with the automatic discovery of regularities in data through the use of computer algorithms and with the use of these regularities to take actions such as classifying the data into different categories."[4]

Pattern recognition is generally categorized according to the type of learning procedure used to generate the output value.

A learning procedure then generates a model that attempts to meet two sometimes conflicting objectives: Perform as well as possible on the training data, and generalize as well as possible to new data (usually, this means being as simple as possible, for some technical definition of "simple", in accordance with Occam's Razor, discussed below).
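As a minimal sketch of this trade-off (the target function, noise level, and polynomial degrees below are invented for illustration), fitting polynomials of increasing degree to noisy samples typically drives the training error down while the test error eventually rises:

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.sin(3 * x)                      # unknown "true" regularity
x_train, x_test = rng.uniform(-1, 1, 20), rng.uniform(-1, 1, 20)
y_train = f(x_train) + rng.normal(0, 0.1, 20)    # noisy training labels
y_test = f(x_test) + rng.normal(0, 0.1, 20)      # held-out data for generalization

for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)            # fit the training data
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    # Higher degrees fit the training set more closely ("simple" here = low degree),
    # but past some point the test error worsens rather than improves.
    print(degree, train_err, test_err)
```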

Sometimes different terms are used to describe the corresponding supervised and unsupervised learning procedures for the same type of output.

The unsupervised equivalent of classification is normally known as clustering, based on the common perception of the task as involving no training data to speak of, and of grouping the input data into clusters based on some inherent similarity measure (e.g. the distance between instances, considered as vectors in a multi-dimensional vector space), rather than assigning each input instance into one of a set of pre-defined classes.
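The following is a minimal sketch of one such clustering algorithm, k-means (Lloyd's algorithm), using Euclidean distance between instances viewed as vectors; the data and the number of clusters are invented for illustration:

```python
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    """Group the rows of X into k clusters by nearest Euclidean distance."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(iters):
        # Assign each instance to its nearest cluster center.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = np.argmin(dists, axis=1)
        # Move each center to the mean of the instances assigned to it.
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.array([[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]])
labels, centers = kmeans(X, k=2)
print(labels)  # two clusters discovered with no training labels, e.g. [0 0 1 1]
```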

The piece of input data for which an output value is generated is formally termed an instance.

Features typically are either categorical (also known as nominal, i.e., consisting of one of a set of unordered items, such as a gender of "male" or "female", or a blood type of "A", "B", "AB" or "O"), ordinal (consisting of one of a set of ordered items, e.g., "large", "medium" or "small"), integer-valued (e.g., a count of the number of occurrences of a particular word in an email) or real-valued (e.g., a measurement of blood pressure).
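As an illustrative sketch (all feature names and values invented), the following snippet encodes one instance containing each of these four feature types into a numerical feature vector, one-hot encoding the categorical feature and mapping the ordinal feature to integers that preserve its order:

```python
# One instance with the four feature types described above.
instance = {"blood_type": "AB",        # categorical (unordered)
            "size": "medium",          # ordinal (ordered)
            "word_count": 3,           # integer-valued
            "blood_pressure": 118.0}   # real-valued

BLOOD_TYPES = ["A", "B", "AB", "O"]                 # unordered categories
SIZE_ORDER = {"small": 0, "medium": 1, "large": 2}  # ordered levels

def to_vector(inst):
    # Categorical values get one indicator per category; ordinal values keep order.
    one_hot = [1.0 if inst["blood_type"] == t else 0.0 for t in BLOOD_TYPES]
    return one_hot + [float(SIZE_ORDER[inst["size"]]),
                      float(inst["word_count"]),
                      inst["blood_pressure"]]

print(to_vector(instance))  # [0.0, 0.0, 1.0, 0.0, 1.0, 3.0, 118.0]
```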

Many common pattern recognition algorithms are probabilistic in nature, in that they use statistical inference to find the best label for a given instance.

Exhaustive search over subsets of features is computationally infeasible, since the number of candidate subsets grows exponentially with the number of features, as illustrated below. The branch-and-bound algorithm[7] reduces this complexity but remains intractable for medium to large numbers of available features.
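A short sketch of the combinatorial explosion (the feature names are invented):

```python
from itertools import combinations

def all_subsets(features):
    """Enumerate every non-empty subset of a feature set (2**n - 1 of them)."""
    for r in range(1, len(features) + 1):
        yield from combinations(features, r)

features = [f"f{i}" for i in range(20)]
# Even 20 features already yield over a million candidate subsets to evaluate.
print(sum(1 for _ in all_subsets(features)))  # 1048575
```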

Feature extraction algorithms attempt to reduce a large-dimensionality feature vector into a smaller-dimensionality vector that is easier to work with and encodes less redundancy, using mathematical techniques such as principal components analysis (PCA).
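A minimal sketch of PCA via the singular value decomposition of the centered data matrix (the random input data is invented for illustration):

```python
import numpy as np

def pca(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    # Rows of Vt are the principal directions, ordered by explained variance.
    return Xc @ Vt[:n_components].T                  # coordinates in the reduced space

X = np.random.default_rng(0).normal(size=(100, 10))  # 100 instances, 10 features
Z = pca(X, n_components=2)
print(Z.shape)  # (100, 2): a smaller-dimensionality feature vector per instance
```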

Labeled training data must typically be labeled by hand, a time-consuming process which is typically the limiting factor in the amount of data of this sort that can be collected.

The appropriate loss function depends on the problem; in the case of classification, for example, the simple zero-one loss function is often sufficient.

This corresponds simply to assigning a loss of 1 to any incorrect labeling, and implies that the optimal classifier minimizes the error rate on independent test data (i.e., the fraction of instances that the learned function labels incorrectly).

The goal of the learning procedure is then to minimize the error rate (maximize the correctness) on a "typical" test set.
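A minimal sketch of measuring this error rate under zero-one loss (the labels are invented):

```python
def error_rate(predicted, actual):
    """Average zero-one loss: the fraction of instances labeled incorrectly."""
    assert len(predicted) == len(actual)
    return sum(p != a for p, a in zip(predicted, actual)) / len(actual)

# One of three test instances is mislabeled, so the error rate is 1/3.
print(error_rate(["cat", "dog", "dog"], ["cat", "dog", "cat"]))  # 0.333...
```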

For probabilistic classifiers, the probability of each possible label given an input instance $\mathbf{x}$ and model parameters $\boldsymbol\theta$ can be computed using Bayes' rule, as follows:

$$p(\mathrm{label} \mid \mathbf{x}, \boldsymbol\theta) = \frac{p(\mathbf{x} \mid \mathrm{label}, \boldsymbol\theta)\, p(\mathrm{label} \mid \boldsymbol\theta)}{\sum_{L \in \text{all labels}} p(\mathbf{x} \mid L, \boldsymbol\theta)\, p(L \mid \boldsymbol\theta)}$$

When the labels are continuously distributed (e.g., in regression analysis), the denominator involves integration rather than summation:

$$p(\mathrm{label} \mid \mathbf{x}, \boldsymbol\theta) = \frac{p(\mathbf{x} \mid \mathrm{label}, \boldsymbol\theta)\, p(\mathrm{label} \mid \boldsymbol\theta)}{\int p(\mathbf{x} \mid L, \boldsymbol\theta)\, p(L \mid \boldsymbol\theta)\, \mathrm{d}L}$$

The value of $\boldsymbol\theta$ is typically estimated from the training data $\mathbf{D}$; in a fully Bayesian treatment, the probability of each label is instead computed by integrating over all possible values of $\boldsymbol\theta$, weighted according to the posterior probability:

$$p(\mathrm{label} \mid \mathbf{x}) = \int p(\mathrm{label} \mid \mathbf{x}, \boldsymbol\theta)\, p(\boldsymbol\theta \mid \mathbf{D})\, \mathrm{d}\boldsymbol\theta$$

The first pattern classifier – the linear discriminant presented by Fisher – was developed in the frequentist tradition.
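As a concrete numeric sketch of the discrete form of Bayes' rule above (the labels, prior probabilities, and likelihoods are all invented for illustration):

```python
# Discrete Bayes' rule: p(label | x) is proportional to p(x | label) * p(label).
priors = {"spam": 0.4, "ham": 0.6}           # p(label), invented
likelihood = {"spam": 0.05, "ham": 0.001}    # p(x | label) for one observed x, invented

unnormalized = {L: likelihood[L] * priors[L] for L in priors}
evidence = sum(unnormalized.values())        # the summation in the denominator
posterior = {L: v / evidence for L, v in unnormalized.items()}
print(posterior)  # {'spam': ~0.971, 'ham': ~0.029}: 'spam' is the best label
```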

Note that the use of Bayes' rule in a pattern classifier does not by itself make the classification approach Bayesian.

Bayesian statistics has its origin in Greek philosophy, where a distinction was already made between 'a priori' and 'a posteriori' knowledge.

Moreover, experience quantified as a priori parameter values can be weighted with empirical observations, using, for example, the Beta (conjugate prior) and Dirichlet distributions.
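A minimal sketch of such weighting with a Beta conjugate prior (the pseudo-counts and observations below are invented):

```python
# Beta-Binomial conjugate update: a prior Beta(a, b) encodes a priori experience
# as pseudo-counts; observed successes/failures update it to Beta(a + s, b + f).
a, b = 2.0, 2.0              # a priori pseudo-counts, invented for illustration
successes, failures = 8, 2   # empirical observations

a_post, b_post = a + successes, b + failures
posterior_mean = a_post / (a_post + b_post)
# ~0.714: the prior mean 0.5 is pulled toward the observed rate 0.8,
# with the weighting set by the relative sizes of prior and observed counts.
print(posterior_mean)
```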

The Bayesian approach facilitates a seamless intermixing between expert knowledge in the form of subjective probabilities, and objective observations.

Within medical science, pattern recognition is the basis for computer-aided diagnosis (CAD) systems.

Pattern recognition has many real-world applications in image processing.

Examples include optical character recognition, face detection and fingerprint identification.

In psychology, pattern recognition is used to make sense of and identify objects, and is closely related to perception.

The template-matching hypothesis suggests that incoming stimuli are compared with templates in the long-term memory.

Feature detection models, such as the Pandemonium system for classifying letters (Selfridge, 1959), suggest that the stimuli are broken down into their component parts for identification.