Generally speaking, information granules are collections of entities that usually originate at the numeric level and are arranged together due to their similarity, functional or physical adjacency, indistinguishability, coherency, or the like.
At present, granular computing is more a theoretical perspective than a coherent set of methods or principles.
In this sense, it encompasses all methods which provide flexibility and adaptability in the resolution at which knowledge or information is extracted and represented.
This is generally true of data: at different resolutions or granularities, different features and relationships emerge. The aim of granular computing is to exploit this fact in designing more effective machine-learning and reasoning systems.
It is very common that in data mining or machine-learning applications the resolution of variables needs to be decreased in order to extract meaningful regularities.
An example of this would be a variable such as "outside temperature" (temp), which in a given application might be recorded to several decimal places of precision (depending on the sensing apparatus).
There are several interrelated reasons for granulating variables in this fashion. For example, a simple learner or pattern recognition system may seek to extract regularities satisfying a conditional probability threshold such as p(Y = y_j | X = x_i) ≥ α; at the full numeric resolution of the raw measurements, few if any individual values x_i recur often enough to support such regularities, and it is rarely obvious in advance of analysis how coarse the granulation should be.
Instead, the feature space must be preprocessed (often by an entropy analysis of some kind) so that some guidance can be given as to how the discretization process should proceed.
Moreover, one cannot generally achieve good results by naively analyzing and discretizing each variable independently, since this may obliterate the very interactions that we had hoped to discover.
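To make this concrete, the following sketch uses synthetic data and simple equal-frequency binning (as a stand-in for an entropy-guided discretizer) to show how conditional-probability regularities that are undetectable at full numeric resolution emerge once the variable is granulated; the variable temp, the outcome y, and the threshold α = 0.8 are illustrative assumptions rather than details taken from any particular application.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: high-precision temperature readings and a binary outcome
# whose probability depends only on coarse temperature ranges.
temp = rng.uniform(-10.0, 35.0, size=5000)
p_true = np.where(temp < 5, 0.1, np.where(temp < 20, 0.5, 0.9))
y = rng.random(5000) < p_true

alpha = 0.8  # conditional-probability threshold for a "regularity"

# At full resolution nearly every temperature value is unique, so no single
# value x_i has enough support to estimate p(Y = 1 | X = x_i) reliably.
_, counts = np.unique(temp, return_counts=True)
print("distinct raw values:", counts.size, "| max support per value:", counts.max())

# Granulate the variable into five equal-frequency intervals and re-examine
# the conditional probabilities within each granule.
edges = np.quantile(temp, np.linspace(0, 1, 6))
granule = np.digitize(temp, edges[1:-1])
for g in np.unique(granule):
    p_hat = y[granule == g].mean()
    if p_hat >= alpha:
        print(f"granule {g}: p(Y=1 | granule) = {p_hat:.2f} >= {alpha}")
```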
A sample of papers that address the problem of variable discretization in general, and multiple-variable discretization in particular, is as follows: Chiu, Wong & Cheung (1991), Bay (2001), Liu et al. (2002), Wang & Liu (1998), Zighed, Rabaséda & Rakotomalala (1998), Catlett (1991), Dougherty, Kohavi & Sahami (1995), Monti & Cooper (1999), Fayyad & Irani (1993), Chiu, Cheung & Wong (1990), Nguyen & Nguyen (1998), Grzymala-Busse & Stefanowski (2001), Ting (1994), Ludl & Widmer (2000), Pfahringer (1995), An & Cercone (1999), Chiu & Cheung (1989), Chmielewski & Grzymala-Busse (1996), Lee & Shin (1994), Liu & Wellman (2002), Liu & Wellman (2004).
Variable granulation is a term that could describe a variety of techniques, most of which are aimed at reducing dimensionality, redundancy, and storage requirements.
Also in this category are more modern areas of study such as dimensionality reduction, projection pursuit, and independent component analysis.
These dimensionality reduction methods are all reviewed in the standard texts, such as Duda, Hart & Stork (2001), Witten & Frank (2005), and Hastie, Tibshirani & Friedman (2001).
One simple approach is to identify clusters of closely correlated variables and replace each cluster with a single prototype variable. The prototype may be the simple average of the data in the identified cluster, or some other representative measure.
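As an illustrative sketch of such aggregation, the code below groups columns whose pairwise correlation exceeds a threshold and replaces each group with its column-wise average; the synthetic two-factor data, the 0.9 threshold, and the greedy grouping rule are all assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: six variables, of which columns 0-2 and columns 3-5 are
# noisy copies of two latent factors and hence strongly correlated in groups.
latent = rng.normal(size=(200, 2))
X = np.column_stack([latent[:, 0]] * 3 + [latent[:, 1]] * 3)
X = X + 0.1 * rng.normal(size=X.shape)

# Greedily group variables whose correlation with a seed variable exceeds
# the threshold (a crude stand-in for clustering the correlation matrix).
corr = np.corrcoef(X, rowvar=False)
threshold = 0.9
unassigned = set(range(X.shape[1]))
clusters = []
while unassigned:
    seed = unassigned.pop()
    members = [seed] + [j for j in unassigned if corr[seed, j] > threshold]
    unassigned -= set(members)
    clusters.append(members)

# Replace each cluster of variables with a single prototype variable,
# here the simple average of its members.
prototypes = np.column_stack([X[:, m].mean(axis=1) for m in clusters])
print("original variables:", X.shape[1], "-> prototype variables:", prototypes.shape[1])
```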
Watanabe suggests that an observer might seek to thus partition a system in such a way as to minimize the interdependence between the parts "... as if they were looking for a natural division or a hidden crack."
In database systems, aggregations (see e.g. OLAP aggregation and business intelligence systems) transform original data tables (often called information systems) into tables whose rows and columns carry a different semantics: the rows correspond to groups (granules) of the original tuples, and the columns express aggregated information about the original values within each group.
In one such approach, groups of physically consecutive tuples, called rough rows, were automatically labeled with compact information about their values on the data columns, often involving multi-column and multi-table relationships. Database operations could be efficiently supported within such a framework, with access to the original data pieces still available (Slezak et al. 2013).
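The following toy sketch conveys the general idea of operating on such granules rather than on individual rows; it is not a description of the engine reported in Slezak et al. (2013), and the block size, the min/max labels, and the roughly ordered column are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)

# A toy single-column table of one million values; the column is roughly
# ordered (as a timestamp-like column would be), which is what makes the
# compact per-block labels effective for pruning.
values = np.sort(rng.integers(0, 10_000, size=1_000_000))

# Split the table into blocks of physically consecutive rows and label each
# block with a compact summary, here just the min and max of the column.
block_size = 65_536
blocks = [values[i:i + block_size] for i in range(0, values.size, block_size)]
labels = [(int(b.min()), int(b.max())) for b in blocks]

# Answer a range query by consulting the labels first and touching the
# original rows only in blocks that the labels cannot rule out.
lo, hi = 2_500, 2_520
touched = matched = 0
for block, (bmin, bmax) in zip(blocks, labels):
    if bmax < lo or bmin > hi:
        continue                      # the label alone resolves this block
    touched += 1
    matched += int(((block >= lo) & (block <= hi)).sum())

print(f"blocks scanned: {touched} of {len(blocks)}; matching rows: {matched}")
```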
Just as in the case of value granulation (discretization/quantization), it is possible that relationships (dependencies) may emerge at one level of granularity that are not present at another.
As an example of this, we can consider the effect of concept granulation on the measure known as attribute dependency (a simpler relative of the mutual information).
The dependency of an attribute set Q on an attribute set P, written γ_P(Q), is the fraction of objects in the universe that can be unambiguously assigned to a single class of the partition induced by Q on the basis of their class in the partition induced by P. The dependency ratio therefore expresses the proportion (within the entire universe) of such classifiable objects, in a sense capturing the "synchronization" of the two concept structures.
In general it is not possible to test all sets of attributes to see which induced concept structures yield the strongest dependencies, and this search must therefore be guided with some intelligence.
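The dependency measure itself is straightforward to compute. The sketch below implements γ_P(Q) = |POS_P(Q)| / |U| for a small invented decision table and shows a dependency that is only partial with respect to a fine-grained decision attribute becoming complete once that attribute is granulated into coarser classes; the table contents and attribute layout are hypothetical.

```python
from collections import defaultdict

def partition(table, attrs):
    """Group object indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(set)
    for i, row in enumerate(table):
        blocks[tuple(row[a] for a in attrs)].add(i)
    return list(blocks.values())

def dependency(table, P, Q):
    """gamma_P(Q): the fraction of objects whose P-class lies entirely inside
    a single Q-class, i.e. objects classifiable without ambiguity."""
    P_blocks = partition(table, P)
    Q_blocks = partition(table, Q)
    positive = sum(len(pb) for pb in P_blocks
                   if any(pb <= qb for qb in Q_blocks))
    return positive / len(table)

# Hypothetical decision table: column 0 is a condition attribute, column 1 a
# fine-grained decision, column 2 a coarser granulation merging "a1" and "a2".
table = [
    ("red",   "a1", "a"),
    ("red",   "a2", "a"),
    ("blue",  "a2", "a"),
    ("blue",  "a2", "a"),
    ("green", "b1", "b"),
    ("green", "b1", "b"),
]

print(dependency(table, P=[0], Q=[1]))  # 0.67: the "red" class is ambiguous
print(dependency(table, P=[0], Q=[2]))  # 1.0: full dependency at the coarser level
```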
Another perspective on concept granulation may be obtained from work on parametric models of categories.
In mixture modeling, for example, a set of data is explained as arising from a mixture of distinct probability distributions, each standing for one underlying category; the choice of the number of these distributions, and their size, can again be viewed as a problem of concept granulation.
Finding the "right" concept resolution is a tricky problem for which many methods have been proposed (e.g., AIC, BIC, MDL, etc.
Granular computing can be conceived as a framework of theories, methodologies, techniques, and tools that make use of information granules in the process of problem solving.
In this sense, granular computing is used as an umbrella term to cover topics that have been studied in various fields in isolation.
By examining all of these existing studies in light of the unified framework of granular computing and extracting their commonalities, it may be possible to develop a general theory for problem solving.
Granular computing is thus essential to human problem solving and hence has a significant bearing on the design and implementation of intelligent systems.