Rough set

The following section contains an overview of the basic framework of rough set theory, as originally proposed by Zdzisław I. Pawlak, along with some of the key definitions.

More formal properties and boundaries of rough sets can be found in Pawlak (1991) and cited references.

In general, a target set cannot be expressed exactly in terms of the available attributes, because the set may include and exclude objects which are indistinguishable on the basis of those attributes.

Clearly, when the upper and lower approximations are equal (i.e., the boundary region is empty), the target set is captured exactly by the attributes.

Rough set theory is one of many methods that can be employed to analyse uncertain (including vague) systems, although it is less common than more traditional methods such as probability, statistics, entropy and Dempster–Shafer theory.

However, a key difference, and a unique strength, of classical rough set theory is that it provides an objective form of analysis.[2]

Unlike the other methods listed above, classical rough set analysis requires no additional information, external parameters, models, functions, grades or subjective interpretations to determine set membership; instead, it uses only the information presented within the given data.

In general, the upper and lower approximations are not equal; in such cases, we say that the target set is undefinable, or roughly definable, on the attribute set.
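The lower and upper approximations discussed above can be illustrated with a minimal Python sketch. The toy table, the object names `o1` to `o4`, the attribute names `a` and `b`, and the helper functions are all hypothetical and chosen only for illustration; they are not taken from the sample system of this article.

```python
from collections import defaultdict


def equivalence_classes(table, attributes):
    """Group object names that share the same values on `attributes`."""
    classes = defaultdict(set)
    for name, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(name)
    return list(classes.values())


def approximations(table, attributes, target):
    """Return the (lower, upper) approximations of the set `target`."""
    lower, upper = set(), set()
    for cls in equivalence_classes(table, attributes):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class overlaps the target set
            upper |= cls
    return lower, upper


# Hypothetical toy data: o1 and o2 are indistinguishable on (a, b).
table = {
    "o1": {"a": 1, "b": 0},
    "o2": {"a": 1, "b": 0},
    "o3": {"a": 0, "b": 1},
    "o4": {"a": 0, "b": 0},
}
target = {"o1", "o3"}

lower, upper = approximations(table, ["a", "b"], target)
boundary = upper - lower          # objects that cannot be decided
is_exact = not boundary           # True only when lower == upper
```

Here the target set contains `o1` but not `o2`, although the two objects carry identical attribute values, so the boundary region is nonempty and the set is only roughly definable on these attributes.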

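An attribute subset that induces the same equivalence classes as the full attribute set is a reduct when eliminating any of its attributes causes a collapse of the equivalence-class structure, with the result that the equivalence classes no longer match those induced by the full set.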
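The notion of a reduct can be made concrete with a brute-force Python sketch: enumerate attribute subsets from smallest to largest and keep those that preserve the equivalence classes of the full attribute set and contain no smaller subset that already does. This exhaustive search is an assumption made for illustration, practical only for a handful of attributes, and the toy table and function names are hypothetical.

```python
from itertools import combinations


def partition(table, attributes):
    """Equivalence classes induced by `attributes`, as a set of frozensets."""
    classes = {}
    for name, row in table.items():
        key = tuple(row[a] for a in attributes)
        classes.setdefault(key, set()).add(name)
    return {frozenset(c) for c in classes.values()}


def reducts(table, attributes):
    """Return all minimal subsets that preserve the full partition."""
    full = partition(table, attributes)
    found = []
    for size in range(1, len(attributes) + 1):
        for subset in combinations(attributes, size):
            if partition(table, subset) != full:
                continue  # dropping these attributes merges some classes
            # Skip supersets of a reduct found at a smaller size.
            if not any(set(r) <= set(subset) for r in found):
                found.append(subset)
    return found


# Hypothetical toy data: attributes b and c always agree, so either one
# can stand in for the other.
table = {
    "o1": {"a": 0, "b": 0, "c": 0},
    "o2": {"a": 0, "b": 1, "c": 1},
    "o3": {"a": 1, "b": 0, "c": 0},
    "o4": {"a": 1, "b": 1, "c": 1},
}

result = reducts(table, ["a", "b", "c"])
```

In this toy table both `("a", "b")` and `("a", "c")` qualify, which illustrates that a reduct need not be unique.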

Generally, it is these strong relationships that will warrant further investigation, and that will ultimately be of use in predictive modeling.

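The numerator of the dependency ratio represents the total number of objects which, based on the first attribute set, can be positively categorized according to the classification induced by the second attribute set.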

The dependency ratio therefore expresses the proportion (within the entire universe) of such classifiable objects.
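As a rough illustration of this ratio, the following Python sketch counts the objects whose equivalence class under one attribute set falls entirely inside a single class of a second attribute set, and divides by the size of the universe. The names `cond`, `dec`, and `dependency`, as well as the toy table, are hypothetical.

```python
from collections import defaultdict


def partition(table, attributes):
    """Equivalence classes of object names induced by `attributes`."""
    classes = defaultdict(set)
    for name, row in table.items():
        classes[tuple(row[a] for a in attributes)].add(name)
    return list(classes.values())


def dependency(table, cond, dec):
    """Proportion of objects that `cond` classifies unambiguously w.r.t. `dec`."""
    cond_classes = partition(table, cond)
    positive = 0
    for dec_class in partition(table, dec):
        # Size of the lower approximation of dec_class by the cond attributes.
        for cls in cond_classes:
            if cls <= dec_class:
                positive += len(cls)
    return positive / len(table)


# Hypothetical toy data: o1 and o2 agree on (a, b) but differ on d.
table = {
    "o1": {"a": 1, "b": 0, "d": "yes"},
    "o2": {"a": 1, "b": 0, "d": "no"},
    "o3": {"a": 0, "b": 1, "d": "yes"},
    "o4": {"a": 0, "b": 0, "d": "no"},
}

gamma_ab = dependency(table, ["a", "b"], ["d"])
gamma_a = dependency(table, ["a"], ["d"])
```

With this data only `o3` and `o4` can be classified with certainty from `a` and `b`, so half of the universe is classifiable; using `a` alone, no object is.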

"can be interpreted as a proportion of such objects in the information system for which it suffices to know the values of attributes in

The relationship of this notion of attribute dependency to more traditional information-theoretic (i.e., entropic) notions of attribute dependence has been discussed in a number of sources, e.g. Pawlak, Wong, & Ziarko (1988),[4] Yao & Yao (2002),[5] Wong, Ziarko, & Ye (1986),[6] and Quafafou & Boussouf (2000).

The choice of such rules is not unique, and therein lies the issue of inductive bias.

[8] Let us say that we wish to find the minimal set of consistent rules (logical implications) that characterize our sample system.

The method for extracting such rules given in Ziarko & Shan (1995) is to form a decision matrix corresponding to each individual value of the decision attribute.

The statement corresponding to a given object states that all of its listed conditions must be satisfied. There is a large amount of redundancy in such statements, and the next step is to simplify them using traditional Boolean algebra.

In general, the procedure will be repeated for each possible value of the decision variable.
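The construction of the decision matrices can be sketched in Python under one stated assumption about their contents: each entry lists the attribute-value pairs on which an object with the chosen decision value differs from an object with a different decision value. The toy table and the function name `decision_matrix` are hypothetical, and the Boolean simplification step is not implemented here.

```python
def decision_matrix(table, cond, dec, value):
    """Map (positive, negative) object pairs to their distinguishing pairs.

    Positive objects have decision `value`; negative objects have any
    other decision. Each entry holds the (attribute, value) pairs of the
    positive object that differ from the negative object.
    """
    positives = [n for n, r in table.items() if r[dec] == value]
    negatives = [n for n, r in table.items() if r[dec] != value]
    matrix = {}
    for p in positives:
        for n in negatives:
            matrix[(p, n)] = {
                (a, table[p][a]) for a in cond if table[p][a] != table[n][a]
            }
    return matrix


# Hypothetical toy data with two condition attributes and one decision.
table = {
    "o1": {"a": 1, "b": 0, "d": "yes"},
    "o2": {"a": 1, "b": 1, "d": "no"},
    "o3": {"a": 0, "b": 0, "d": "yes"},
}

# Repeat the procedure for each possible value of the decision variable.
decision_values = sorted({row["d"] for row in table.values()})
matrices = {
    v: decision_matrix(table, ["a", "b"], "d", v) for v in decision_values
}
```

Reading one row of a matrix as a rule means taking, for a fixed positive object, the conjunction over all negative objects of the disjunction of the pairs in each entry; simplifying that expression is where the Boolean algebra step comes in.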

Two objects are conflicting when they are characterized by the same values of all attributes, but they belong to different concepts (classes).
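This definition of conflicting objects translates directly into a small check. The Python sketch below groups objects by their condition-attribute values and reports pairs within a group whose decision values differ; the toy table and the function name `conflicting_pairs` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def conflicting_pairs(table, cond, dec):
    """Return pairs of objects equal on all `cond` attributes but not on `dec`."""
    groups = defaultdict(list)
    for name, row in table.items():
        groups[tuple(row[a] for a in cond)].append(name)
    conflicts = []
    for members in groups.values():
        for x, y in combinations(members, 2):
            if table[x][dec] != table[y][dec]:
                conflicts.append((x, y))
    return conflicts


# Hypothetical toy data: o1, o2 and o4 share the same condition values.
table = {
    "o1": {"a": 1, "b": 0, "d": "yes"},
    "o2": {"a": 1, "b": 0, "d": "no"},
    "o3": {"a": 0, "b": 1, "d": "yes"},
    "o4": {"a": 1, "b": 0, "d": "yes"},
}

conflicts = conflicting_pairs(table, ["a", "b"], "d")
is_consistent = not conflicts
```

A data set with no conflicting pairs is consistent; otherwise, the affected concepts can only be described through their lower and upper approximations.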

Let B be a nonempty lower or upper approximation of a concept represented by a decision-value pair (d, w).

For our sample information system, LEM2 will induce the following rules:

Other rule-learning methods can be found, e.g., in Pawlak (1991),[1] Stefanowski (1998),[11] and Bazan et al. (2004).[10]

Additionally, the characteristic relation (see, e.g., Grzymala-Busse & Grzymala-Busse (2007)) makes it possible to process data sets with all three kinds of missing attribute values at the same time: lost values, "do not care" conditions, and attribute-concept values.

Rough set methods can be applied as a component of hybrid solutions in machine learning and data mining.

They have been found to be particularly useful for rule induction and feature selection (semantics-preserving dimensionality reduction).

The idea of the rough set was proposed by Pawlak (1981) as a new mathematical tool for dealing with vague concepts.

Comer, Grzymala-Busse, Iwinski, Nieminen, Novotny, Pawlak, Obtulowicz, and Pomykala have studied algebraic properties of rough sets.

Initial developments focused on the relationship with fuzzy sets, covering both similarities and differences.

Pawlak (1995) considered that fuzzy and rough sets should be treated as being complementary to each other, addressing different aspects of uncertainty and vagueness.

Several generalizations of rough sets have been introduced, studied and applied to solving problems.