C4.5 algorithm

C4.5 is an algorithm developed by Ross Quinlan that is used to generate decision trees.

The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier.

In 2011, authors of the Weka machine learning software described the C4.5 algorithm as "a landmark decision tree program that is probably the machine learning workhorse most widely used in practice to date".[2]

It became quite popular after ranking #1 in the pre-eminent paper "Top 10 Algorithms in Data Mining", published by Springer in 2008.[3]

C4.5 builds decision trees from a set of training data in the same way as ID3, using the concept of information entropy.

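The training data is a set of already classified samples. Each sample is a vector whose components represent attribute values or features of the sample, as well as the class in which the sample falls.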

At each node of the tree, C4.5 chooses the attribute of the data that most effectively splits its set of samples into subsets enriched in one class or the other.

The splitting criterion is the normalized information gain (difference in entropy).

The attribute with the highest normalized information gain is chosen to make the decision.
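The selection step described above can be illustrated with a minimal Python sketch. It assumes purely categorical attributes and takes "normalized information gain" to mean the information gain divided by the split information (the gain ratio). The function names and the toy data set are hypothetical and are not taken from Quinlan's implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def gain_ratio(rows, labels, attr):
    """Information gain from splitting on `attr`, divided by the split information."""
    total = len(rows)
    # Group the class labels by the value each sample has for `attr`.
    subsets = {}
    for row, label in zip(rows, labels):
        subsets.setdefault(row[attr], []).append(label)
    weights = [len(s) / total for s in subsets.values()]
    # Expected entropy after the split, weighted by subset size.
    remainder = sum(w * entropy(s) for w, s in zip(weights, subsets.values()))
    info_gain = entropy(labels) - remainder
    # Split information: entropy of the partition sizes themselves.
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info if split_info > 0 else 0.0

def best_attribute(rows, labels, attrs):
    """Attribute with the highest gain ratio (first one wins on ties)."""
    return max(attrs, key=lambda a: gain_ratio(rows, labels, a))

# Hypothetical toy data: "outlook" separates the classes perfectly, "windy" does not.
rows = [
    {"outlook": "overcast", "windy": "no"},
    {"outlook": "overcast", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "stay", "stay"]

chosen = best_attribute(rows, labels, ["outlook", "windy"])
print(chosen)
```

On this toy data the sketch selects "outlook", because splitting on it yields two subsets that each contain a single class. Handling of continuous attributes and missing values is deliberately left out of the sketch.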

The general algorithm for building decision trees can be expressed in pseudocode.[4]

J48 is an open source Java implementation of the C4.5 algorithm in the Weka data mining tool.
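The tree-building procedure described above (pick the attribute with the highest normalized information gain, split on it, and repeat on each subset) can be sketched recursively in Python. This is an illustrative sketch under stated assumptions, not the code of J48 or of Quinlan's program: it handles only categorical attributes, performs no pruning, and uses simplified stopping rules (a pure node, no attributes left, or no attribute with positive gain). All names and the toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum(n / total * math.log2(n / total)
                for n in Counter(labels).values())

def partition(rows, labels, attr):
    """Group rows and labels by the value each row has for `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        sub_rows, sub_labels = groups.setdefault(row[attr], ([], []))
        sub_rows.append(row)
        sub_labels.append(label)
    return groups

def gain_ratio(rows, labels, attr):
    """Information gain of the split on `attr`, normalized by split information."""
    total = len(labels)
    parts = [(len(ls) / total, entropy(ls))
             for _, ls in partition(rows, labels, attr).values()]
    gain = entropy(labels) - sum(w * h for w, h in parts)
    split_info = -sum(w * math.log2(w) for w, _ in parts)
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, labels, attrs):
    """Return a class label (leaf) or a tuple (attribute, {value: subtree})."""
    majority = Counter(labels).most_common(1)[0][0]
    # Simplified base cases: pure node, or no attributes left to test.
    if len(set(labels)) == 1 or not attrs:
        return majority
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    # Assumed stopping rule: no attribute gives a useful split.
    if gain_ratio(rows, labels, best) <= 1e-12:
        return majority
    remaining = [a for a in attrs if a != best]
    children = {value: build_tree(sub_rows, sub_labels, remaining)
                for value, (sub_rows, sub_labels)
                in partition(rows, labels, best).items()}
    return (best, children)

def classify(tree, row):
    """Follow the branches matching `row`; raises KeyError on unseen values."""
    while isinstance(tree, tuple):
        attr, children = tree
        tree = children[row[attr]]
    return tree

# Hypothetical toy data that needs two levels of tests to separate the classes.
rows = [
    {"outlook": "sunny", "windy": "no"},
    {"outlook": "sunny", "windy": "yes"},
    {"outlook": "rainy", "windy": "no"},
    {"outlook": "rainy", "windy": "yes"},
]
labels = ["play", "play", "play", "stay"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
print(classify(tree, {"outlook": "rainy", "windy": "yes"}))
```

Representing leaves as plain labels and internal nodes as tuples keeps the sketch short; it would need a dedicated node type if class labels could themselves be tuples.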

C5.0, the successor to C4.5, offers a number of improvements over it.[6][7]

Source for a single-threaded Linux version of C5.0 is available under the GNU General Public License (GPL).