Project Reference: CS40

Student’s Name: Samuel Dugmore

Project title: Using class decomposition and Decision Trees to implement machine ensemble learning

Course Title: Computer Science

Supervisor’s Name: Shadi Basurra

To create a Python-based ensemble machine learning classifier that utilises class decomposition and decision trees.

This project presents a new machine learning classification method that combines class decomposition with ensemble learning. The algorithm trains each estimator in the ensemble on training data in which each class has been decomposed into a random number of subclasses. Randomising the number of subclasses that each estimator sees is intended to reduce over-fitting, because the number of subclasses has not been chosen to suit the data specifically. Training the estimators with different random numbers of subclasses also means that each one views the data differently, which is expected to reduce bias towards any one class. To test the new method, experiments are carried out against XGBoost and Random Forest models, with all three methods using an equal number of estimators in the ensemble.
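
The decomposition step described above can be illustrated with a minimal sketch. The summary does not say how subclasses are formed, so this sketch assumes a small k-means clustering within each class. The function names (kmeans_labels, decompose_classes), the max_subclasses limit and the toy data are all hypothetical and are not taken from the project itself.

```python
import random


def kmeans_labels(points, k, rng, iterations=10):
    """Tiny k-means: return a cluster index (0..k-1) for each point."""
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Move each centroid to the mean of its members; leave it alone if empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels


def decompose_classes(X, y, rng, max_subclasses=4):
    """Relabel every sample as (class, subclass), drawing a random subclass
    count for each class. max_subclasses is an assumed upper bound."""
    new_y = [None] * len(y)
    for cls in sorted(set(y)):
        idx = [i for i, label in enumerate(y) if label == cls]
        # Random number of subclasses, never more than the samples available.
        k = rng.randint(1, min(max_subclasses, len(idx)))
        sub = kmeans_labels([X[i] for i in idx], k, rng)
        for i, s in zip(idx, sub):
            new_y[i] = (cls, s)
    return new_y


# Toy data: each estimator would call decompose_classes with its own random draw.
rng = random.Random(0)
X = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9), (9.0, 0.1), (9.1, 0.3)]
y = ["a", "a", "a", "a", "b", "b"]
decomposed = decompose_classes(X, y, rng)
```

In a full ensemble, each decision tree would be trained on its own relabelled copy of the training data, so that no two trees are guaranteed to see the same subclass structure.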

“The innovative aspect of the new algorithm is that it will use class decomposition in a new way, by randomising the number of subclasses generated from the training data that each tree will be trained on. This is intended to reduce class bias in datasets.
Another benefit is that majority voting in the ensemble produces an overall judgement about the class, and combining many decision trees, which are individually slightly weaker classifiers, boosts their accuracy.”
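
The majority-voting step can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: it assumes each estimator predicts a (class, subclass) pair and that the subclass is discarded before votes are counted, which the summary does not state explicitly. The function name majority_vote and the sample predictions are hypothetical.

```python
from collections import Counter


def majority_vote(estimator_predictions):
    """Combine per-estimator (class, subclass) predictions into one class per sample."""
    n_samples = len(estimator_predictions[0])
    final = []
    for i in range(n_samples):
        # Drop the subclass index so votes are counted on the original classes.
        votes = Counter(pred[i][0] for pred in estimator_predictions)
        final.append(votes.most_common(1)[0][0])
    return final


# Three estimators, two samples; each row holds one estimator's predictions.
predictions = [
    [("a", 0), ("b", 1)],
    [("a", 2), ("a", 1)],
    [("b", 0), ("b", 0)],
]
result = majority_vote(predictions)
print(result)  # ['a', 'b']
```

Counting votes on the parent class means estimators trained with different subclass counts can still be combined, because their predictions are reduced to the same set of original labels.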