Please use this identifier to cite or link to this item:
http://dspace.cityu.edu.hk/handle/2031/619
Title: | Data conversion from numerical to nominal data for classification and clustering |
Authors: | Chow, Man Chung |
Department: | Department of Electronic Engineering |
Issue Date: | 2005 |
Supervisor: | Prof. Chow, Tommy W S |
Assessor: | Dr. Tang, K S |
Abstract: | Real-world classification tasks involve different types of attributes, such as categorical, nominal and numerical data. Classifiers can handle categorical and nominal values, but not all classifiers can handle numerical data. A classifier that can handle numerical data performs a discretization before running any classification task; decision trees are a representative example. Decision trees play an important role in classification tasks and behave efficiently. In this project, I implemented three different heuristic discretization methods that aim to increase the classification accuracy of decision trees. I empirically evaluated them on more than twenty datasets. All experiments were conducted under the same computational environment. “Weka”, a popular and efficient machine learning tool, was used as a benchmark to measure the classification accuracy of different algorithms, including on the non-discretized numerical datasets. The results show that classification accuracy is retained or improved after discretization is applied. In addition, the proposed algorithms were found not only to enhance the efficiency of decision tree classifiers but also to increase clustering accuracy. This corroborates my argument that, with the aid of an appropriate discretization method, accuracy can be increased in both supervised and unsupervised classification. To demonstrate the benefits of the proposed algorithms, an “MP3 players” survey, designed to identify and study interesting data such as customer behaviors, was conducted. The classification results on the survey data indicate that improved accuracy was achieved after applying the developed discretization method. Thus, it is believed that the proposed methods are applicable to many real-life problems. |
Appears in Collections: | Electrical Engineering - Undergraduate Final Year Projects |
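As a rough illustration of what converting numerical data to nominal data means, the following is a minimal sketch of equal-width binning. This is a generic textbook technique chosen for illustration; it is not one of the three heuristic methods implemented in the project, which the abstract does not describe. The function name `equal_width_bins`, the `bin0`, `bin1`, ... labels, and the sample values are all hypothetical.

```python
from typing import List, Sequence


def equal_width_bins(values: Sequence[float], n_bins: int) -> List[str]:
    """Map each numeric value to a nominal label 'bin0' .. 'bin{n_bins-1}'.

    Hypothetical helper: splits the range [min, max] into n_bins intervals
    of equal width and labels each value by the interval it falls into.
    """
    if n_bins < 1:
        raise ValueError("n_bins must be at least 1")
    if len(values) == 0:
        return []

    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so a single nominal label is enough.
        return ["bin0"] * len(values)

    width = (hi - lo) / n_bins
    labels = []
    for v in values:
        # Clamp so that the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), n_bins - 1)
        labels.append(f"bin{idx}")
    return labels


# Made-up numeric attribute (not taken from the project's survey data).
ages = [18, 22, 25, 31, 40, 58]
print(equal_width_bins(ages, 3))
# → ['bin0', 'bin0', 'bin0', 'bin0', 'bin1', 'bin2']
```

The resulting labels can be passed to a classifier that accepts only nominal attributes. Equal-width binning ignores class labels entirely, which is one reason heuristic, accuracy-driven methods such as those evaluated in the project may perform better.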
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.