COMPARISON OF CLUSTERING ALGORITHMS FOR THYROID DATABASE
This work proposes a methodology for analyzing, visualizing and clustering data of patients with different symptoms from a thyroid database. In previous work the thyroid data were analyzed using the WITT algorithm. That clustering method correctly formed the clusters of the control group and the hypothyroid patients but failed to cluster the hyperthyroid patients. In this paper we analyze the data using several algorithms: K-means, hierarchical clustering, the EM algorithm, DBSCAN and the Cobweb algorithm. The goal is to measure the degree of matching between the clusters produced and the class labels in order to determine which algorithms give better results. Classification-oriented measures are used to validate the clustering results. We also propose several preprocessing steps to overcome the problems caused by the large amount of noise and the unbalanced classes in the given data set.
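As an illustration of how the degree of matching between clusters and class labels can be quantified, the following minimal sketch computes purity, one commonly used classification-oriented measure. The abstract does not state which measures are used, so the choice of purity is an assumption, and the function names and the toy cluster assignments and labels below are hypothetical, not taken from the thyroid data.

```python
from collections import Counter, defaultdict


def contingency(cluster_ids, class_labels):
    """Count how many samples of each class fall into each cluster."""
    table = defaultdict(Counter)
    for cluster, label in zip(cluster_ids, class_labels):
        table[cluster][label] += 1
    return table


def purity(cluster_ids, class_labels):
    """Fraction of samples that belong to the majority class of their cluster."""
    if len(cluster_ids) != len(class_labels):
        raise ValueError("cluster_ids and class_labels must have the same length")
    if not cluster_ids:
        raise ValueError("at least one sample is required")
    table = contingency(cluster_ids, class_labels)
    # For each cluster, only the samples of its most frequent class count as matched.
    matched = sum(max(counts.values()) for counts in table.values())
    return matched / len(class_labels)


# Hypothetical toy example: three clusters, three classes (illustrative only).
clusters = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2]
labels = ["normal", "normal", "normal", "hypo", "hypo",
          "hypo", "hypo", "hyper", "hyper", "normal"]

print(purity(clusters, labels))
```

Purity ranges from 0 to 1, with 1 meaning that every cluster contains samples of a single class. It tends to favor solutions with many small clusters, so in practice it is usually reported together with other measures such as entropy or the F-measure.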