Clustering with a Genetically Optimized Approach

This paper describes a genetically guided approach to optimizing the hard (J1) and fuzzy (Jm) c-means functionals used in cluster analysis. Our experiments show that a genetic algorithm ameliorates the diculty of choosing an initialization for the c-means clustering algorithms. Experiments use six data sets, including the Iris data, magnetic resonance and color images. The genetic algorithm approach is generally able to nd the lowest known Jm value or a Jm associated with a partition very similar to that associated with the lowest Jm value. On data sets with several local extrema, the GA approach always avoids the less desirable solutions. Degenerate partitions are always avoided by the GA approach, which provides an effective method for optimizing clustering models whose objective function can b e represented in terms of cluster centers. The time cost of genetic guided clustering is shown to make a series random initializations of fuzzy/hard c-means, where the partition associated with the lowest Jm value is chosen, an e ective competitor for many clustering domains.

Unsupervised learning is useful in exploratory data analysis, image segmentation and, with some added class knowledge, may be used for classi cation as well. Here we present a genetical ly guided algorithm (GGA) approach to optimization of certain clustering models. This approach can be directly applied to any clustering model which can be represented as a functional dependent upon a set of cluster centers (or point prototypes). The approach can b e further generalized for models that require parameters other than the cluster centers. In this paper the fuzzy and hard c-means (FCM/HCM respectively) functionals, Jm and J1, are used as tness functions . This allows us to compare performance of the GGA with the conventional FCM/HCM algorithms and examine GGA optimization performance with similar but di erent objective functions. It allows comparison with other GA work on clustering. Clustering algorithms such as FCM which use calculus-based optimization methods can b e trapped by local extrema in the process of optimizing the clustering criterion. They are also very sensitive to initialization. Other conventional optimization methods, such as the Nelder-Mead Simplex method can be used to optimize Jm. Hathaway and Bezdek o er that this is probably a good option for only 10 or fewer unknowns and a few hundred data points . Faster techniques based on Newton’s method and its variants, such as those described in , can be applied to form partitions with the use of appropriately di erentiable distance metrics. However, these methods have signi cant sensitivity to the chosen initialization.

Free download research paper