Software Clustering based on Information Loss Minimization

The majority of the algorithms in the software clustering literature utilize structural information in order to decompose large software systems. Other approaches, such as using file names or ownership information, have also demonstrated merit. However, there is no intuitive way to combine information obtained from these two different types of techniques. In this paper, we present an approach that combines structural and non-structural information in an integrated fashion. LIMBO is a scalable hierarchical clustering algorithm based on the minimization of information loss when clustering a software system. We apply LIMBO to two large software systems in a number of experiments. The results indicate that this approach produces valid and useful clusterings of large software systems. LIMBO can also be used to evaluate the usefulness of various types of non-structural information to the software clustering process.
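To make the idea of clustering by minimizing information loss concrete, the following is a minimal sketch, not the LIMBO implementation itself. It assumes each software artifact is described by a probability distribution over its features (for example, the files it depends on or its file-name tokens), and that the cost of merging two clusters is their combined weight times the weighted Jensen-Shannon divergence of their feature distributions, as in agglomerative information-bottleneck clustering. The names `kl`, `merge_cost`, and `agglomerate` are illustrative, and the quadratic greedy loop omits the summarization that a scalable algorithm would need.

```python
import math


def kl(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; terms with p_i = 0 contribute 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def merge_cost(w1, p1, w2, p2):
    """Information lost by merging two clusters (assumed cost model).

    w1, w2 are cluster weights (probability mass); p1, p2 are their
    feature distributions. Returns (cost, merged_distribution).
    """
    w = w1 + w2
    merged = [(w1 * a + w2 * b) / w for a, b in zip(p1, p2)]
    js = (w1 / w) * kl(p1, merged) + (w2 / w) * kl(p2, merged)
    return w * js, merged


def agglomerate(items, k):
    """Greedily merge the cheapest pair of clusters until k clusters remain.

    items maps an artifact name to its feature distribution.
    """
    n = len(items)
    clusters = [([name], 1.0 / n, dist) for name, dist in items.items()]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost, merged = merge_cost(
                    clusters[i][1], clusters[i][2],
                    clusters[j][1], clusters[j][2],
                )
                if best is None or cost < best[0]:
                    best = (cost, i, j, merged)
        _, i, j, merged = best
        new = (clusters[i][0] + clusters[j][0],
               clusters[i][1] + clusters[j][1],
               merged)
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(new)
    return [sorted(names) for names, _, _ in clusters]


# Hypothetical example: four files described by distributions over four features.
example = {
    "a.c": [0.5, 0.5, 0.0, 0.0],
    "b.c": [0.5, 0.5, 0.0, 0.0],
    "c.c": [0.0, 0.0, 0.5, 0.5],
    "d.c": [0.0, 0.0, 0.5, 0.5],
}
result = agglomerate(example, 2)
```

Because structural and non-structural attributes can both be expressed as features in the same distribution, a cost of this form treats them uniformly, which is the kind of integration the abstract describes.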

It is widely believed that an effective decomposition of a large software system into smaller, more manageable subsystems can be of significant help to the process of understanding, redocumenting, or reverse engineering the system in question. As a result, the software clustering problem has attracted the attention of many researchers in the last two decades.