Machine Learning Workbench for Data Mining
The Weka workbench is an organized collection of state-of-the-art machine learning algorithms and data preprocessing tools. The basic way of interacting with these methods is by invoking them from the command line. However, convenient interactive graphical user interfaces are provided for data exploration, for setting up large-scale experiments on distributed computing platforms, and for designing configurations for streamed data processing. These interfaces constitute an advanced environment for experimental data mining. The system is written in Java and distributed under the terms of the GNU General Public License.
Keywords: machine learning software, data mining, data preprocessing, data visualization, extensible workbench
Experience shows that no single machine learning method is appropriate for all possible learning problems. The universal learner is an idealistic fantasy. Real datasets vary, and to obtain accurate models the bias of the learning algorithm must match the structure of the domain. The Weka workbench is a collection of state-of-the-art machine learning algorithms and data preprocessing tools. It is designed so that users can quickly try out existing machine learning methods on new datasets in very flexible ways. It provides extensive support for the whole process of experimental data mining, including preparing the input data, evaluating learning schemes statistically, and visualizing both the input data and the result of learning. This has been accomplished by including a wide variety of algorithms for learning different types of concepts, as well as a wide range of preprocessing methods. This diverse and comprehensive set of tools can be invoked through a common interface, making it possible for users to compare different methods and identify those that are most appropriate for the problem at hand.
The workbench includes methods for all the standard data mining problems: regression, classification, clustering, association rule mining, and attribute selection. Getting to know the data is a very important part of data mining, and many data visualization facilities and data preprocessing tools are provided. All algorithms and methods take their input in the form of a single relational table, which can be read from a file or generated by a database query.
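The idea of running different learning methods through one common interface, on input held in a single table, can be illustrated with a minimal sketch. The code below is plain Java and does not use Weka's actual API: the names Row, Learner, MajorityLearner, NearestNeighborLearner and CommonInterfaceSketch are hypothetical, the two learners are deliberately simple stand-ins, and the toy data is made up purely for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is NOT Weka's API, only an illustration.

// One row of a single relational table: numeric attributes plus a class label.
final class Row {
    final double[] features;
    final String label;

    Row(String label, double... features) {
        this.label = label;
        this.features = features;
    }
}

// The common interface every learning method implements.
interface Learner {
    void train(List<Row> table);
    String predict(double[] features);
}

// Baseline: always predicts the most frequent label seen during training.
final class MajorityLearner implements Learner {
    private String majority;

    @Override
    public void train(List<Row> table) {
        Map<String, Integer> counts = new HashMap<>();
        int best = -1;
        for (Row r : table) {
            int c = counts.merge(r.label, 1, Integer::sum);
            if (c > best) {
                best = c;
                majority = r.label;
            }
        }
    }

    @Override
    public String predict(double[] features) {
        return majority;
    }
}

// 1-nearest-neighbour: predicts the label of the closest stored row.
final class NearestNeighborLearner implements Learner {
    private List<Row> stored = new ArrayList<>();

    @Override
    public void train(List<Row> table) {
        stored = new ArrayList<>(table);
    }

    @Override
    public String predict(double[] features) {
        Row nearest = null;
        double bestDist = Double.POSITIVE_INFINITY;
        for (Row r : stored) {
            double d = 0.0; // squared Euclidean distance
            for (int i = 0; i < features.length; i++) {
                double diff = features[i] - r.features[i];
                d += diff * diff;
            }
            if (d < bestDist) {
                bestDist = d;
                nearest = r;
            }
        }
        return nearest == null ? null : nearest.label;
    }
}

final class CommonInterfaceSketch {
    // Trains on one table and returns the fraction of test rows predicted correctly.
    static double accuracy(Learner learner, List<Row> train, List<Row> test) {
        learner.train(train);
        int correct = 0;
        for (Row r : test) {
            if (r.label.equals(learner.predict(r.features))) {
                correct++;
            }
        }
        return test.isEmpty() ? 0.0 : (double) correct / test.size();
    }

    // Made-up toy data: class "a" near the origin, class "b" near (5, 5).
    static List<Row> toyTrain() {
        return List.of(new Row("a", 0.0, 0.0), new Row("a", 0.0, 1.0),
                new Row("a", 1.0, 0.0), new Row("b", 5.0, 5.0), new Row("b", 6.0, 5.0));
    }

    static List<Row> toyTest() {
        return List.of(new Row("a", 0.5, 0.5), new Row("b", 5.5, 5.0),
                new Row("b", 6.0, 6.0), new Row("a", 1.0, 1.0));
    }

    public static void main(String[] args) {
        // Because both methods share the Learner interface, comparing them is one loop.
        List<Learner> learners = List.of(new MajorityLearner(), new NearestNeighborLearner());
        for (Learner l : learners) {
            System.out.println(l.getClass().getSimpleName() + ": "
                    + accuracy(l, toyTrain(), toyTest()));
        }
    }
}
```

On this toy table the majority baseline gets half of the test rows right and the nearest-neighbour learner gets all of them, which is the kind of side-by-side comparison a shared interface makes cheap. A single train/test split is used here only to keep the sketch short; it is not a substitute for the statistical evaluation the text describes.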