DATA MINING APPROACHES FOR INTRUSION DETECTION: ISSUES AND RESEARCH DIRECTIONS
The goal of an intrusion detection system (IDS) is to identify intruders, both authorized insiders and unauthorized outsiders, by differentiating anomalous network activity from normal network traffic. Data mining methods have been used to build automatic intrusion detection systems. The central idea is to use auditing programs to extract a set of features that describe each network connection or host session, and then to apply data mining programs to learn rules that capture intrusive and non-intrusive behavior. This paper surveys some works that employ data mining techniques for intrusion detection and addresses some technical issues. It also proposes a new idea: viewing intrusion detection from a data warehouse perspective and integrating data mining with on-line analytical processing (OLAP) for intrusion detection purposes.
With the ever-increasing growth of computer networks and the emergence of electronic commerce in recent years, computer security has become a priority. Because intrusions exploit vulnerabilities in computer systems and use socially engineered penetration techniques, intrusion detection is often used as another wall of protection alongside intrusion prevention techniques such as user authentication. Intrusion detection is not an easy task, owing to the sheer volume of network activity data and the need to regularly update the IDS to cope with new, unknown attack methods or upgraded computing environments.
Mukherjee, Heberlein, and Levitt defined intrusion detection as identifying unauthorized use, misuse, and abuse of computer systems by both inside and outside intruders. There are many categories of network intrusions. Examples include SMTP (SendMail) attacks, password guessing, IP spoofing, buffer overflow attacks, multiscan attacks, and denial-of-service (DoS) attacks such as ping-of-death and SYN flood. Intrusion detection can broadly be divided into two categories: misuse detection and anomaly detection. Misuse detection is based on knowledge of system vulnerabilities and known attack patterns, while anomaly detection assumes that an intrusion will always reflect some deviation from normal patterns. Many AI techniques have been applied to both. Pattern matching systems such as rule-based expert systems, state transition analysis, and genetic algorithms are direct and efficient ways to implement misuse detection. Inductive sequential patterns, artificial neural networks, statistical analysis, and data mining methods, on the other hand, have been used in anomaly detection.
Data mining can be defined as the process of discovering implicit, previously unknown, and useful information from databases. Data mining methods can be applied to extensively gathered audit data to compute models that capture intrusive and non-intrusive behavior.
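The distinction between misuse detection and anomaly detection can be made concrete with a toy sketch. The Python code below is illustrative only and is not taken from any of the surveyed systems: the `Connection` record, its field names, the flag codes, the two "signatures", and the `min_freq` threshold are all assumptions chosen for the example. Misuse detection is reduced to a lookup against known patterns, and anomaly detection to a check against a frequency profile built from attack-free traffic.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Connection:
    """One audit record. Field names and flag codes are assumed for illustration."""
    src_ip: str
    dst_ip: str
    service: str
    flag: str


# Misuse detection: compare each record with known attack patterns.
# These (service, flag) signatures are invented placeholders, not real rules.
KNOWN_SIGNATURES = {("smtp", "REJ"), ("http", "S0")}


def is_misuse(conn):
    """Return True if the record matches a known (hypothetical) signature."""
    return (conn.service, conn.flag) in KNOWN_SIGNATURES


def build_profile(normal_traffic):
    """Anomaly detection, step 1: relative frequency of each
    (service, flag) pair observed in attack-free traffic."""
    counts = Counter((c.service, c.flag) for c in normal_traffic)
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def is_anomalous(conn, profile, min_freq=0.05):
    """Anomaly detection, step 2: flag pairs that are rare or unseen
    in the normal profile. The threshold is an arbitrary assumption."""
    return profile.get((conn.service, conn.flag), 0.0) < min_freq


# Usage: a connection that matches no signature but deviates from the profile.
normal = ([Connection("10.0.0.1", "10.0.0.9", "http", "SF")] * 18
          + [Connection("10.0.0.2", "10.0.0.9", "smtp", "SF")] * 2)
profile = build_profile(normal)
probe = Connection("10.0.0.7", "10.0.0.9", "telnet", "S0")
print(is_misuse(probe), is_anomalous(probe, profile))  # prints: False True
```

The probe connection shows the trade-off described above: it escapes the signature lookup because no known pattern covers it, yet it is flagged by the profile check because nothing like it appears in normal traffic. A profile this crude would also flag any legitimate but rare activity, which is one reason learned models are of interest.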
Audit data consists of pre-processed, time-stamped audit records, each with a number of features. These records contain useful data from which a well-designed data mining system can discover beneficial information. For example, a typical audit log file may record the source IP address, destination IP address, type of service, and status flag of each connection. Data mining approaches provide automatic models that eliminate the need to manually analyze and encode intrusion patterns.
The goal of this paper is to provide a review of previous work that has applied data mining to intrusion detection. Some performance issues are also addressed. Since data mining techniques have to deal with large amounts of audit data, the intrusion detection task can be viewed from a data warehouse perspective. This paper proposes a new idea that integrates data mining and on-line analytical processing (OLAP) for intrusion detection purposes.
The remainder of the paper is organized as follows. Section 2 provides a review of related work. Section 3 discusses some technical issues that need to be addressed. Section 4 proposes an idea for intrusion detection that is based on data warehousing and data mining. Finally, the paper ends with concluding remarks in Section 5.