The conventional approach to securing computer systems against cyber threats is to design mechanisms such as firewalls, authentication tools, and virtual private networks that create a protective shield. However, these mechanisms almost always have vulnerabilities. They cannot ward off attacks that are continually being adapted to exploit system weaknesses, which are often caused by careless design and implementation flaws. This has created the need for intrusion detection, a security technology that complements conventional security approaches by monitoring systems and identifying computer attacks.
Traditional intrusion detection methods are based on human experts' extensive knowledge of attack signatures, which are character strings in a message's payload that indicate malicious content. Signatures have several limitations. They cannot detect novel attacks, because someone must manually revise the signature database for each new type of intrusion discovered. Even once someone discovers a new attack and develops its signature, deploying that signature is often delayed. These limitations have led to increasing interest in intrusion detection techniques based on data mining. After aggregating and consolidating massive amounts of network traffic data from across an organization's entire global data network, analytics can be used to provide both situational awareness and strategic intelligence about how people are trying to access the network.
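The signature matching described above, and its blind spot for novel attacks, can be illustrated with a minimal sketch. The signature names and byte strings below are hypothetical and chosen only for illustration; a real signature database is far larger and uses richer matching rules.

```python
# Hypothetical signature database: signature name -> byte string that,
# if found in a message payload, is taken to indicate malicious content.
SIGNATURES = {
    "sql-injection": b"' OR 1=1",
    "path-traversal": b"../../",
}


def match_signatures(payload, signatures):
    """Return the sorted names of all signatures whose byte string occurs in payload."""
    return sorted(name for name, pattern in signatures.items() if pattern in payload)


# A payload containing a known signature is detected.
known = match_signatures(b"GET /index.php?id=1' OR 1=1 --", SIGNATURES)

# A slight variant of the same attack has no signature yet, so it goes unnoticed
# until someone revises the database by hand.
novel = match_signatures(b"GET /index.php?id=1' OR 2=2 --", SIGNATURES)

print(known)
print(novel)
```

The second call returns no match even though the request is just as suspicious as the first, which is the limitation that motivates data mining approaches.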
Analytics differentiates between normal, legitimate requests and those that do not fit normal usage patterns and could be some form of attack against the network. For example, anomaly detection techniques could be used to detect unusual patterns and behaviors. Link analysis may be used to trace self-propagating malicious code to its authors. Classification may be used to group various cyber-attacks and then use the resulting profiles to detect an attack when it occurs. Prediction may be used to determine potential future attacks, drawing in part on information learned about terrorists through email and phone conversations. Also, for some threats non-real-time data mining may suffice, while for others, such as network intrusions, real-time data mining may be needed.
Many researchers are investigating the use of data mining for intrusion detection. Credit card fraud detection, for example, is a form of real-time processing; however, the models there are usually built ahead of time. Building models in real time remains a challenge. Data mining can also be used for analyzing web logs as well as audit trails. Based on the results of the data mining tool, one can then determine whether any unauthorized intrusions have occurred and/or whether any unauthorized queries have been posed.
Other applications of data mining for cyber security include analyzing the audit data. One could build a repository or a warehouse containing the audit data and then conduct an analysis using various data mining tools to see if there are potential anomalies. For example, a certain user group may access the database between 3 and 5am. If this group is working the night shift, there may be a valid explanation. However, if this group works between 9am and 5pm, then this may be an unusual occurrence.
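The night-shift example can be expressed as a simple check of an access hour against a group's working hours. This is only a sketch: the function name and the specific shift boundaries (22:00 to 06:00 for the night shift) are assumptions made for illustration, and a real audit analysis would draw both from the warehouse data.

```python
def outside_shift(access_hour, shift_start, shift_end):
    """Return True if access_hour (0-23) falls outside the shift [shift_start, shift_end).

    Shifts that wrap past midnight (for example 22 to 6) are handled explicitly.
    """
    if shift_start <= shift_end:
        inside = shift_start <= access_hour < shift_end
    else:
        # The shift wraps midnight: inside means late evening or early morning.
        inside = access_hour >= shift_start or access_hour < shift_end
    return not inside


# Access at 3am by a group on an assumed 22:00-06:00 night shift: a valid explanation exists.
night_shift_flag = outside_shift(3, 22, 6)

# The same access by a group working 9am to 5pm: an unusual occurrence.
day_shift_flag = outside_shift(3, 9, 17)

print(night_shift_flag, day_shift_flag)
```

The same access time is acceptable for one group and suspicious for another, so the check needs the group's context and cannot rely on a fixed rule about hours.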
Another example is when a person always accesses the database between 1 and 2pm, but for the last 2 days he has been accessing it between 1 and 2am. This could then be flagged as an unusual pattern that needs further investigation.
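One way to flag such a change is to build a profile of a person's usual access hours from the audit history and compare recent accesses against it. The sketch below assumes access times are already reduced to an hour of day; the function names and the 10 percent threshold are hypothetical choices, not an established method.

```python
from collections import Counter


def usual_hours(history_hours, min_share=0.1):
    """Return the hours of day (0-23) that make up at least min_share of past accesses."""
    counts = Counter(history_hours)
    total = len(history_hours)
    return {hour for hour, n in counts.items() if n / total >= min_share}


def flag_unusual(history_hours, recent_hours, min_share=0.1):
    """Return the recent access hours that do not appear in the usual-hours profile."""
    normal = usual_hours(history_hours, min_share)
    return [hour for hour in recent_hours if hour not in normal]


# Thirty past accesses, all between 1 and 2pm (hour 13).
history = [13] * 30

# The last two days: accesses between 1 and 2am (hour 1).
recent = [1, 1]

flagged = flag_unusual(history, recent)
print(flagged)
```

Both recent accesses are flagged for further investigation. A flag only marks a deviation from the profile; whether it is an intrusion still has to be decided by an analyst.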
Insider threat analysis is also a problem from both a national security and a cyber-security perspective. That is, those working in a corporation who are considered trusted could commit espionage. Similarly, those with proper access to the computer system could plant Trojan horses and viruses. Catching such insiders is far more difficult than catching terrorists outside an organization. One may need to monitor the access patterns of all individuals in a corporation, even system administrators, to see whether they are carrying out cyber-terrorism activities.