Friday, September 22, 2017

Using machine and deep learning concepts to detect anomalies


Most threat detection software is based on signature (pattern) techniques that look for evidence known to indicate misuse. Threats can come from hackers on the Internet, from the local network, or from legitimate users of the network, and such a protection system must always be kept up to date, since it can only detect attacks whose patterns are built into the system. Machine learning techniques, however, make it possible to learn from data and make predictions based on it, and so to build an anomaly-based detection system that looks for signs of behavior that is not normal for a user.

Among the popular techniques used for anomaly detection, two will be briefly analyzed in this post: Long Short-Term Memory networks (LSTM) and One-Class Support Vector Machines (OCSVM).

LSTM
The idea of a recurrent neural network (RNN) is to make use of sequential information, where the elements of a sequence depend on each other. Unlike in feedforward networks, the output depends on previous computations, and the same task is performed for every element of the sequence. That is why RNNs are considered to have a "memory" of what has been calculated, although in practice they are limited to looking only a few steps back. LSTM is an RNN architecture designed to be better than standard RNNs at storing and finding long-range dependencies in the data. An LSTM can also map one input to many outputs, many inputs to many outputs, or many inputs to one output.
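To make the "memory" idea concrete, here is a minimal NumPy sketch of a vanilla RNN in the many-inputs-to-one-output configuration; the parameter names (Wx, Wh, b) and sizes are illustrative, not from the post.

```python
import numpy as np

def rnn_many_to_one(xs, Wx, Wh, b):
    """Vanilla RNN: the same weights are applied at every time step, and the
    hidden state h carries information from earlier steps forward (the "memory")."""
    h = np.zeros(Wh.shape[0])
    for x in xs:
        # Each new state depends on the current input AND the previous state.
        h = np.tanh(Wx @ x + Wh @ h + b)
    return h  # many inputs -> one output (the final hidden state)

rng = np.random.default_rng(0)
Wx = rng.normal(scale=0.1, size=(4, 3))  # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(4, 4))  # hidden-to-hidden (recurrent) weights
b = np.zeros(4)
summary = rnn_many_to_one(rng.normal(size=(6, 3)), Wx, Wh, b)
```

Because gradients must flow back through every application of Wh, long sequences are hard for this plain recurrence to learn, which is what motivates the LSTM cell below.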

The diagram (and the legend) below illustrates how data flows through an LSTM memory cell.

Source: DL4J
The gates in the diagram are used to remove or add information to the cell state, using logistic functions that compute a value between 0 and 1 (sigmoid) or between -1 and 1 (hyperbolic tangent). For example, to create an update to the state, the input gate (sigmoid), which decides which values will be updated, is multiplied with the block input (tanh). If the forget gate is open, the information from the old state is kept. The state is then filtered for output by the output gate, which decides which parts of the cell state to output, and by the tanh activation function, which pushes the values of the state to lie between -1 and 1.
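The gate equations described above can be sketched in NumPy as a single LSTM time step (without the peephole connections mentioned next); the stacked parameter layout W, U, b is my own choice for compactness.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b stack the parameters of the forget,
    input, output gates and the block input (4*H rows)."""
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b      # all gate pre-activations at once
    f = sigmoid(z[0:H])             # forget gate: how much old state to keep
    i = sigmoid(z[H:2*H])           # input gate: which values get updated
    o = sigmoid(z[2*H:3*H])         # output gate: which parts of the state to output
    g = np.tanh(z[3*H:4*H])         # block input (candidate values, in -1..1)
    c = f * c_prev + i * g          # keep old state (if forget gate open) + add update
    h = o * np.tanh(c)              # filtered output, pushed to lie between -1 and 1
    return h, c

rng = np.random.default_rng(0)
X_dim, H_dim = 3, 4
W = rng.normal(scale=0.1, size=(4 * H_dim, X_dim))
U = rng.normal(scale=0.1, size=(4 * H_dim, H_dim))
b = np.zeros(4 * H_dim)
h, c = np.zeros(H_dim), np.zeros(H_dim)
for x in rng.normal(size=(5, X_dim)):  # run a 5-step sequence through the cell
    h, c = lstm_step(x, h, c, W, U, b)
```

Note how the multiplication `i * g` is exactly the input-gate-times-block-input update, and `o * np.tanh(c)` is the filtered output from the paragraph above.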

The blue arrows represent "peephole connections", a popular LSTM variant that lets the gate layers look at the cell state. There are other LSTM variants as well, but the diagram above represents the traditional architecture, so you can ignore the blue arrows.

OCSVM
This machine learning algorithm builds a model by training on normal data only, and then classifies test data as either normal or attack based on its geometrical deviation from the training data. It is therefore used for two-class problems in which only one of the two classes can be described well; the classes are called the target class and the outlier class, respectively. The boundary between the two classes must be defined so that it includes the maximum number of target examples while minimizing the chance of accepting outliers.

The algorithm maps the input data into a high-dimensional feature space (via a kernel function) and determines a hyperplane (a linear decision boundary) that best separates the training set from the origin with maximal margin. The kernel function most commonly used for this mapping is the RBF kernel.
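As a quick illustration of the kernel idea, here is the RBF kernel in NumPy: it scores the similarity of two points after the implicit mapping into the high-dimensional feature space, without ever computing that mapping explicitly (the gamma value is illustrative).

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.5):
    """RBF (Gaussian) kernel: exp(-gamma * ||x - y||^2).
    Nearby points score close to 1; distant points score close to 0."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return np.exp(-gamma * np.dot(diff, diff))
```

A point compared with itself gives exactly 1, while a point far from the training data gives a score near 0, which is what lets the OCSVM boundary wrap tightly around the normal examples.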

Source: "A real time OCSVM Intrusion Detection module with low overhead for SCADA systems",
Leandros Maglaras & Jianmin Jiang


Using the hyperplane, a label can be assigned to any test example x as follows: if f(x) < 0, then x is labeled as an anomaly (outlier class); otherwise it is labeled as normal (target class).
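The train-on-normal-data / label-by-sign-of-f(x) workflow can be sketched with scikit-learn's OneClassSVM; this is a generic illustration, not the cited paper's implementation, and the nu and gamma values are arbitrary.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(1)
# Training set: "normal" behavior only (two features per observation).
X_train = rng.normal(loc=0.0, scale=0.5, size=(200, 2))

# nu upper-bounds the fraction of training points treated as outliers.
clf = OneClassSVM(kernel="rbf", gamma=0.5, nu=0.05)
clf.fit(X_train)

X_test = np.array([[0.1, -0.2],   # close to the training data
                   [4.0, 4.0]])   # far from it
scores = clf.decision_function(X_test)          # f(x): signed distance from the boundary
labels = np.where(scores < 0, "anomaly", "normal")
```

The second test point lies far from everything seen during training, so its f(x) is negative and it is labeled an anomaly, exactly as the rule above prescribes.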

Note that this blog post was written with the assumption that the reader already has some knowledge of neural networks. If that's not the case, I recommend this learning source.

