
Expert knowledge and data analysis for detecting advanced persistent threats

  • Juan Ramón Moya, Noemí DeCastro-García, Ramón-Ángel Fernández-Díaz and Jorge Lorenzana Tamargo
Published/Copyright: August 19, 2017

Abstract

Critical infrastructures in public administration can be compromised by Advanced Persistent Threats (APTs), which today constitute one of the most sophisticated ways of stealing information. This paper presents an effective, learning-based tool that uses inductive techniques to analyze the information provided by firewall log files in an IT infrastructure and to detect suspicious activity in order to mark it as a potential APT. The experiments were carried out by mixing real and synthetic data traffic to represent different proportions of normal and anomalous activity.

MSC 2010: 68T30; 68T05

1 Introduction

Most of the daily activity in modern society involves the use of electronic devices and communication networks through cyberspace. In this context, both hardware and software may suffer attacks and/or threats that compromise activities ranging from provision of public administration services to those tasks that are related to economic aspects, access to information, education, company activities, etc.

As the complexity of attacks is in constant evolution, as well as the technologies and tools they use, today we have increasingly sophisticated threats that exploit vulnerabilities in the code, the protocols, the computer systems, the communication networks, etc.

Advanced Persistent Threats (APTs) constitute one of the most serious cybersecurity threats. APTs can be described as malicious or anomalous behaviors that overcome security blocks with the main aims of cyber-spying, stealing and handling sensitive private information belonging to individuals, private corporations, public corporations or even governments. These threats are selective (targeted), and their effects are quite harmful (see [1] or [2], among others). Moreover, since APTs usually attack sensitive infrastructures, often holding legally protected data, access to the corresponding log records is in general not granted.

When all the technological tools and security measures that protect an infrastructure fail, the APT reaches its goal of compromising and stealing the valuable information of an asset or a service.

Some authors have developed tools that use theoretical models or intelligent systems to detect APTs [3-6]. These systems are based on the analysis of DNS records or hosts, with the common drawback that the models depend on both the datasets and the complexity of the systems.

As attackers seek to remain persistent once inside the system, log analysis and identification of behavioral anomalies are usually the key to protecting an infrastructure [5]. This work proposes an intelligent system that generates predictive, learning-based models of behavior that help us detect anomalous activity that might be classified as an APT. The system is based on supervised learning applied to the logs provided by the firewall that filters the inbound/outbound traffic of the infrastructure. These logs include the registers obtained from one actual APT that reached its goal of remaining persistent for some weeks before it was detected and removed. Since the system is based on real traffic data from a real infrastructure, it can be considered productive, effective and realistic.

This paper is organized as follows: Section 2 introduces some of the previous related work; in Section 3, we describe the proposed methodology, including collection of information, data processing and statistical analysis; Section 4 shows the experimental results; Section 5 discusses the relevance of the results; then, Section 6 provides the conclusions and future work; and last, two appendices show the algorithm that normalizes the log registers, as well as reports of running our model over real datasets.

2 Background

Several theoretical frameworks provide solutions or predict APT attacks. All of them share a common methodology to design a model, and include an intermediate stage before accomplishing the classification. There is a remarkable exception [3] that does not need any intermediate stage; we first review this system.

System [3] for early detection of APTs uses the approximate inference algorithm belief propagation together with data mining techniques. It uses datasets from a corporate private network, and models the communications between devices internal to the network (hosts) and between hosts and external domains with the help of a bipartite graph whose edges link those hosts and domains that are connected at least once during the observation period. By applying dimensionality reduction techniques, the system generates a list of suspect domains.

Several models include intermediate stages:

Attack Pyramid [6] is a model inspired by the attack tree concept proposed in [7] and [8]. Attack Pyramid uses the shape of a pyramid as a model of an APT attack whose apex represents the objective of the APT, while its faces represent the paths and barriers to overcome the threat. The authors define a context of attack in a private corporate network and generate an alert to the system so that it can decide whether there are any hazards inside the network.

System [5] provides a framework that seeks to generate a particular model depending on the scenario, using datasets obtained by the method proposed in [9]. Two datasets are recorded in a stage with no attacks in order to develop the model, and a third set with artificial anomalies is used to train it and evaluate its efficiency. This approach concludes that the model is effective when combining datasets generated inside an organization without prior knowledge of their structures. Classification is an intermediate stage, but the ultimate goal is to have a model adapted to each infrastructure and to update it automatically based on the inputs.

System [10] combines automatic systems built on big data and machine learning with expert knowledge. In addition, there is continuous feedback so that the model is updated based on new anomalous cases. The authors use datasets generated by web servers to detect attacks on Internet services, and firewall datasets for the analysis of possible data exfiltration. The automatic part combines three models that work in parallel (Matrix Decomposition based on the research by [11], Replicator Neural Networks and Density-based outlier analysis). This proposal concludes that the inclusion of human knowledge improves detection by a factor of 3.41, while the number of false positives decreases by a factor of 5.

3 Methodology

This section describes the design of one model to identify possible APT attacks within network traffic.

The design of the model relies on a deep analysis of APT behavior. Hence, we have analyzed this type of attack, as well as the different techniques they use to steal information: IP Address, Domain Lists, Peer to Peer, Domain Generation Algorithm or Fast Flux Domain, etc. [12]. We have also studied those security systems that could succumb to an APT, such as SDH, SafeSEH, SEHOP, Stack Cookies, ASLR, PIE or NX (see [13-19]).

APTs occur rarely, hence the proportion of their log registers is very small, which means that our datasets involve imbalanced distributions. Several works propose the use of synthetic data to improve datasets which suffer from imbalanced class distributions, including non-heuristic methods such as random undersampling or oversampling [20], and those that use some kind of interpolation for oversampling the training sets [21, 22]. In our case, the imbalanced datasets were improved by random oversampling, so that the experiments used actual log files (logs) generated by the firewall of an actual operating infrastructure in combination with synthetic registers generated through expert knowledge. In particular, we have created and analyzed 9 samples (Si, i = 1,..., 9) with different proportions of correct and anomalous behaviors.

Machine learning tools acquire knowledge from experience and are useful for the semiautomatic construction of programs in those cases when experience in a given resolution of tasks is available (see [23]).

In this work, we have measured the accuracy of the proposed model with several samples using Bayesian techniques, decision trees and artificial neural networks. Decision trees showed the best fit. Then, we have performed validation tests over all the samples and selected some variables to be assessed: accuracy of the model created with the decision tree, improvement over the trivial model, sensitivity to harmful behavior, resistance accuracy, resistance improvement over the trivial model and resistance sensitivity to harmful behavior. In order to choose the best possible proportion of activity logs, we have carried out a descriptive analysis of each sample with the values of the variables described above (boxplots and arithmetic means). The sample with the highest mean points to the most adequate model. Figure 1 shows the structure of the whole process.

Fig. 1 Structure of the process to generate an APT detector

Once the analysis is finished, the final system runs with the best sample and is able to raise alerts on log registers that might be related to APTs.

Regarding the technology and the software, we have used Python 3.5 and KNIME 3.1.2 to develop the process described in Figure 1, which relies on the log files obtained from the real, operating infrastructure.

3.1 Data acquisition

The dataset we have used was composed of log registers provided by the firewall of a real, geographically dispersed, operating infrastructure involving more than thirty buildings interconnected by a fiber optic ring and centralized at a datacenter (Figure 2).

Fig. 2 Infrastructure ring network diagram

The above-mentioned infrastructure consists of more than 500 networked computers, several broadcast domains including a DMZ, VPNs using IPsec and SSL, more than a hundred tablets and cell phones, more than five hundred VoIP phones, three data centers (one primary and two secondary ones), cluster technology with blade and virtualized servers, more than thirty servers (both virtualized and physical), two network security appliance firewalls in high availability, one proxy-cache server, several NAS and SAN disk arrays, a management core network, an intelligent cabling management system, more than twenty uninterruptible power supply units (UPS), fire detection systems, more than thirty switches distributed for voice/data communications, more than thirty communications racks, an Oracle 11g database and a single outbound Internet connection.

The infrastructure is frequently attacked through different external vectors that should be detected by security elements such as antivirus software, IDS, IPS, SIEM, etc. Whenever this defense system detects that the assets are being targeted by an attack, it generates an accurate, fast alert. If the attacker is an internal user and the propagation of the attack is stealthy, then the threat overcomes the mentioned protection systems. Such suspect behavior can be detected by human experts through in-depth analysis of the firewall log registers.

3.2 Dataset description

The analysis of the infrastructure data traffic shows that each log register (log) contains information about one specific event that was produced within the structure. The inbound/outbound traffic generates our log dataset, whose main features are the following:

  1. Volume: The average size of the daily log files is 5.46 GB, which means an average of 7,445,736 registers/day. Moreover, most of the traffic is external (see Figure 3) and network protection services (firewall) generate logs on the order of petabytes in size.

  2. Speed (log size/hour): Every hour, the system generates logs of 233.2 MB (310,239 registers) on average (see Figure 3).

  3. Variety: The registers (lines) in every log file include information of a different nature: event, security or traffic related. However, we will only consider those registers concerning the firewall inbound/outbound traffic, as the firewall itself handles the other registers in order to automatically generate alerts in case of attack.

For this work, we have used a sample, S, of the dataset, that contains the log files of one whole month (310,239 logs per hour on average).

Fig. 3 Plot of the network data traffic during one hour (as provided by the firewall software)

3.3 Data pre-processing

Dataset sample S was chosen in a particular time window that allowed us to classify all the logs in S. In fact, all the logs were tagged as correct behavior (no risk of APT) and, therefore, in order to complete the model, it was necessary to add synthetic logs representing anomalous behaviors (potential APT). Note that this kind of synthesis is a common experimental tool when data are lacking [24].

Hence, we have created nine different samples, Si (i = 1,..., 9), with different correct/suspect behavior ratios (green/red behavior).

The pre-processing stage involves normalizing the raw logs of the initial real traffic, refining them by quantization of the information, and obtaining instances suitable for machine learning algorithms. Similarly, synthetic logs are transformed into synthetic instances. Finally, the datasets that feed the learning algorithms are combinations of real and synthetic instances.

3.3.1 Real logs: from raw logs to normalized logs

The real logs, also called natural logs, are those generated as raw logs by our firewall to provide information about 40 items. These logs are normalized by removing those fields that, according to the experience of human experts, are not needed. The normalization algorithm was coded in Python using structured programming (the interested reader can see the pseudocode in Appendix I).

As a result, normalized logs are state vectors of 12 elements - fields - that contain the non-redundant information, i.e. the discriminant fields in the raw logs that best characterize them under the security approach of identifying APT suspicious behaviors (see Table 1).

Table 1

Description of the components of the normalized log vector

Field | Name | Type | Length | Description
Date | Date | Numerical | 8 | Connection date
Time | Timestamp | Numerical | 6 | Connection time
Attack Name | Attackname | String | 50 | Name of the attack (if it exists)
Destination Country | Dstcountry | String | 50 | Targeted country
Destination IP | Dstip | Numerical | 12 | Targeted IP
Duration | Duration | Numerical | 5 | Connection duration (in seconds)
Hostname | Hostname | String | 50 | Hostname that performs the connection
Protocol | Protocolo | Numerical | 1 | Protocol used
Received bytes | Rcvdbyte | Numerical | 12 | Amount of received bytes
Sent bytes | Sentbyte | Numerical | 12 | Amount of sent bytes
Source IP | Srcip | Numerical | 5 | Attacker IP
Status | Status | String | 30 | Status of the established (or attempted) connection

3.3.2 Real Logs: From normalized logs to refined logs

Normalized logs include quantitative information whose variability and complexity must be reduced before applying learning algorithms. Hence, refined logs are the result of using expert knowledge based on simple statistical (means, ranges, frequencies, etc.) and trend analysis in order to quantize the information in the fields of the normalized logs.

The quantization of dates and times involves distinguishing between working and non-working days, on the one hand; and mornings, afternoons/evenings and nights, on the other (Equations 1 and 2).

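$$\text{Date} = \begin{cases} 1 & \text{working day} \\ 2 & \text{non-working day} \end{cases} \tag{1}$$

$$\text{Time} = \begin{cases} 1 & \text{morning} \\ 2 & \text{evening/afternoon} \\ 3 & \text{night} \end{cases} \tag{2}$$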

The other variables were recoded using their arithmetic means (Equation 3).

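$$X = \begin{cases} 1 & \text{if } x \le \bar{x} \\ 2 & \text{if } \bar{x} < x \le 1.20\,\bar{x} \\ 3 & \text{if } x > 1.20\,\bar{x} \end{cases} \tag{3}$$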

where x and X are the values of the same variable before and after the quantization, respectively.
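The recoding of Equation 3 can be sketched in a few lines of Python. This is only an illustration, not the authors' code: the function name `quantize`, the sample durations and the treatment of the boundary values (a value equal to the mean goes to level 1, a value equal to 1.20 times the mean goes to level 2) are assumptions.

```python
from statistics import mean

def quantize(values, upper_factor=1.20):
    """Recode each value as level 1, 2 or 3 relative to the mean of `values`."""
    m = mean(values)
    levels = []
    for x in values:
        if x <= m:                      # at or below the mean
            levels.append(1)
        elif x <= upper_factor * m:     # up to 20% above the mean
            levels.append(2)
        else:                           # more than 20% above the mean
            levels.append(3)
    return levels

# Hypothetical connection durations in seconds (mean = 20, 1.20 * mean = 24).
durations = [10, 20, 21, 29]
print(quantize(durations))
```

With these hypothetical durations, 10 and 20 fall in level 1, 21 in level 2 and 29 in level 3.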

3.3.3 Real logs: from refined logs to real instances

Once logs are quantized into refined logs, they have to be converted into instances, i.e. input vectors that can feed machine learning algorithms.

We have used instances that contain 9 states related to the source IP. Eight of these states are extracted from the information in the refined logs: date, time, duration, received bytes, sent bytes, number of connections (per millisecond), number of denies (per millisecond) and average data traffic. There is an additional variable that tags the behavior associated with one instance as red or green. Red behavior means that the corresponding log might be considered anomalous and, therefore, could be related to a potential APT, while green labels those activities that should be considered harmless. In our case, all the instances coming from real traffic were classified as harmless, i.e. green.

Hence, the structure of the instances or final vectors is as follows:

I = (date, time, duration, received bytes, sent bytes, milliseconds, denies, mean traffic, group behavior)

The software that converts normalized logs into real instances was coded in Python.

3.3.4 From synthetic logs to synthetic instances

As mentioned above, the frequency of anomalous behaviors was low in our sample S. Thus it was necessary to create synthetic logs to represent APT-related activity. These synthetic logs improve the model, allowing fast and efficient simulations of multiple scenarios. They have been massively introduced in the form of instance I, based on expert knowledge focused on 15 types of information that firewall logs provide (see Table 2).

Table 2

Discriminant information to categorize malicious behavior

ID Information
1 How long an IP is connected (in-out), (out-in)
2 Sites where it is connected
3 What days and at what times it is connected
4 What days it should be connected and what days it should not
5 What times it should be connected and at what times it should not
6 What IPs out of schedule have been connected or have had in/out traffic
7 What IPs have had access in milliseconds or many times in a second to one or several sites
8 What external IPs the computer usually connects to
9 The IP requests DNS resolutions for names that do not exist
10 The IP is connected to external addresses of sites with questionable reputation
11 IPs with many firewall DENYs
12 IPs which connect continuously with DNS servers
13 Very high traffic of an IP to the outside
14 Average traffic of an IP
15 IPs that greatly exceed their average traffic (thresholded)

The synthetic logs were generated with the help of several correlation rules that simulate combinations of values in the instance that are usually related to malicious activity. The number of such correlation rules may be high, and directly depends on the amount of malicious behavior in the initial sample. The actual records in our real data allow accurate rules or hypotheses to be established, but increasing the number of tests or adaptations could give better approximations to reality.

Let xi (i = 1,..., 9) be the value of each state in the instance, as described in 3.3.3 (e.g. x1 = x_date, x9 = x_group behavior). Then, Equation 4 shows the correlation rules, where |w|_a stands for the number of occurrences of the symbol a in the string w:

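$$x_9 = \begin{cases} r\ (\text{red}) & \text{if } \begin{cases} |x|_3 > 2 & \text{or} \\ (|x|_2 > 2) \wedge (|x|_3 > 0) & \text{or} \\ (x_1 = 2) \vee (x_2 = 3) & \text{or} \\ (x_1 > 1) \wedge [(x_3 = 3) \vee (x_5 = 3) \vee (x_6 = 3) \vee (x_7 = 3)] & \text{or} \\ (x_2 > 1) \wedge [(x_5 = 3) \vee (x_6 = 3) \vee (x_7 = 3)] \end{cases} \\ g\ (\text{green}) & \text{in other case} \end{cases} \tag{4}$$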

The result of applying the correlation rules corresponds to the group behavior - last field of the instance. Hence, some examples of instances after using the correlation rules are the following: (1, 1, 1, 2, 1, 1, 1, 2, g), (1, 2, 2, 1, 2, 1, 1, 2, g), (2, 1, 1, 1, 1, 2, 3, 1, r), (2, 3, 1, 1, 1, 1, 1, 1, r) or (1, 3, 1, 1, 2, 1, 1, 1, r).
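As a rough illustration, the rules of Equation 4 can be written as a small Python function and checked against the five example instances listed above. The function name `label_instance` is hypothetical, and the logical connectives inside each rule are an assumption: they were chosen because they reproduce the labels of those five examples, and should not be read as the authors' definitive rule set.

```python
def label_instance(x):
    """Return 'r' (red) or 'g' (green) for the eight quantized states x1..x8."""
    x1, x2, x3, x4, x5, x6, x7, x8 = x
    threes = list(x).count(3)   # |x|_3: how many states are at level 3
    twos = list(x).count(2)     # |x|_2: how many states are at level 2
    red = (
        threes > 2
        or (twos > 2 and threes > 0)
        or x1 == 2 or x2 == 3                     # assumed connective (see lead-in)
        or (x1 > 1 and 3 in (x3, x5, x6, x7))
        or (x2 > 1 and 3 in (x5, x6, x7))
    )
    return "r" if red else "g"

# The five example instances from the text, without their last (label) field.
examples = [
    ((1, 1, 1, 2, 1, 1, 1, 2), "g"),
    ((1, 2, 2, 1, 2, 1, 1, 2), "g"),
    ((2, 1, 1, 1, 1, 2, 3, 1), "r"),
    ((2, 3, 1, 1, 1, 1, 1, 1), "r"),
    ((1, 3, 1, 1, 2, 1, 1, 1), "r"),
]
for states, expected in examples:
    print(states, label_instance(states), expected)
```

Each printed line shows the computed label next to the label given in the text, which makes it easy to test alternative readings of the rules.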

All the malicious (red) behavior in our dataset is synthetic, but such red traffic activity incorporates knowledge from actual exfiltration attempts that had previously been detected in the real, operating infrastructure. The malicious traffic is assumed to be composed of 98% synthetic registers and 2% copies of malicious samples taken from real traffic.

The synthetic logs are injected by two applications written in Python. The first one provides an interactive environment that allows generating logs assigning values to each field using some predefined criteria. The configuration parameters are the source filename and the number of logs to be inserted into the source file so as to simulate attacks over the infrastructure.

The second application massively injects logs of attacks without user intervention in order to mix harmless and malicious activities. For each record (a line in the log file) it produces as many lines as there are fields in the record, increasing the value of the fields by some percentage above the average. These logs contain information from previously injected attacks.

3.3.5 From instances to sample datasets

The sample datasets Si (i = 1,..., 9) suitable for feeding machine learning algorithms were created from the real and synthetic instances. The samples are composed of 20% random real data and 80% synthetic data, with different proportions of green/red behavior so that we might find the best ratio for our model (see Table 3).

Table 3

Sample Datasets, Si

Sample | Green behavior (%) | Red behavior (%)
S1 | 23.70 | 76.30
S2 | 69.95 | 30.05
S3 | 52.28 | 47.72
S4 | 48.98 | 51.02
S5 | 50.00 | 50.00
S6 | 70.83 | 29.17
S7 | 12.00 | 88.00
S8 | 60.48 | 39.52
S9 | 100 | 0

3.4 Data analysis

This section describes the machine learning techniques used with the dataset, that include Naïve-Bayes, Decision Tree (ID3-C4.5) and Artificial Neural Networks.

The Naïve Bayes classifier learns the conditional probability of each attribute Ai from the training data given the class label, C. After the training stage, the probability of C given one particular instance of A1,..., An is computed by applying Bayes' rule in order to predict the class with the highest probability [25]. In this work, the classifier uses the number of rows per attribute value and class for those attributes that are nominal, and a Gaussian distribution for the numeric attributes.
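In symbols, and under the usual conditional-independence assumption of the Naïve Bayes model, this prediction rule can be written as follows. The notation $a_i$ for the observed value of attribute $A_i$ and $\hat{c}$ for the predicted class is introduced here only for illustration.

```latex
P(C = c \mid A_1 = a_1, \dots, A_n = a_n) \;\propto\; P(C = c) \prod_{i=1}^{n} P(A_i = a_i \mid C = c),
\qquad
\hat{c} = \arg\max_{c} \; P(C = c) \prod_{i=1}^{n} P(A_i = a_i \mid C = c)
```

For a nominal attribute, $P(A_i = a_i \mid C = c)$ is estimated from the row counts per attribute value and class; for a numeric attribute it is the value of a Gaussian density fitted to that class.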

Decision tree induction is frequently used in machine learning and data mining because of its remarkable advantages: decision trees are capable of learning functions from discrete values, even with noisy samples, and they produce sets of expressions that can be easily translated into sets of rules.

In particular, the C4.5 algorithm belongs to the Top Down Induction of Decision Trees family (TDIDT). It generates a decision tree using a “divide and conquer” algorithm, and evaluates the information in each case using the following criteria: entropy, information gain or gain ratio, as applicable. In addition, the heuristic is based on statistics, making it robust to noise [23].
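The three criteria can be made concrete with a short sketch. The code below computes the entropy of a set of labels and the information gain and gain ratio of splitting on one attribute; the function names and the tiny data set are hypothetical and serve only to illustrate the formulas, not to reproduce the KNIME implementation used in this work.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_scores(values, labels):
    """Information gain and gain ratio of splitting `labels` by attribute `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    # Weighted entropy that remains after the split.
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - remainder
    # Split information: entropy of the attribute values themselves.
    split_info = entropy(values)
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio

# Hypothetical example: the quantized time (1 = morning, 3 = night) separates
# green from red perfectly, so gain and gain ratio are both 1 bit.
time_of_day = [1, 1, 3, 3]
behavior = ["g", "g", "r", "r"]
print(split_scores(time_of_day, behavior))
```

Dividing the gain by the split information penalizes attributes with many distinct values, which is the reason C4.5 prefers the gain ratio over the plain information gain.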

Artificial neural networks can make decisions from a numerical set of examples, as the function is implicitly determined by that set of examples. Therefore, their objective is to approximate the function that characterizes all the elements in the set. Inputs are numerical and follow the attribute-value scheme. The learning method seeks to minimize the error for all the training examples, and has a great capacity to absorb noise [23].

In particular, we used the Probabilistic Neural Network (PNN) based on the DDA (Dynamic Decay Adjustment). This PNN works with labeled data using Constructive Training of PNN as the underlying algorithm, where each rule is defined as a high-dimensional Gaussian function adjusted by two thresholds in order to avoid conflicts with rules of different classes [26]. For all three techniques, the training sets consisted of 65% of each sample, Si, while the remaining 35% was used for testing. Table 4 shows the proportions of green and red behavior (GB, RB) in each sample.

Table 4

Proportions of green/red behavior (GB/RB) in the training and test sets

Sample | Training set GB (%) | Training set RB (%) | Test set GB (%) | Test set RB (%)
S1 | 15.40 | 49.50 | 8.30 | 26.80
S2 | 45.45 | 19.53 | 24.50 | 10.52
S3 | 33.98 | 31.02 | 18.30 | 16.70
S4 | 31.84 | 33.16 | 17.14 | 17.86
S5 | 32.50 | 32.50 | 17.50 | 17.50
S6 | 46.03 | 18.97 | 24.80 | 10.20
S7 | 7.60 | 57.40 | 4.40 | 30.60
S8 | 39.28 | 25.72 | 21.20 | 13.80
S9 | 65.00 | 0 | 35.00 | 0
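The proportions in Table 4 are close to 65% and 35% of the class shares of each sample, which suggests (although the text does not state it explicitly) that the split preserved the green/red ratio. A minimal sketch of such a stratified 65/35 split is given below; the function name `stratified_split`, the fixed seed and the toy instances are assumptions made for illustration.

```python
import random

def stratified_split(instances, train_share=0.65, seed=0):
    """Split instances into train/test parts, keeping the class ratio in both."""
    rng = random.Random(seed)
    by_label = {}
    for inst in instances:
        # The last field of an instance is its group behavior ('g' or 'r').
        by_label.setdefault(inst[-1], []).append(inst)
    train, test = [], []
    for group in by_label.values():
        group = list(group)            # copy so the caller's data is not reordered
        rng.shuffle(group)
        cut = round(train_share * len(group))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

# Toy sample with a 50/50 class balance, as in S5.
sample = [(i, "g") for i in range(100)] + [(i, "r") for i in range(100)]
train, test = stratified_split(sample)
print(len(train), len(test))
```

With this balanced toy sample, each class contributes 65 instances to the training set and 35 to the test set, matching the 32.50/17.50 pattern reported for S5.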

After the training stage, the results with the test sets pointed to the decision tree as a better choice than Naïve Bayes and PNN. For that model and for every sample Si, its resistance is measured using another sample Sj, with a different green/red behavior proportion, as a validation test; for instance, S2 validates the model built using S1, while the model built using S2 is validated by S1, and so on (see Table 7). In all the analyses, the improvement of each model with respect to the trivial one has been measured. Such a trivial model would use the most frequent behavior in the sample to label every unknown activity, i.e. if most of the activity in the sample is green behavior, then the trivial model would label all the elements as green behavior.

Furthermore, a sensitivity analysis for red behavior has also been carried out, because it is important to avoid false negatives when detecting harmful behaviors in the context of cybersecurity.
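For reference, accuracy and the per-class sensitivities can be derived from the four cells of a confusion matrix as sketched below. Red behavior is taken as the positive class; the function name and the counts in the usage example are hypothetical and do not come from the experiments.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Accuracy and sensitivities from a 2x2 confusion matrix (red = positive).

    tp: red instances labeled red      fn: red instances labeled green
    fp: green instances labeled red    tn: green instances labeled green
    """
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    sensitivity_red = tp / (tp + fn)      # share of red behavior that is detected
    sensitivity_green = tn / (tn + fp)    # share of green behavior left unflagged
    return accuracy, sensitivity_red, sensitivity_green

# Hypothetical counts: 5 of 100 red instances are missed, no false alerts.
print(confusion_metrics(tp=95, fn=5, fp=0, tn=100))
```

A false negative (an undetected red instance) lowers the red sensitivity directly, which is why this value is tracked separately from the overall accuracy.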

Finally, the values of accuracy and resistance accuracy, of the improvements achieved over the trivial model, and of the sensitivities to red behavior are used to analyze the performance of the decision tree. These values are then quantized according to the quartile they belong to (4 for the upper quartile, 1 for the lower one), and their average for each sample estimates its fitness, with the best sample being the one with the highest average.
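The quartile scoring can be sketched as follows. The helper names are hypothetical, and the way quartile boundaries and ties are handled (inclusive quantiles from Python's `statistics.quantiles`, with a value equal to a cut point placed in the lower bin) is an assumption, since the text does not specify it. Note that `statistics.quantiles` requires Python 3.8 or later.

```python
from bisect import bisect_left
from statistics import mean, quantiles

def quartile_bins(values):
    """Map each value to 1 (lowest quartile) .. 4 (highest quartile)."""
    cuts = quantiles(values, n=4, method="inclusive")   # three cut points
    return [1 + bisect_left(cuts, v) for v in values]

def fitness(metric_table):
    """metric_table: {metric: {sample: value}} -> {sample: mean quartile score}."""
    scores = {}
    for per_sample in metric_table.values():
        samples = list(per_sample)
        bins = quartile_bins([per_sample[s] for s in samples])
        for s, q in zip(samples, bins):
            scores.setdefault(s, []).append(q)
    return {s: mean(qs) for s, qs in scores.items()}

# Toy input with a single metric and four hypothetical samples.
print(fitness({"accuracy": {"A": 90.0, "B": 95.0, "C": 97.0, "D": 99.0}}))
```

In the full procedure the table would hold one entry per assessed variable (accuracy, improvement, sensitivity and their resistance counterparts), and the sample with the highest mean score would be selected.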

4 Results

Using the confusion matrices of the analysis tests described in the above section over each Si, we have obtained the results shown in Table 5. We have included the accuracy and the error obtained with the techniques of Naïve Bayes (NB), Decision Tree (DT) and Artificial Neural Networks (ANN).

Table 5

Accuracies and errors using Naïve Bayes, a decision tree and a probabilistic neural network

Sample | Naïve Bayes accuracy (%) | Naïve Bayes error (%) | ID3-C4.5 decision tree accuracy (%) | ID3-C4.5 decision tree error (%) | PNN accuracy (%) | PNN error (%)
S1 | 90.26 | 9.74 | 95.96 | 4.04 | 91.12 | 8.88
S2 | 85.47 | 14.53 | 99.24 | 0.76 | 88.20 | 11.80
S3 | 89.03 | 10.97 | 97.49 | 2.51 | 86.39 | 13.61
S4 | 89.15 | 10.85 | 97.44 | 2.56 | 88.69 | 11.31
S5 | 86.76 | 13.24 | 97.95 | 2.05 | 89.87 | 10.13
S6 | 85.52 | 14.48 | 97.99 | 2.01 | 83.18 | 16.82
S7 | 89.20 | 10.80 | 95.10 | 4.90 | 88.57 | 11.43
S8 | 85.25 | 14.75 | 98.82 | 1.18 | 91.64 | 8.36
S9 | 83.29 | 16.71 | 100 | 0 | 100 | 0

Table 5 shows that the ID3-C4.5 decision tree achieves higher accuracy and lower error than Naïve Bayes and the probabilistic neural network. Hence, Table 6 shows the values of accuracy, improvement over the trivial model and sensitivity for each of the samples when using the decision tree.

Table 6

Confusion matrix over Si after using the decision tree

Sample | Accuracy (%) | Error (%) | Improvement over trivial model (%) | Sensitivity (g) | Sensitivity (r)
S1 | 95.96 | 4.05 | 19.73 | 1 | 0.95
S2 | 99.24 | 0.76 | 29.24 | 0.98 | 1
S3 | 97.49 | 2.51 | 45.20 | 0.95 | 1
S4 | 97.44 | 2.56 | 46.42 | 1 | 0.95
S5 | 97.95 | 2.05 | 47.95 | 0.96 | 1
S6 | 97.99 | 2.02 | 27.13 | 0.98 | 0.98
S7 | 95.10 | 4.90 | 7.67 | 0.94 | 1
S8 | 98.82 | 1.18 | 38.25 | 0.95 | 1
S9 | 100 | 0 | 0 | 1 |

Table 7 shows the results of analyzing resistance accuracy, resistance improvement over the trivial model, and resistance sensitivity for every sample. Improvement over the trivial model is the result of subtracting the larger of the two behavior proportions of the validation sample (in Table 3) from the accuracy/resistance value. The table includes the samples that were used as validation tests, and does not consider S9, as it does not improve on the trivial model.

Table 7

Results over validation sets of sample datasets Si.

Sample | Validation sample | Accuracy/Resistance (%) | Improvement over trivial model (%) | Sensitivity (r)
S1 | S2 | 87.37 | 17.42 | 0.831
S2 | S1 | 99.98 | 23.68 | 1
S3 | S4 | 100 | 48.98 | 1
S4 | S3 | 100 | 47.72 | 1
S5 | S1 | 99.98 | 23.68 | 1
S6 | S1 | 84.14 | 7.84 | 0.79
S7 | S2 | 100 | 30.95 | 1
S8 | S1 | 96.63 | 20.33 | 1
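The improvement column of Table 7 follows from simple arithmetic: the accuracy of the trivial model on a validation sample is the share of its majority class (Table 3), and this share is subtracted from the resistance accuracy. The sketch below reproduces a few rows; the variable and function names are chosen here for illustration.

```python
# Share (%) of the majority class in each validation sample, from Table 3.
majority_share = {"S1": 76.30, "S2": 69.95, "S3": 52.28, "S4": 51.02}

# (model sample, validation sample, resistance accuracy in %) from Table 7.
rows = [
    ("S1", "S2", 87.37),
    ("S2", "S1", 99.98),
    ("S3", "S4", 100.0),
    ("S6", "S1", 84.14),
]

def improvement(accuracy, validation_sample):
    """Resistance accuracy minus the accuracy of the trivial (majority) model."""
    return round(accuracy - majority_share[validation_sample], 2)

for model, validation, acc in rows:
    print(model, validation, improvement(acc, validation))
```

For these four rows the computed values are 17.42, 23.68, 48.98 and 7.84, as listed in Table 7.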

Figures 4 and 5 show the boxplots regarding the accuracy and resistance accuracy for each sample, as well as the improvements over the trivial model, and the sensitivities. Last, Figure 6 shows a bar chart with the mentioned variables after quantization.

Fig. 4 Boxplots of accuracy, improvement over trivial model and sensitivity

Fig. 5 Boxplots of resistance accuracy, resistance improvement over trivial model and resistance sensitivity

Fig. 6 Quantized values of accuracy and resistance accuracy, improvements over trivial model, and sensitivities

5 Discussion

The experimental results led us to choose the ID3-C4.5 decision tree to detect anomalous behaviors in the network activity. In order to select those samples that fit best according to accuracy, improvement over the trivial model, sensitivity to red behavior, resistance accuracy, resistance improvement and resistance sensitivity to red behavior, their values are first binned by quartile. Then, the mean of these binned variables is used as a measure of fitness. The results shown in Figure 6 point to S3 and S5 as the best ones, both with the highest mean value (3.33). The resulting model has been used to develop an intelligent system that takes the firewall raw logs as inputs and fires alerts in case of potential APT activity (Figure 7).

Fig. 7 State diagram of the decision tree based intelligent system

The system has proven to be effective, and uses technologies that do not depend on the architecture. However, the model would require continuous updating based on the monitoring of suspicious activity so as to improve the accuracy of log categorization. Furthermore, the use of distributed data storage and HPC (High Performance Computing) technologies would allow real-time processing and, hence, improve performance and, eventually, anticipate the actions of APTs.

6 Conclusions and future work

The proposed intelligent system predicts suspicious behaviors by analyzing the data traffic in an IT infrastructure, and triggers alerts so that the administrator does not have to read the whole log files. The results indicate that the proposal is suitable for the goal of early detection of APTs, i.e. for proactive security.

Future work is focused on improving the model by monitoring suspicious results and, thus, defining the process of cataloguing such anomalous behaviors. Besides, performance might be improved with the incorporation of real time HPC and Big Data technologies.

Appendices

I Pseudocode: from raw logs to normalized logs

DO UNTIL end of input data:
    Add item (field name) to the list.
    Next item.
ENDDO

Select file.
Record file list.
Open finput     *** raw log file
Open foutput    *** normalized log file
Read finput.

DO UNTIL end of file:
    DO FOR i = 0 TO end of list:
        Take field i from the list.
        Search for that field in the record and take its value.
        Insert the value into the foutput record.
    ENDDO
    Write record to foutput.
    Read finput.
ENDDO

Close finput, foutput.
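The pseudocode above can be turned into a short runnable Python sketch. Several details are assumptions made only for illustration and are not stated in the paper: the raw log is taken to be a text file with one record per line made of space-separated `key=value` pairs (values optionally quoted), the field names are the lower-cased entries of the "Name" column of Table 1, and the output is written as CSV.

```python
import csv
import re

# Field names taken from the "Name" column of Table 1 (lower-cased by assumption).
FIELDS = ["date", "timestamp", "attackname", "dstcountry", "dstip", "duration",
          "hostname", "protocolo", "rcvdbyte", "sentbyte", "srcip", "status"]

# Assumed raw format: key=value pairs, where a value may be wrapped in double quotes.
PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')

def normalize_line(line, fields=FIELDS):
    """Keep only the selected fields of one raw log line; missing fields become ''."""
    record = {key: value.strip('"') for key, value in PAIR.findall(line)}
    return [record.get(name, "") for name in fields]

def normalize_file(finput, foutput, fields=FIELDS):
    """Read raw logs from `finput` and write the normalized records to `foutput`."""
    with open(finput, encoding="utf-8") as src, \
         open(foutput, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for line in src:
            if line.strip():                      # skip empty lines
                writer.writerow(normalize_line(line, fields))

# Hypothetical raw line with one field that is not kept after normalization.
raw = 'date=20170101 srcip=10.0.0.1 status="accept" foo=bar'
print(normalize_line(raw, fields=["date", "srcip", "status"]))
```

The inner loop of the pseudocode corresponds to the list comprehension in `normalize_line`, and the outer loop to the iteration over the lines of the input file.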

II Running the model against real data

We have tested the S5 model using the decision tree with KNIME over two real datasets. On the one hand, the first dataset represents normal activity, i.e. with no APT logs. The second dataset, on the other hand, contains data concerning one APT that attacked an actual infrastructure and that remained persistent for 25 days, until it was detected by inspection (and removed). During that time the APT generated 3710 firewall log entries. The experiments were carried out sampling the datasets while maintaining the proportion of dangerous/innocuous log registers.

Note that the S5 model gives no false alerts with either dataset (Figures 8 and 10), which is not the case for other models, even on harmless datasets (Figure 9). Although it gives a number of false alerts with the second dataset, the true ones are significant enough to effectively detect and remove the attack, since all of the entries came from the same source.
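
For readers unfamiliar with confusion matrices, the sketch below shows how the four cells can be counted from actual and predicted labels. It reads "false alerts" as false positives, which is an assumption, and the labels and toy data are hypothetical; they are not the values behind Figures 8 to 10.

```python
from collections import Counter


def confusion_counts(actual, predicted, positive="apt"):
    """Count true/false positives and negatives for a binary classifier.

    actual, predicted -- sequences of class labels of equal length
    positive          -- label treated as an alert (hypothetical name)
    """
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1  # alert raised
        else:
            counts["FN" if a == positive else "TN"] += 1  # no alert raised
    return {cell: counts[cell] for cell in ("TP", "FP", "FN", "TN")}


if __name__ == "__main__":
    # Toy labels for illustration only.
    actual = ["normal", "normal", "apt", "apt", "normal"]
    predicted = ["normal", "normal", "apt", "normal", "normal"]
    print(confusion_counts(actual, predicted))
```

On a dataset with no APT entries, TP and FN are zero by construction, so the absence of false alerts corresponds to FP being zero.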

Fig. 8 Confusion matrix using the S5 model and the decision tree over a harmless dataset, Dh

Fig. 9 Confusion matrix using the S3 model and the decision tree over Dh

Fig. 10 Confusion matrix using the S5 model and the decision tree over a dataset that contains APT activity

References

[1] Falliere N., Murchu L.O., Chien E., W32.Stuxnet dossier, White paper, Symantec Corp., Security Response 5, 2011

[2] Holguín J.M., Moreno M., Merino B., Detección de APTs, CSIRT-CV and INTECO-CERT, Comunidad Valenciana-León, 2013

[3] Oprea A., Li Z., Yen T.F., Chin S.H., Alrwais S., Detection of early-stage Enterprise infection by mining large-scale log data, In: 2015 45th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2015, 45-56, DOI: 10.1109/DSN.2015.14

[4] Mosso J.M.R., Ciberseguridad Inteligente, arXiv preprint arXiv:1506.03830, 2015

[5] Friedberg I., Skopik F., Settanni G., Fiedler R., Combating advanced persistent threats: From network event correlation to incident detection, Computers & Security, 2015, 48, 35-57, DOI: 10.1016/j.cose.2014.09.006

[6] Giura P., Wang W., Using large scale distributed computing to unveil advanced persistent threats, Science Journal, 2012, 1 (3), 93-105

[7] Amoroso E.G., Fundamentals of computer security technology, Upper Saddle River, NJ, USA, Prentice-Hall, Inc., 1994

[8] Schneier B., Attack Trees - Modeling Security Threats, Dr. Dobb’s Journal, 1999, https://www.schneier.com/academic/archives/1999/12/attack_trees.html

[9] Skopik F., Settanni G., Fiedler R., Friedberg I., Semi-synthetic data set generation for security software evaluation, In: 12th Annual Conference on Privacy, Security and Trust, IEEE, 2014, 156-163, DOI: 10.1109/PST.2014.6890935

[10] Veeramachaneni K., Arnaldo I., Korrapati V., Bassias C., Li K., AI2: training a big data machine to defend, In: Proceedings - 2nd IEEE International Conference on Big Data Security on Cloud, IEEE BigDataSecurity 2016, 2nd IEEE International Conference on High Performance and Smart Computing, IEEE HPSC 2016 and IEEE International Conference on Intelligent Data and Security, IEEE IDS 2016, 2016, 49-54, DOI: 10.1109/BigDataSecurity-HPSC-IDS.2016.79

[11] Shyu M.L., Chen S.C., Sarinnapakorn K., Chang L., A novel anomaly detection scheme based on principal component classifier, In: Proceedings of the IEEE Foundations and New Directions of Data Mining Workshop, in conjunction with the Third IEEE International Conference on Data Mining (ICDM’03), 2003, 172-179

[12] Zinksecurity Thinking solutions, Advanced Persistent Threats (APTs), Guardia Civil, España, 2015

[13] Yao X., Pang J., Zhang Y., Yu Y., Lu J., A method and implementation of control flow obfuscation using SEH, In: Proceedings of the 4th International Conference on Multimedia Information Networking and Security, MINES 2012, 2012, 336-339, DOI: 10.1109/MINES.2012.25

[14] Wei Q., Wei T., Wang J., Evolution of exploitation and exploit mitigation, Journal of Tsinghua University, 2011, 51 (10), 1274-1280

[15] Support Microsoft, How to enable Structured Exception Handling Overwrite Protection (SEHOP) in Windows operating systems, 2011, https://support.microsoft.com/en-us/help/956607/how-to-enable-structured-exception-handling-overwrite-protection-sehop-in-windows-operating-systems

[16] Dang T.H., Maniatis P., Wagner D., The performance cost of shadow stacks and stack canaries, In: Proceedings of the 10th ACM Symposium on Information, Computer and Communications Security (ASIACCS 2015), 2015, 555-556, DOI: 10.1145/2714576.2714635

[17] Sood A., Enbody R., Targeted Cyber Attacks: Multi-staged Attacks Driven by Exploits and Malware, Syngress, 2014

[18] Sharp B.L., Peterson G.D., Yan L.K., Extending hardware based mandatory access controls for memory to multicore architectures, In: Proceedings of the 4th annual workshop on Cyber Security and Information Intelligence Research: Developing Strategies to meet the Cyber Security and Information Intelligence challenges ahead (CSIIRW ’08), ACM, 2008, 23:1-23:3, DOI: 10.1145/1413140.1413167

[19] Yotiyana J.P., Mishra A., Secure Authentication: Eliminating Possible Backdoors in Client-Server Endorsement, Procedia Computer Science, 2016, 85, 606-615, DOI: 10.1016/j.procs.2016.05.227

[20] López V., Fernández A., García S., Palade V., Herrera F., An insight into classification with imbalanced data: Empirical results and current trends on using data intrinsic characteristics, Information Sciences, 2013, 250, 113-141, DOI: 10.1016/j.ins.2013.07.007

[21] Han H., Wang W.Y., Mao B.H., Borderline-SMOTE: a new over-sampling method in imbalanced data sets learning, In: Proceedings of the 2005 International Conference on Intelligent Computing (ICIC’05), Lecture Notes in Computer Science, 2005, 3644, 878-887, DOI: 10.1007/11538059_91

[22] He H., Bai Y., Garcia E.A., Li S., ADASYN: adaptive synthetic sampling approach for imbalanced learning, In: Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IJCNN’08), 2008, 1322-1328

[23] Borrajo D., González J., Isasi P., Aprendizaje Automático, Ed. Sanz y Torres, S.L., 2013

[24] López-Cabeceira M.M., Diez-Machío H., Trobajo M.T., Carriegos M.V., Spectra analysis in detection of traces of explosives, Int. J. Modern Phys. B, 2012, 26 (25), 1246013, DOI: 10.1142/S0217979212460137

[25] Friedman N., Geiger D., Goldszmidt M., Bayesian Network Classifiers, Machine Learning, 1997, 29 (2-3), 131-163, DOI: 10.1023/A:1007465528199

[26] Berthold M.R., Diamond J., Constructive training of probabilistic neural networks, Neurocomputing, 1998, 19 (1), 167-183, DOI: 10.1016/S0925-2312(97)00063-5

Received: 2016-11-15
Accepted: 2017-5-4
Published Online: 2017-8-19

© 2017 Moya et al.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 License.

Downloaded on 21.9.2025 from https://www.degruyterbrill.com/document/doi/10.1515/math-2017-0094/html?lang=en