network behavior anomaly detection（网络行为异常检测）下载_在线阅读_13

首页 > network behavior anomaly detection（网络行为异常检测）

is_197226

暂无简介

network behavior anomaly detection（网络行为异常检测） Telecommun Syst (2012) 50:1–13 DOI 10.1007/s11235-010-9384-1 Large-scale IP network behavior anomaly detection and identification using substructure-based approach and multivariate time series mining Weisong He · Guangmin Hu · Yingjie Zhou Published online: 4 Au...

Telecommun Syst (2012) 50:1–13 DOI 10.1007/s11235-010-9384-1 Large-scale IP network behavior anomaly detection and identification using substructure-based approach and multivariate time series mining Weisong He · Guangmin Hu · Yingjie Zhou Published online: 4 August 2010 © Springer Science+Business Media, LLC 2010 Abstract In this paper, a substructure-based network be- havior anomaly detection approach, called WFS (Weighted Frequent Subgraphs), is proposed to detect the anomalies of a large-scale IP networks. With application of WFS, an entire graph is examined, unusual substructures of which are reported. Due to additional information given by the graph, the anomalies are able to be detected more accurately. With multivariate time series motif association rules min- ing (MTSMARM), the patterns of abnormal traffic behavior are able to be obtained. In order to verify the above propos- als, experiments are conducted and, together with applica- tion of backbone networks (Internet2) Netflow data, show some positive results. Keywords Anomaly detection and identification · Weighted frequent subgraphs · Multivariate time series motif association rules mining 1 Introduction Large-scale IP network behavior analysis is important for network monitoring and network management. Network traffic which exhibits multi-timescale properties within tem- poral domain, however, is highly dynamic. large-scale IP W. He (�) · G. Hu · Y. Zhou Key Lab of Optical Fiber Sensing and Communication, University of Electronic Science and Technology of China (UESTC), 611731 Chengdu, P.R. China e-mail: weisonghe@uestc.edu.cn G. Hu (�) e-mail: hgm@uestc.edu.cn Y. Zhou e-mail: yjzhou@uestc.edu.cn network traffic anomalies which feature sudden erupting without preknown signs, often bring great damage to net- work equipments or computers of network in a short time. Therefore, one of the prepositions to ensure trustworthy networks is to detect and locate network traffic anomalies quickly and accurately, determine the reasons that cause them and make reasonable response to them in time. Anomaly detection, which refers to the topic of finding patterns in data that do not conform to expected behavior or can be defined as follows: “Given a set of n data points or objects and the number p of expected outliers, find the top p objects that are considerably dissimilar, exceptional, or inconsistent with respect to the remaining data” [1]. Net- work behavior anomaly detection refers to the issue of find- ing network behavior patterns in network data that do not conform to expected behavior. An anomalous traffic pattern in a computer network could mean that a hacked computer is sending out sensitive data to an unauthorized destination [4]. Based on the extent to which the labels are available, anomaly detection techniques can operate in one of the fol- lowing three modes: supervised anomaly detection, semi- supervised anomaly detection and unsupervised anomaly detection. Some anomaly detection techniques used in net- work intrusion detection systems are listed as follows: Sta- tistical Profiling using Histograms [5–19], Parametric Sta- tistical Modeling [20–23], Non-parametric Statistical Mod- eling [24], Bayesian Networks [25–28], Neural Networks [13–15, 29–34], Support Vector Machines [35], Rule-Based Systems [36–43], Clustering Based Method [35, 43, 45, 46], Nearest Neighbor based Method [35, 44, 47], Spectral and Graph Method [48, 49, 51, 53], Information Theoretic Method [53, 54]. On the foundation of entropy-based analysis of traffic feature distributions, this paper puts forward some new ideas which go as follows. Firstly, two kinds of traffic feature dis- 2 W. He et al. tributions are selected: one is address and port distributions, the other is protocol and octets distributions. The two kinds of distributions complement each other and provide differ- ent insight into the underlying structure. Secondly, aimed at detecting of large-scale IP network behavior anomalies more accurately, in this paper multivariate time series are represented by graphs which are produced by a given pair of graphs using Cartesian Product, and a substructure-based network behavior anomaly detection algorithm, named as WFS, was proposed to detect the anomalies of a large-scale IP network flow data. The paper is organized as follows. We review related work in Sect. 2. Then, in Sect. 3, we describe how to rep- resent the real large-scale IP network flow data with graph and propose WFS algorithm. In Sect. 4 we identify abnormal patterns using MTSMARM. In Sect. 5 experiments are con- ducted to evaluate our proposed method. Finally, we con- clude in Sect. 6. 2 Related work There is considerable interest in using entropy-based analy- sis of traffic feature distributions for anomaly detection [49, 50]. Entropy-based metrics are tempting since they provide more fine-grained insights into traffic structure than tradi- tional traffic volume analysis.While previous work has illu- minated the benefits of using the entropy of different traffic distributions in isolation to detect anomalies, there has been little effort in comprehensively understanding the detection power provided by entropy-based analysis of multiple traf- fic distribution used in connection with each other. Few re- searchers focus on the detection power which is affected by on choice of feature. Choosing traffic features distributions that complement one another will benefit anomaly detection capabilities greatly. The detection of anomalies in multivariate time series is more challenging due to that establishing a clear definition of an anomaly is difficult. Analogous to univariate time se- ries, some anomalies may correspond to abnormally high (or low) values or unusual subsequences (discord [59]) in one or more time series. Others may correspond to unexpected changes in the relationships among a set of variables. Recently there has been an urge towards analyzing data using graph theory methods. Not to be confused with the mechanisms for analyzing spatial data, graph-based data mining methods are to analyze data that can be represented as a graph (i.e., vertices and edges). Yet, while there has been much written as it belongs to graph-based data mining [60], very little research has been good at the area of graph- based anomaly detection. In 2003 [53] used the SUBDUE application to consider the problem of anomaly detection from both the anomalous substructure and anomalous sub- graph perspective. They were able to provide measurements of anomalous behavior as they are applied to graphs from two different views. Anomalous substructure detection dealt with the unusual substructures that were discovered in an en- tire graph. In order to discriminate an anomalous substruc- ture from the other substructures, they proposed a simple measurement whereby the value associated with a substruc- ture indicated a degree of anomaly. They also introduced the idea of anomalous subgraph detection which dealt with how anomalous a subgraph (i.e., a substructure that is part of a larger graph) was to other subgraphs. The idea was that subgraphs that contained many ordinary substructures were generally less anomalous than subgraphs that contained few ordinary substructures. In addition, they also investigate the idea of conditional entropy and data regularity using net- work intrusion data as well as some artificially created data. [64] utilized rarity measurements to the discovery of uncom- mon links within a graph. Using various metrics to define the commonality of paths between nodes, the user was able to confirm whether a path between two nodes were interest- ing or not, without having any preconceived notions of sig- nificant patterns. One of the disadvantages of this approach was that while it was domain independent, it assumed that the user was querying the system to discover interesting pat- terns regarding certain nodes. In other words, the uncommon patterns had to originate or terminate from a user-defined node. The AutoPart system presented a non-parametric ap- proach to finding outliers in graph-based data [63]. Part of Chakrabarti’s approach was to detect outliers by analyzing how edges that were removed from the whole structure af- fected the minimum descriptive length (MDL) of the graph. Representing the graph as an adjacency matrix, and using a compression technique to encode node groupings of the graph, he searched the groups that cut down the compres- sion cost as much as possible. Nodes were put in groups based on their entropy. In 2005, the concept of entropy was also used by [61] in their analysis of a real-world data set: the famous Enron scandal. They used “event based graph entropy” to determine the place of the most interesting peo- ple in an Enron e-mail data set. Using a measure analogous to what [53] had proposed, they supposed that the important nodes (or people) were the ones who had the biggest effect on the entropy of the graph when they were removed. Thus, the most interesting node was the one that brought about the maximum change to the graph’s entropy. However, they ne- glected the relations between nodes which provided more information about frequent subgraph. In 2005, by using just bipartite graphs [52] presented a model for scoring the nor- mality of nodes as they connect to the other nodes. Next, by using an adjacency matrix they gave a “relevance score” such that every node Vi had a relevance score to every node Vj , whereby the higher the score the more related the two nodes. The idea was that the nodes with the lower normality score to node Vi were the more anomalous ones to that node. Large-scale IP network behavior anomaly detection and identification using substructure-based approach 3 Table 1 The flow record contents Basic information Related to routing srcaddr, dstaddr, srcport, dstport nexthop dpkts, doctets src_as, dst_as first, last src_mask, dst_mask tos, tcp_flags The two drawbacks with this approach were that it only dealt with bipartite graphs and it only found anomalous nodes, rather than what could be anomalous substructures. In [62], they also went after anomalous links, this time via a statis- tical approach. By using a Katz measurement they used the link structure to statistically predict the likelihood of a link. While it worked on a small dataset of author-paper pairs, their single measurement just analyzed the links in a graph. 3 Anomaly detection using substructure-based approach The graph representation of large-scale IP network flow data can be divided into three steps. Firstly, large-scale IP net- work flow data are compressed using Shannon entropy; Sec- ondly, the entropy time series (a time series is composed of entropy) is normalized, and the time series is converted into an discrete symbolic sequence; Finally, multiple dimen- sional data at each time point are represented by a graph. 3.1 Netflow, entropy and anomaly detection metrics 3.1.1 Netflow With Netflow, applications on the network are able to be monitored, and malicious traffic informations are able to be identified. The Netflow, which is a passive monitoring tool includes: statistics about groups of related packets (e.g., same TCP/IP headers and close in time); records header in- formation, counts, and time. As can be seen from Table 1, the flow record contents can be divided into two parts: one is the basic information about the flow; the other is informa- tion related to routing. The basic information about the flow goes as follows: the first information is source and destina- tion, IP address and port; the second one is packet and byte counts; the third one is start and end times; the fourth one is ToS and TCP flags. The informations related to routing are listed as follows: the first next-hop IP address; source and destination AS; input and output. 3.1.2 Shannon entropy and anomaly detection metrics Entropy denoted by H , is a metric that captures the degree of dispersal or concentration of a distribution and is a measure Table 2 Description of six anomaly detection metrics at flow level Metrics type Description H(srcPort) Entropy of source port distribution H(dstPort) Entropy of destination port distribution H(srcIP) Entropy of source IP address distribution H(dstIP) Entropy of destination IP address distribution H(octets) Entropy of octets distribution H(prot) Entropy of protocol type distribution of the uncertainty of a random variable [2, 3, 49]. A wide va- riety of anomalies will impact on the distribution of one of the discussed IP features. Let X be a discrete random vari- able with alphabet X representing the distribution of val- ues which a particular network traffic feature can take, then probability mass function is p(x) = Pr{X = x}, x ∈ X . The entropy H(X) of a discrete random variable X is defined by H(X) = − ∑ x∈X p(x) logp(x) (1) where p(x) is the probability of event x ∈ X occurring. For example, the probability of seeing IP 129.173.192.0 is de- fined to be the number of packets using IP 129.173.192.0 divided by the total number of packets in the given time in- terval. In this paper, entropy can be applied to obtain a com- pressed representation of the large-scale IP network flow data that is much smaller in volume, yet closely maintain the integrity of the original data, which is called lossless distribution information reduction. Anomaly detection met- rics refer to metrics which are used for anomaly detection. Network anomaly detection can be operated in three differ- ent granularity such as packet level, flow level and network- wide (OD traffic) level. Since flow level analysis is a good compromise for traffic analysis based on network-wide and packet in its performance and accuracy, the analysis method of this paper is based on features distribution of Netflow. The empirical evaluation of [50] shown that port and ad- dress distribution are highly correlated. Besides port and ad- dress, in order to explore the underlying traffic structure, other features are chosen in this paper. The six flow level metrics used in this paper are as follows: source address (sometimes called source IP and denoted srcIP), destination address (or destination IP, denoted dstIP), source port (src- Port), destination port (dstPort), octets and type of protocol, which are shown in Table 2. 3.2 Symbolic representation of multivariate time series We use min-max normalization to map the entropy time se- ries in the range [0,1] by computing 4 W. He et al. Fig. 1 The representation of graphs time series h′i = hi − hmin hmax − hmin (2) where hmin and hmax are the minimum and maximum values of entropy time series. Then, we divide the normalized en- tropy series into 2n (n ≥ 2) parts equally, with 2n alphabets denoting the value of 2n parts respectively, and symbolic sequence is obtained. For example, by using four alphabets {A,B,C,D} denote the value of four parts respectively, and symbolic sequence si is obtained. si = ⎧ ⎪⎪⎪⎨ ⎪⎪⎪⎩ A, h′i ≤ 0.25 B, 0.25 < h′i ≤ 0.5 C, 0.5 < h′i ≤ 0.75 D, else (3) 3.3 Graph representation of multivariate time series Consider a time series H = h1h2 . . . hT , which is an or- dered sequence of measurements taken for a real-valued variable H at timestamps 1,2, . . . , T . A multivariate time series D = {Hi}6i=1 is a collection of time series that corre- sponds to measurements for six real-valued variables span- ning the same time interval. In order to represent the multivariate time series D, a new undirected graph, G = (V ,E), V = {v1, v2, . . . , vn}, E = {e1, e2, . . . , em} ⊆ V ×V which is derived from a given pair of graphs TN and K6 using Cartesian Product TN × K6, is made up. The representation of graphs time series is shown in Fig. 1. Each vertex in the graph denotes a data point of a Netflow feature and each vertex can be assigned a value within {A,B,C,D}. Edge denotes the relation be- tween two network anomaly detection metrics. As multi- ple network anomaly metrics distribution may affect each other, network anomaly metrics distribution time series are represented with complete graph to describe the relations of all network anomaly detection metrics. V (g) denotes the vertex set of graph g and E(g) denotes the edge set. Given a sequence GTs of n graphs {GT1 ,GT2, . . . ,GTn} with GTn = (VTna,ETnb), 1 ≤ a ≤ 6,1 ≤ b ≤ 15, GTn denotes the graph at time Tn, htc denotes the value for cth vertex Vc in Gt , 1 ≤ t ≤ n,1 ≤ c ≤ 6. ETnp denotes the pth edge of GTn at time Tn. We define GTs to be a graphs time series for all 1 ≤ Ts ≤ n, and define Supt (i, j, . . .) as the numbers of Large-scale IP network behavior anomaly detection and identification using substructure-based approach 5 graphs in GTs where g (consists of k vertexes) is a subgraph. For example, Supt (i, j) ( i, j denote two different vertices, t denotes the time point of the graph) represents the support of 2-itemsets. A frequent subgraph is a graph whose support is no less than a minimum support threshold, min_sup. Ac- cording to different values of the nodes, the support of graph pattern of each time point is counted. The fewer the sup- ports are, the more anomalous the subgraphs. Covariance is a measure of how much two traffic features change to- gether. The corresponding edge weight is given according to the covariance value. Let w be a discrete random vari- able with alphabet W representing the weight coefficient of subgraph can take, W(i, j) denotes the edge weight of 2 ver- texes (i, j). Consider m×1 random vector x(ζ ) = [x1(ζ ), x2(ζ ), . . . , xm(ζ )]T , then the mathematical expectation μx is μx = [μ1,μ2, . . . ,μm]T (4) Every random variable have m sample, then sample ma- trix is M = [α1,α2, . . . ,αn]T = [β1,β2, . . . ,βm] (5) where βj (j = 1,2, . . . ,m) is the sample vector of j th ran- dom variable, αi (i = 1,2, . . . , n) is the vector composed of all the sample values of ith random variable. The covariance matrix of random variable x(ζ ),y(ζ ) can be defined as Cxy = E{[x(ζ ) − μx][y(ζ ) − μy]H } (6) The estimation covariance Cˆxy can be obtained based on the sample value: cˆxi ,yj = 1 n − 1 n∑ k=1 [( Mki − 1 n n∑ p=1 xpi ) × ( Mkj − 1 n n∑ q=1 yqj )] (7) The weight coefficient of subgraph can be computed as fol- lows: W(x1, x2) = exp(−cˆ1/2x1,x2) W(x1, x2, x3) = exp(−cˆ1/2x1,x2 − cˆ1/2x2,x3 − cˆ1/2x3,x1) W(x1, x2, x3, x4) = exp(−cˆ1/2x1,x2 − cˆ1/2x2,x3 − cˆ1/2x3,x4 − cˆ1/2x4,x1) W(x1, x2, x3, x4, x5) = exp(−cˆ1/2x1,x2 − cˆ1/2x2,x3 − cˆ1/2x3,x4 − cˆ1/2x4,x5 − cˆ1/2x5,x1) (8) where 1 ≤ x1, x2, x3, x4, x5 ≤ 4. θτ denotes the contribution factor (impact degree to anomaly)of subgraph G(i, j, . . .︸︷︷︸ τ ). For example, θ2 denotes the contribution factor of sub- graph G(i, j). According to F2(S,G) = Size(S) ∗ Instances(S,G), Size(S) is the number of vertices in S, Instances(S,G) is the frequency that S appears in the graph G, which is described in [53], small substructures appear frequently, large substructures appears infrequently. In general, 1 = θ2 > θ3 > θ4 > θ5 > 0. The abnormal index Pt is represented as in (9): Pt = log ⎛ ⎜⎜⎜⎜⎜⎜⎜⎝ [θ2 θ3 θ4 θ5] ⎡ ⎢⎢⎢⎢⎢⎢⎢⎣ ∑ 1≤x1

本文档为【network behavior anomaly detection（网络行为异常检测）】，请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑，图片更改请在作品中右键图片并更换，文字修改请直接点击文字进行修改，也可以新增和删除文档中的内容。

历史搜索

清空历史搜索

network behavior anomaly detection（网络行为异常检测）

热门搜索

历史搜索