IEEE Trans PAMI2012 Density-based multifeature background subtraction with support vector machine

Short Papers

Density-Based Multifeature Background Subtraction with Support Vector Machine

Bohyung Han, Member, IEEE, and Larry S. Davis, Fellow, IEEE

Abstract—Background modeling and subtraction is a natural technique for object detection in videos captured by a static camera, and also a critical preprocessing step in various high-level computer vision applications. However, there have not been many studies concerning useful features and binary segmentation algorithms for this problem. We propose a pixelwise background modeling and subtraction technique using multiple features, where generative and discriminative techniques are combined for classification. In our algorithm, color, gradient, and Haar-like features are integrated to handle spatio-temporal variations at each pixel. A pixelwise generative background model is obtained for each feature efficiently and effectively by Kernel Density Approximation (KDA). Background subtraction is performed in a discriminative manner using a Support Vector Machine (SVM) over background likelihood vectors for a set of features. The proposed algorithm is robust to shadows, illumination changes, and spatial variations of the background. We compare the performance of the algorithm with other density-based methods using several different feature combinations and modeling techniques, both quantitatively and qualitatively.

Index Terms—Background modeling and subtraction, Haar-like features, support vector machine, kernel density approximation.

1 INTRODUCTION

The identification of regions of interest is typically the first step in many computer vision applications, including event detection, visual surveillance, and robotics. A general object detection algorithm may be desirable, but it is extremely difficult to properly handle unknown objects or objects with significant variations in color, shape, and texture.
Therefore, many practical computer vision systems assume a fixed camera environment, which makes the object detection process much more straightforward: a background model is trained with data obtained from empty scenes, and foreground regions are identified using the dissimilarity between the trained model and new observations. This procedure is called background subtraction.

Various background modeling and subtraction algorithms have been proposed [1], [2], [3], [4], [5], mostly focused on modeling methodologies, while potential visual features for effective modeling have received relatively little attention. The study of new features for background modeling may overcome or reduce the limitations of typically used features, and the combination of several heterogeneous features can improve performance, especially when they are complementary and uncorrelated. There have been several studies using texture for background modeling to handle spatial variations in scenes; they employ filter responses, whose computation is typically very costly. Instead of complex filters, we select efficient Haar-like features [6] and gradient features to alleviate potential errors in background subtraction caused by shadows, illumination changes, and spatial and structural variations. Model-based approaches involving probability density functions are common in background modeling and subtraction, and we employ Kernel Density Approximation (KDA) [3], [7], in which a density function is represented by a compact weighted sum of Gaussians whose number, weights, means, and covariances are determined automatically based on the mean-shift mode-finding algorithm. In our framework, each visual feature is modeled by KDA independently and every density function is 1D. By exploiting the properties of the 1D mean-shift mode-finding procedure, KDA can be implemented efficiently because we need to compute convergence locations for only a small subset of the data.
When the background is modeled with probability density functions, the probabilities of foreground and background pixels should be discriminative, but this is not always the case. Specifically, the background probabilities across features may be inconsistent due to illumination changes, shadows, and foreground objects whose features are similar to the background. Also, some features are highly correlated, e.g., the RGB color channels. We therefore employ a Support Vector Machine (SVM) for nonlinear classification, which mitigates the inconsistency and correlation problems among features. The final classification between foreground and background is based on the output of the SVM.

There are three important aspects of our algorithm: integration of multiple features, efficient 1D density estimation by KDA, and foreground/background classification by SVM. These are coordinated tightly to improve background subtraction performance. An earlier version of this research appeared in [8]; the current paper includes a more comprehensive analysis of the feature sets and additional experiments.

2 PREVIOUS WORK

The main objective of background subtraction is to obtain an effective and efficient background model for foreground object detection. In the early years, simple statistics, such as frame differences and median filtering, were used to detect foreground objects. Some techniques utilized a combination of local statistics [9] or vector quantization [10] based on intensity or color at each pixel. More advanced background modeling methods are density-based, where the background model for each pixel is defined by a probability density function over the visual features observed at the pixel during a training period. Wren et al. [11] modeled the background in YUV color space using a Gaussian distribution for each pixel. However, this method cannot handle multimodal density functions, so it is not robust in dynamic environments.
A mixture of Gaussians is another popular density-based method, designed to deal with multiple backgrounds. Three Gaussian components representing the road, shadow, and vehicles are employed to model the background in traffic scenes in [12]. An adaptive Gaussian mixture model is proposed in [2] and [13], where a maximum of K Gaussian components is allowed for each pixel but the number of Gaussians is determined dynamically. Also, variants of incremental EM algorithms have been employed to deal with the real-time adaptation constraints of background modeling [14], [15]. However, a major challenge in the mixture model is the absence or weakness of strategies to determine the number of components; it is also generally difficult to add or remove components to/from the mixture [16]. Recently, more elaborate and recursive update techniques were discussed in [4] and [5]. However, none of the Gaussian mixture models have a principled way to determine the number of Gaussians. Therefore, most real-time applications rely on models with a fixed number of components or apply ad hoc strategies to adapt the number of mixtures over time [2], [4], [5], [13], [17].

(B. Han is with the Department of Computer Science and Engineering, POSTECH, Pohang 790-784, Korea. E-mail: bhhan@postech.ac.kr. L.S. Davis is with the Department of Computer Science, University of Maryland, College Park, MD 20742. E-mail: lsd@cs.umd.edu. Manuscript received 28 Sept. 2009; revised 19 Dec. 2010; accepted 27 Aug. 2011; published online 8 Dec. 2011. Recommended for acceptance by G.D. Hager. For information on obtaining reprints of this article, please send e-mail to tpami@computer.org, and reference IEEECS Log Number TPAMI-2009-09-0649. Digital Object Identifier no. 10.1109/TPAMI.2011.243. 0162-8828/12/$31.00 © 2012 IEEE. Published by the IEEE Computer Society. IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 34, No. 5, May 2012.)
Kernel density estimation is a nonparametric density estimation technique that has been successfully applied to background subtraction [1], [18]. Although it is a powerful representation for general density functions, it requires many samples for accurate estimation of the underlying density and is computationally expensive, so it is not appropriate for real-time applications, especially when high-dimensional features are involved.

Most background subtraction algorithms are based on pixelwise processing, but multilayer approaches have also been introduced in [19] and [20], where background models are constructed at the pixel, region, and frame levels and information from each layer is combined to discriminate foreground from background. In [21], several modalities based on different features and algorithms are integrated, and a Markov Random Field (MRF) is employed for the inference procedure. The co-occurrence of visual features within neighboring pixels is used for robust background subtraction in dynamic scenes in [22].

Some research on background subtraction has focused more on features than on the algorithm itself. Various visual features may be used to model backgrounds, including intensity, color, gradient, motion, texture, and other general filter responses. Color and intensity are probably the most popular features for background modeling, but several attempts have been made to integrate other features to overcome their limitations. A few algorithms are based on motion cues [18], [23]. Texture and gradients have also been successfully integrated for background modeling [24], [25] since they are relatively invariant to local variations and illumination changes. In [26], spectral, spatial, and temporal features are combined, and background/foreground classification is performed based on the statistics of the most significant and frequent features. Recently, a feature selection technique was proposed for background subtraction [27].
3 BACKGROUND MODELING AND SUBTRACTION ALGORITHM

This section describes our background modeling and subtraction method based on 1D KDA using multiple features. KDA is a flexible and compact density estimation technique [7], and we present a faster method to implement KDA for 1D data. For background subtraction, we employ an SVM, which takes as input a vector of probabilities obtained from multiple density functions.

3.1 Multiple Feature Combination

The most popular features for background modeling and subtraction are probably pixelwise color (or intensity), since they are directly available from images and reasonably discriminative. Although it is natural to monitor color variations at each pixel for background modeling, color features have several significant limitations:

- They are not invariant to illumination changes and shadows.
- Multidimensional color features are typically correlated, and joint probability modeling may not be advantageous in practice.
- They rely on local information only and cannot handle structural variations in neighborhoods.

We integrate color, gradient, and Haar-like features to alleviate these disadvantages of pixelwise color modeling. The gradient features are more robust to illumination variations than color or intensity features and are able to model local statistics effectively, so gradient features are occasionally used in background modeling [26], [27]. The strength of Haar-like features lies in their simplicity and their ability to capture neighborhood information. Each Haar-like feature is weak by itself, but a collection of weak features has significant classification power [6], [28]. The integration of these features is expected to improve the accuracy of background subtraction. We have 11 features altogether: RGB color, two gradient features (horizontal and vertical), and six Haar-like features. The Haar-like features employed in our implementation are illustrated in Fig. 1.
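To make the feature set concrete, the per-pixel extraction of the 11 features can be sketched as below. This is an illustration only: the six Haar-like kernels are generic two-, three-, and four-rectangle patterns standing in for the exact filters of Fig. 1 (which are not reproduced here), and the naive filtering helper stands in for the integral-image computation a real system would use.

```python
import numpy as np

def filter2_same(img, kernel):
    """Naive 'same'-size 2D cross-correlation with zero padding
    (adequate for small 3x3 / 9x9 kernels in a sketch)."""
    kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    padded = np.pad(img, ((ph, ph), (pw, pw)))
    out = np.zeros(img.shape, dtype=float)
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + img.shape[0], j:j + img.shape[1]]
    return out

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)  # horizontal gradient
SOBEL_Y = SOBEL_X.T                                              # vertical gradient

def haar_bank(s=9):
    """Six generic Haar-like kernels on an s x s window (illustrative
    placeholders, not necessarily the exact patterns of Fig. 1)."""
    h, t = s // 2, s // 3
    ks = []
    k = np.ones((s, s)); k[:, h:] = -1; ks.append(k)        # two-rect, left vs. right
    k = np.ones((s, s)); k[h:, :] = -1; ks.append(k)        # two-rect, top vs. bottom
    k = -np.ones((s, s)); k[:, t:2 * t] = 2; ks.append(k)   # three-rect, vertical band
    k = -np.ones((s, s)); k[t:2 * t, :] = 2; ks.append(k)   # three-rect, horizontal band
    k = np.ones((s, s)); k[:h, h:] = -1; k[h:, :h] = -1; ks.append(k)  # four-rect, diagonal
    k = -np.ones((s, s)); k[t:-t, t:-t] = 3; ks.append(k)   # center-surround
    return ks

def pixel_features(rgb):
    """Stack the 11 per-pixel features: R, G, B, two Sobel gradients of
    the gray image, and six Haar-like responses on 9x9 neighborhoods."""
    gray = rgb.mean(axis=2)
    maps = [rgb[:, :, c].astype(float) for c in range(3)]
    maps += [filter2_same(gray, SOBEL_X), filter2_same(gray, SOBEL_Y)]
    maps += [filter2_same(gray, k) for k in haar_bank(9)]
    return np.stack(maps, axis=2)   # shape (H, W, 11)
```

Each pixel then carries an 11-dimensional feature vector, and each of its components is modeled independently by a 1D density, as described next.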
The Haar-like features are extracted from 9×9 rectangular regions at each location in the image, while the gradient features are extracted with 3×3 Sobel operators. The fourth and fifth Haar-like features are similar to the gradient features, but differ in filter design, especially scale.

Fig. 1. Haar-like features for our background modeling. [figure not reproduced]

3.2 Background Modeling by KDA

The background probability of each pixel for each feature is modeled with a Gaussian mixture density function.¹ There are various methods to implement this idea, and we adopt KDA, where the density function for each pixel is represented by a compact and flexible mixture of Gaussians. KDA is a density approximation technique based on mixture models, in which mode locations (local maxima) are detected automatically by the mean-shift algorithm and a single Gaussian component is assigned to each detected mode. The covariance for each Gaussian is computed by curvature fitting around the associated mode. More details about KDA are found in [7]; here we summarize how to build a density function by KDA in the 1D case.

¹ Histogram-based modeling is appropriate for 1D data. However, the construction of a histogram at each pixel requires significant memory, and quantization is not straightforward due to the wide range of Haar-like feature values. Also, a histogram is typically less stable than a continuous density function when only a small number of samples are available.

Suppose that the training data for a sequence are composed of n frames and x_{F,i} (i = 1, ..., n) is the ith value for feature F at a certain location. (The index for pixel location is omitted for simplicity.) For each feature, we first construct a 1D density function at each pixel by kernel density estimation with a Gaussian kernel:

\hat{f}_F(x) = \frac{1}{\sqrt{2\pi}} \sum_{i=1}^{n} \frac{\alpha_{F,i}}{\sigma_{F,i}} \exp\left( -\frac{(x - x_{F,i})^2}{2\sigma_{F,i}^2} \right),    (1)

where \sigma_{F,i} and \alpha_{F,i} are the bandwidth and weight of the ith kernel, respectively. As described above, KDA finds local maxima in the underlying density function (1), and a mode-based representation of the density is obtained by estimating all the parameters of a compact Gaussian mixture. The original density function is simplified by KDA as

\tilde{f}_F(x) = \frac{1}{\sqrt{2\pi}} \sum_{i=1}^{m_F} \frac{\tilde{\alpha}_{F,i}}{\tilde{\sigma}_{F,i}} \exp\left( -\frac{(x - \tilde{x}_{F,i})^2}{2\tilde{\sigma}_{F,i}^2} \right),    (2)

where \tilde{x}_{F,i} is a convergence location of the mean-shift mode-finding procedure, and \tilde{\sigma}_{F,i} and \tilde{\alpha}_{F,i} are the estimated standard deviation and weight of the Gaussian assigned to each mode, respectively. Note that the number of components in (2), m_F, is generally much smaller than n.

One remaining issue is bandwidth selection, i.e., the initialization of \sigma_{F,i}, when creating the original density function in (1). Although there is no optimal solution for bandwidth selection in nonparametric density estimation, we initialize the kernel bandwidth based on the median absolute deviation over the temporal neighborhood of the corresponding pixel, which is almost identical to the method proposed in [1]. Specifically, the initial kernel bandwidth of the ith sample for feature F is given by

\sigma_{F,i} = \max\left( \sigma_{\min},\ \mathrm{med}_{t = i - t_w, \ldots, i + t_w} \left| x_{F,t} - x_{F,t-1} \right| \right),    (3)

where t_w is the temporal window size and \sigma_{\min} is a predefined minimum kernel bandwidth to avoid too narrow a kernel. The motivation for this strategy is that multiple objects may be projected onto the same pixel, but feature values from the same object will be observed for a period of time; the absolute difference between two consecutive frames therefore does not jump too frequently.

Although KDA handles multimodal density functions for each feature, it is still not sufficient to handle long-term background variations.
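Equations (1) and (3) can be sketched directly in code. The sketch below assumes uniform kernel weights \alpha_{F,i} = 1/n, which holds before any sequential update reweights the components, and treats one feature at one pixel.

```python
import numpy as np

def kernel_bandwidths(x, tw=5, sigma_min=1.0):
    """Eq. (3): per-sample bandwidth = median of |x_t - x_{t-1}| over a
    temporal window [i - tw, i + tw], floored at sigma_min.
    x is the 1D time series of one feature at one pixel."""
    d = np.abs(np.diff(x, prepend=x[0]))   # |x_t - x_{t-1}|, with d[0] = 0
    n = len(x)
    sig = np.empty(n)
    for i in range(n):
        lo, hi = max(0, i - tw), min(n, i + tw + 1)
        sig[i] = max(sigma_min, float(np.median(d[lo:hi])))
    return sig

def kde_likelihood(x_query, samples, sigmas, weights=None):
    """Eq. (1): weighted 1D Gaussian kernel density estimate at x_query."""
    if weights is None:
        weights = np.full(len(samples), 1.0 / len(samples))
    z = (x_query - samples) / sigmas
    return float(np.sum(weights / sigmas * np.exp(-0.5 * z ** 2)) / np.sqrt(2 * np.pi))
```

A pixel whose current feature value lies near its training samples then receives a high background likelihood, and a value far from all samples receives a low one.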
We must update background models periodically or incrementally, which is done by Sequential Kernel Density Approximation (SKDA) [7]. The density function at time step t+1 is obtained as the weighted average of the density function at the previous time step and a new observation:

\hat{f}^{t+1}_F(x) = (1 - r)\,\hat{f}^{t}_F(x) + \frac{r\,\tilde{\alpha}^{t+1}_{F,i}}{\sqrt{2\pi}\,\tilde{\sigma}^{t+1}_{F,i}} \exp\left( -\frac{\left(x - \tilde{x}^{t+1}_{F,i}\right)^2}{2\left(\tilde{\sigma}^{t+1}_{F,i}\right)^2} \right),    (4)

where r is a learning rate (forgetting factor). To prevent the number of components from increasing indefinitely under this procedure, KDA is applied at each time step. The detailed technique and analysis are described in [7]. Since we are dealing with only 1D vectors, the sequential modeling is also much simpler and similar to the method introduced in Section 3.3; it is sufficient to compute the new convergence points of the mode locations immediately before and after the new data in the density function at each time step. The computation time of SKDA depends principally on the number of iterations for the data points in (4), and can vary at each time step.

3.3 Optimization in One Dimension

We are only interested in 1D density functions, and the convergence of each sample point can be obtained much faster than in the general case. Given 1D samples x_i and x_j (i, j = 1, ..., n), the density function created by kernel density estimation has the following property:

(x_i \le x_j \le \hat{x}_i) \vee (x_i \ge x_j \ge \hat{x}_i) \ \Rightarrow\ \hat{x}_i = \hat{x}_j,    (5)

where \hat{x}_i and \hat{x}_j are the convergence locations of x_i and x_j, respectively. In other words, all the samples located between a sample and its associated mode converge to the same mode location. So, every convergence location in the underlying density function can be found without running the actual mode-finding procedure for all the sample points, which is the most expensive part of the computation of KDA.
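Property (5) allows mode finding for all samples in one ascending scan. The sketch below uses a fixed bandwidth for simplicity (the paper's version uses the per-sample bandwidths of eq. (3)): each mean-shift run stops early as soon as its trajectory crosses a neighboring sample, since by (5) both samples must share a mode.

```python
import numpy as np

def find_modes_1d(samples, sigma, eps=1e-6, max_iter=1000):
    """Single ascending-scan mode finding for a fixed-bandwidth 1D
    Gaussian KDE. Crossing a neighboring sample terminates the run:
    the two samples share a mode by property (5)."""
    order = np.argsort(samples)
    xs = np.asarray(samples, float)[order]
    n = len(xs)
    modes = np.full(n, np.nan)
    link = np.full(n, -1, dtype=int)        # deferred "same mode as j" links
    for i in range(n):
        x = xs[i]
        for _ in range(max_iter):
            w = np.exp(-0.5 * ((x - xs) / sigma) ** 2)
            x_new = float(np.dot(w, xs) / np.sum(w))   # one mean-shift step
            if i > 0 and x_new <= xs[i - 1]:           # crossed left neighbor
                modes[i] = modes[i - 1]                # (already resolved)
                break
            if i + 1 < n and x_new >= xs[i + 1]:       # crossed right neighbor
                link[i] = i + 1                        # resolve after the scan
                break
            if abs(x_new - x) < eps:                   # converged to a mode
                modes[i] = x_new
                break
            x = x_new
        else:
            modes[i] = x                               # iteration cap reached
    for i in range(n - 1, -1, -1):                     # resolve forward links
        if link[i] >= 0:
            modes[i] = modes[link[i]]
    out = np.empty(n)
    out[order] = modes
    return out
```

Each sample thus triggers at most a few mean-shift iterations before either converging or being linked to a neighbor, which is where the speedup over running full mean-shift from every sample comes from.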
This section describes a simple method to find all the convergence points in a single linear scan of the samples using the above property. We sort the sample points in ascending order and start performing mean-shift mode finding from the smallest sample. When the current sample moves in the gradient ascent direction under the mean-shift algorithm and passes another sample's location during the iterative procedure, the convergence point of the current sample must be the same as the convergence location of the sample just passed; we therefore terminate the current sample's mean-shift process and move on to the next smallest sample, where the mean-shift process begins again. If a mode is found during the mean-shift iterations, its location is stored and the next sample is considered. After all samples have been scanned, each sample is associated with a mode and the mode-finding procedure is complete. This strategy is simple, but it improves the speed of mean-shift mode finding significantly, especially when many samples are involved. Fig. 2 illustrates the impact of the 1D optimization on the mode-finding procedure, where a huge difference in computation time between the two algorithms is evident.

3.4 Foreground and Background Classification

After background modeling, each pixel is associated with k 1D Gaussian mixtures, where k is the number of features integrated. Background/foreground classification for a new frame is performed using these distributions. The background probability of a feature value is computed by (2), and the k probability values obtained at each pixel are represented by a k-dimensional vector. Such k-dimensional vectors are collected from annotated foreground and background pixels, and we denote them by y_j (j = 1, ..., N), where N is the number of data points.
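Training an SVM on these k-dimensional likelihood vectors can be sketched as follows. This is a minimal linear stand-in trained by stochastic sub-gradient descent on the regularized hinge loss; the paper's actual SVM (kernel choice, solver) is not specified in this section, so treat the block as illustrative only.

```python
import numpy as np

def train_svm(Y, labels, lam=0.01, lr=0.05, epochs=300, seed=0):
    """Minimal linear SVM via stochastic sub-gradient descent on the
    hinge loss. Y: (N, k) background-likelihood vectors; labels in
    {-1, +1}, with +1 meaning background."""
    rng = np.random.default_rng(seed)
    N, k = Y.shape
    w, b = np.zeros(k), 0.0
    for _ in range(epochs):
        for i in rng.permutation(N):
            if labels[i] * (Y[i] @ w + b) < 1:     # margin violated
                w += lr * (labels[i] * Y[i] - lam * w)
                b += lr * labels[i]
            else:                                  # only regularize
                w -= lr * lam * w
    return w, b

def classify(y, w, b):
    """+1 (background) or -1 (foreground) from the SVM decision value."""
    return 1 if y @ w + b >= 0.0 else -1
```

In the full system, the input vector y would hold the k per-feature background likelihoods of eq. (2) at one pixel, so the SVM learns how to weigh features against each other instead of averaging or voting.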
In most density-based background subtraction algorithms, the probabilities associated with each pixel are combined in a straightforward way, either by computing the average probability or by voting for the classification. However, such simple methods may not work well in many real-world situations due to feature dependency and nonlinearity. For example, pixels in shadow may have a low background probability in color modeling unless shadows are explicitly modeled as transformations of color variables, but a high background probability in texture modeling. Also, the foreground color of a pixel