Compressed Sensing
David L. Donoho, Member, IEEE
Abstract—Suppose $x$ is an unknown vector in $\mathbf{R}^m$ (a digital
image or signal); we plan to measure $n$ general linear functionals
of $x$ and then reconstruct. If $x$ is known to be compressible
by transform coding with a known transform, and we reconstruct
via the nonlinear procedure defined here, the number of
measurements $n$ can be dramatically smaller than the size $m$.
Thus, certain natural classes of images with $m$ pixels need only
$n = O(m^{1/4} \log^{5/2}(m))$ nonadaptive nonpixel samples for
faithful recovery, as opposed to the usual $m$ pixel samples.
More specifically, suppose $x$ has a sparse representation in
some orthonormal basis (e.g., wavelet, Fourier) or tight frame
(e.g., curvelet, Gabor)—so the coefficients belong to an $\ell_p$ ball
for $0 < p \le 1$. The $N$ most important coefficients in that
expansion allow reconstruction with $\ell_2$ error $O(N^{1/2-1/p})$. It is
possible to design $n = O(N \log(m))$ nonadaptive measurements
allowing reconstruction with accuracy comparable to that attainable
with direct knowledge of the $N$ most important coefficients.
Moreover, a good approximation to those $N$ important coefficients
is extracted from the $n$ measurements by solving a linear
program—Basis Pursuit in signal processing. The nonadaptive
measurements have the character of “random” linear combinations
of basis/frame elements. Our results use the notions of
optimal recovery, of $n$-widths, and information-based complexity.
We estimate the Gel'fand $n$-widths of $\ell_p$ balls in high-dimensional
Euclidean space in the case $0 < p \le 1$, and give a criterion
identifying near-optimal subspaces for Gel'fand $n$-widths. We
show that “most” subspaces are near-optimal, and show that
convex optimization (Basis Pursuit) is a near-optimal way to
extract information derived from these near-optimal subspaces.
Index Terms—Adaptive sampling, almost-spherical sections of
Banach spaces, Basis Pursuit, eigenvalues of random matrices,
Gel'fand $n$-widths, information-based complexity, integrated
sensing and processing, minimum $\ell_1$-norm decomposition,
optimal recovery, Quotient-of-a-Subspace theorem, sparse solution
of linear equations.
I. INTRODUCTION
AS our modern technology-driven civilization acquires and exploits ever-increasing amounts of data, “everyone” now
knows that most of the data we acquire “can be thrown away”
with almost no perceptual loss—witness the broad success of
lossy compression formats for sounds, images, and specialized
technical data. The phenomenon of ubiquitous compressibility
raises very natural questions: why go to so much effort to ac-
quire all the data when most of what we get will be thrown
away? Can we not just directly measure the part that will not
end up being thrown away?
In this paper, we design compressed data acquisition proto-
cols which perform as if it were possible to directly acquire just
the important information about the signals/images—in effect,
not acquiring that part of the data that would eventually just be
“thrown away” by lossy compression. Moreover, the protocols
are nonadaptive and parallelizable; they do not require knowl-
edge of the signal/image to be acquired in advance—other than
knowledge that the data will be compressible—and do not at-
tempt any “understanding” of the underlying object to guide
an active or adaptive sensing strategy. The measurements made
in the compressed sensing protocol are holographic—thus, not
simple pixel samples—and must be processed nonlinearly.
In specific applications, this principle might enable dra-
matically reduced measurement time, dramatically reduced
sampling rates, or reduced use of analog-to-digital converter
resources.
A. Transform Compression Background
Our treatment is abstract and general, but depends on one spe-
cific assumption which is known to hold in many settings of
signal and image processing: the principle of transform sparsity.
We suppose that the object of interest is a vector $x \in \mathbf{R}^m$, which
can be a signal or image with $m$ samples or pixels, and that there
is an orthonormal basis $(\psi_i : i = 1, \ldots, m)$ for $\mathbf{R}^m$ which can
be, for example, an orthonormal wavelet basis, a Fourier basis,
or a local Fourier basis, depending on the application. (As explained
later, the extension to tight frames such as curvelet or
Gabor frames comes for free.) The object has transform coefficients
$\theta_i = \langle x, \psi_i \rangle$, and these are assumed sparse in the sense
that, for some $0 < p < 2$ and for some $R > 0$:

$$\|\theta\|_p \equiv \Big( \sum_i |\theta_i|^p \Big)^{1/p} \le R. \tag{I.1}$$
Such constraints are actually obeyed on natural classes of sig-
nals and images; this is the primary reason for the success of
standard compression tools based on transform coding [1]. To
fix ideas, we mention two simple examples of $\ell_p$ constraint.
• Bounded Variation model for images. Here image brightness
is viewed as an underlying function $f(x, y)$ on the
unit square $[0,1]^2$, which obeys (essentially)

$$\int_{[0,1]^2} |\nabla f| \, dx \, dy \le R.$$

The digital data of interest consist of $m = n^2$ pixel samples
of $f$ produced by averaging over $1/n \times 1/n$ pixels.
We take a wavelet point of view; the data are seen as a superposition
of contributions from various scales. Let $x^{(j)}$
denote the component of the data at scale $j$, and let $(\psi_i^{(j)})$
denote the orthonormal basis of wavelets at scale $j$, containing
$3 \cdot 4^j$ elements. The corresponding coefficients
$\theta^{(j)}$ obey $\|\theta^{(j)}\|_1 \le 4R$.
• Bump Algebra model for spectra. Here a spectrum (e.g.,
mass spectrum or magnetic resonance spectrum) is
modeled as digital samples $f(i/m)$, $i = 1, \ldots, m$, of an underlying
function $f$ on the real line which is a superposition of
so-called spectral lines of varying positions, amplitudes,
and linewidths. Formally

$$f(t) = \sum_{i=1}^{\infty} a_i \, g\!\left( \frac{t - t_i}{s_i} \right).$$

Here the parameters $t_i$ are line locations, $a_i$ are amplitudes/polarities,
and $s_i$ are linewidths, and $g$ represents a
lineshape, for example the Gaussian $g(t) = e^{-t^2/2}$, although other profiles
could be considered. We assume the constraint
$\sum_i |a_i| \le R$, which in applications represents an energy
or total mass constraint. Again we take a wavelet
viewpoint, this time specifically using smooth wavelets.
The data can be represented as a superposition of contributions
from various scales. Let $x^{(j)}$ denote the component
of the spectrum at scale $j$, and let $(\psi_i^{(j)})$ denote
the orthonormal basis of wavelets at scale $j$, containing
$2^j$ elements. The corresponding coefficients again obey
$\|\theta^{(j)}\|_1 \le c \cdot R \cdot 2^{-j/2}$ [2]. (A small numerical sketch of this model follows this list.)
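As a concrete illustration of the Bump Algebra model, the following minimal Python sketch (our own illustration; the line parameters are hypothetical choices, not taken from the paper) synthesizes digital samples of a spectrum as a superposition of Gaussian lineshapes:

```python
import numpy as np

def bump_spectrum(m, locations, amplitudes, linewidths):
    """Sample f(t) = sum_i a_i * g((t - t_i) / s_i) at t = 1/m, ..., m/m,
    using the Gaussian lineshape g(t) = exp(-t^2 / 2)."""
    t = np.arange(1, m + 1) / m
    f = np.zeros(m)
    for t_i, a_i, s_i in zip(locations, amplitudes, linewidths):
        f += a_i * np.exp(-0.5 * ((t - t_i) / s_i) ** 2)
    return f

# Hypothetical spectrum with three lines; sum(|a_i|) = 2.2 plays the role of R.
f = bump_spectrum(m=1024,
                  locations=[0.2, 0.5, 0.8],
                  amplitudes=[1.0, -0.5, 0.7],
                  linewidths=[0.01, 0.03, 0.005])
```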
While in these two examples the $\ell_1$ constraint appeared, other
$\ell_p$ constraints with $p < 1$ can appear naturally as well;
see below. For some readers, the use of $\ell_p$ norms with $p < 1$
may seem initially strange; it is now well understood that the
$\ell_p$ norms with such small $p$ are natural mathematical measures of
sparsity [3], [4]. As $p$ decreases below $1$, more and more sparsity
is being required. Also, from this viewpoint, an $\ell_2$ constraint
based on $p = 2$ requires no sparsity at all.
Note that in each of these examples, we also allowed for sep-
arating the object of interest into subbands, each one of which
obeys an $\ell_p$ constraint. In practice, in the following we stick
with the view that the object of interest is a coefficient vector
obeying the constraint (I.1), which may mean, from an applica-
tion viewpoint, that our methods correspond to treating various
subbands separately, as in these examples.
The key implication of the $\ell_p$ constraint is sparsity of the
transform coefficients. Indeed, we have trivially that, if $\theta_{(N)}$
denotes the vector $\theta$ with everything except the $N$ largest coefficients
set to $0$, then

$$\|\theta - \theta_{(N)}\|_2 \le \zeta_{2,p} \cdot \|\theta\|_p \cdot (N+1)^{1/2 - 1/p}, \qquad N = 0, 1, 2, \ldots \tag{I.2}$$

for $p < 2$, with a constant $\zeta_{2,p}$ depending only on
$p \in (0,2)$. Thus, for example, to approximate $\theta$ with $\ell_2$ error $\epsilon$, we
need to keep only the $N \asymp \epsilon^{2p/(p-2)}$ biggest terms in $\theta$.
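To make the thresholding step concrete, here is a minimal numpy sketch (our own illustration, not from the paper) of the best $N$-term approximation and the error decay in (I.2):

```python
import numpy as np

def best_N_term(theta, N):
    """Zero out everything except the N largest-magnitude coefficients."""
    theta_N = np.zeros_like(theta)
    keep = np.argsort(np.abs(theta))[-N:]   # indices of the N biggest entries
    theta_N[keep] = theta[keep]
    return theta_N

# A compressible coefficient vector: power-law decay |theta_(i)| = i^(-1/p)
# puts theta (essentially) at the edge of an ell_p ball.
m, p = 10_000, 0.75
rng = np.random.default_rng(1)
theta = rng.choice([-1.0, 1.0], size=m) * np.arange(1.0, m + 1) ** (-1.0 / p)

for N in (10, 100, 1000):
    err = np.linalg.norm(theta - best_N_term(theta, N))
    print(N, err)   # decays roughly like N^(1/2 - 1/p), as in (I.2)
```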
B. Optimal Recovery/Information-Based Complexity
Background
Our question now becomes: if $x$ is an unknown signal whose
transform coefficient vector $\theta(x)$ obeys (I.1), can we make a
reduced number $n \ll m$ of measurements which will allow
faithful reconstruction of $x$? Such questions have been discussed
(for other types of assumptions about $x$) under the names of
Optimal Recovery [5] and Information-Based Complexity [6];
we now adopt their viewpoint, and partially adopt their nota-
tion, without making a special effort to be really orthodox. We
use “OR/IBC” as a generic label for work taking place in those
fields, admittedly being less than encyclopedic about various
scholarly contributions.
We have a class $\mathcal{X}$ of possible objects of interest, and are
interested in designing an information operator $I_n : \mathcal{X} \mapsto \mathbf{R}^n$
that samples $n$ pieces of information about $x$, and an algorithm
$A_n : \mathbf{R}^n \mapsto \mathbf{R}^m$ that offers an approximate reconstruction of $x$.
Here the information operator takes the form

$$I_n(x) = (\langle \xi_1, x \rangle, \ldots, \langle \xi_n, x \rangle)$$

where the $\xi_i$ are sampling kernels, not necessarily sampling
pixels or other simple features of the signal; however, they are
nonadaptive, i.e., fixed independently of $x$. The algorithm $A_n$ is
an unspecified, possibly nonlinear reconstruction operator.
We are interested in the $\ell_2$ error of reconstruction and in the
behavior of optimal information and optimal algorithms. Hence,
we consider the minimax $\ell_2$ error as a standard of comparison

$$E_n(\mathcal{X}) = \inf_{A_n, I_n} \sup_{x \in \mathcal{X}} \|x - A_n(I_n(x))\|_2.$$
So here, all possible methods of nonadaptive linear sampling are
allowed, and all possible methods of reconstruction are allowed.
In our application, the class $\mathcal{X}$ of objects of interest is the set
of objects $x = \Psi \theta$ where $\theta = \theta(x)$ obeys (I.1) for a given $p$
and $R$. Denote then

$$\mathcal{X}_{p,m}(R) = \{ x : \|\theta(x)\|_p \le R \}.$$

Our goal is to evaluate $E_n(\mathcal{X}_{p,m}(R))$ and to have practical
schemes which come close to attaining it.
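In computational terms, nonadaptive information is nothing more than multiplication by a fixed $n \times m$ matrix whose rows are the sampling kernels $\xi_i$, and an algorithm is any map from $\mathbf{R}^n$ back to $\mathbf{R}^m$. A minimal Python sketch of this interface (our own framing; the Gaussian kernels and least-squares reconstruction are placeholder choices, not the construction of this paper):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 64, 1024

# Nonadaptive information operator: n fixed sampling kernels xi_i,
# applied as inner products <xi_i, x>; nothing here depends on x.
Xi = rng.standard_normal((n, m))

def information(x):
    return Xi @ x

# An "algorithm" is any (possibly nonlinear) map R^n -> R^m. The minimum
# ell_2-norm solution below is a naive placeholder, not the paper's method.
def naive_reconstruction(y):
    return np.linalg.lstsq(Xi, y, rcond=None)[0]
```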
C. Four Surprises
Here is the main quantitative phenomenon of interest for this
paper.
Theorem 1: Let $(n, m_n)$ be a sequence of problem sizes with
$n \to \infty$, $n < m_n$, and $m_n \sim A n^{\gamma}$, $\gamma > 1$, $A > 0$. Then for
$0 < p \le 1$ there is $C_p = C_p(A, \gamma) > 0$ so that

$$E_n(\mathcal{X}_{p,m_n}(R)) \le C_p \cdot R \cdot (n/\log(m_n))^{1/2 - 1/p}. \tag{I.3}$$
We find this surprising in four ways. First, compare (I.3) with
(I.2). We see that the forms are similar, under the calibration
$N = n/\log(m_n)$. In words: the quality of approximation to $\theta$
which could be gotten by using the $N$ biggest transform coefficients
can be gotten by using the $n \asymp N \log(m_n)$ pieces of
nonadaptive information provided by $I_n$. The surprise is that
we would not know in advance which transform coefficients are
likely to be the important ones in this approximation, yet the
optimal information operator $I_n$ is nonadaptive, depending at
most on the class $\mathcal{X}_{p,m}(R)$ and not on the specific object. In
some sense this nonadaptive information is just as powerful as
knowing the $N$ best transform coefficients.
This seems even more surprising when we note that for objects
$x \in \mathcal{X}_{p,m}(R)$, the transform representation is the optimal
one: no other representation can do as well at characterizing
$x$ by a few coefficients [3], [7]. Surely then, one imagines,
the sampling kernels $\xi_i$ underlying the optimal information
operator must be simply measuring individual transform coef-
ficients? Actually, no: the information operator is measuring
very complex “holographic” functionals which in some sense
mix together all the coefficients in a big soup. Compare (VI.1)
below. (Holography is a process where a three-dimensional
(3-D) image generates by interferometry a two-dimensional
(2-D) transform. Each value in the 2-D transform domain is
influenced by each part of the whole 3-D object. The 3-D
object can later be reconstructed by interferometry from all
or even a part of the 2-D transform domain. Leaving aside
the specifics of 2-D/3-D and the process of interferometry, we
perceive an analogy here, in which an object is transformed to a
compressed domain, and each compressed domain component
is a combination of all parts of the original object.)
Another surprise is that, if we enlarged our class of information
operators to allow adaptive ones, e.g., operators in which
certain measurements are made in response to earlier measurements,
we could scarcely do better. Define the minimax error
under adaptive information $E_n^{\mathrm{Adapt}}$ allowing adaptive operators

$$I_n^{\mathrm{Adapt}}(x) = (\langle \xi_1, x \rangle, \langle \xi_2^{(x)}, x \rangle, \ldots, \langle \xi_n^{(x)}, x \rangle)$$

where, for $i \ge 2$, each kernel $\xi_i^{(x)}$ is allowed to depend on the
information $\langle \xi_j^{(x)}, x \rangle$, $j < i$, gathered at previous stages.
Formally setting

$$E_n^{\mathrm{Adapt}}(\mathcal{X}) = \inf_{A_n, I_n^{\mathrm{Adapt}}} \sup_{x \in \mathcal{X}} \|x - A_n(I_n^{\mathrm{Adapt}}(x))\|_2$$

we have the following.
Theorem 2: For $0 < p \le 1$, there is $C_p > 0$ so that for all $R > 0$

$$E_n(\mathcal{X}_{p,m}(R)) \le 2^{1/p} \cdot E_n^{\mathrm{Adapt}}(\mathcal{X}_{p,m}(R)).$$
So adaptive information is of minimal help—despite the quite
natural expectation that an adaptive method ought to be able
to iteratively somehow “localize” and then “close in” on the “big
coefficients.”
An additional surprise is that, in the already-interesting case
$p = 1$, Theorems 1 and 2 are easily derivable from known results
in OR/IBC and approximation theory! However, the derivations
are indirect; so although they have what seem to the author to be
fairly important implications, very little seems known at present
about good nonadaptive information operators or about concrete
algorithms matched to them.
Our goal in this paper is to give direct arguments which cover
the case $0 < p \le 1$ of highly compressible objects, to give
direct information about near-optimal information operators and
about concrete, computationally tractable algorithms for using
this near-optimal information.
D. Geometry and Widths
From our viewpoint, the phenomenon described in The-
orem 1 concerns the geometry of high-dimensional convex and
nonconvex “balls.” To see the connection, note that the class
$\mathcal{X}_{p,m}(R)$ is the image, under orthogonal transformation, of
an $\ell_p$ ball. If $p = 1$ this is convex and symmetric about the
origin, as well as being orthosymmetric with respect to the axes
provided by the wavelet basis; if $p < 1$, this is again symmetric
about the origin and orthosymmetric, while not being convex,
but still star-shaped.
To develop this geometric viewpoint further, we consider two
notions of $n$-width; see [5].
Definition 1.1: The Gel'fand $n$-width of $\mathcal{X}$ with respect to
the $\ell_2$ norm is defined as

$$d^n(\mathcal{X}; \ell_2) = \inf_{V_n} \sup\{ \|x\|_2 : x \in V_n^{\perp} \cap \mathcal{X} \}$$

where the infimum is over $n$-dimensional linear subspaces $V_n$ of
$\mathbf{R}^m$, and $V_n^{\perp}$ denotes the orthocomplement of $V_n$ with respect
to the standard Euclidean inner product.
In words, we look for a subspace $V_n$ such that “trapping” $x$
in its orthocomplement $V_n^{\perp}$ causes $\|x\|_2$ to be small. Our interest in Gel'fand
$n$-widths derives from an equivalence between optimal recovery
for nonadaptive information and such $n$-widths, well known in
the case $p = 1$ [5], and in the present setting extending as
follows.

Theorem 3: For $0 < p \le 1$ and $R > 0$

$$E_n(\mathcal{X}_{p,m}(R)) = d^n(\mathcal{X}_{p,m}(R); \ell_2), \qquad p = 1 \tag{I.4}$$

$$d^n(\mathcal{X}_{p,m}(R); \ell_2) \le E_n(\mathcal{X}_{p,m}(R)) \le 2^{1/p} \cdot d^n(\mathcal{X}_{p,m}(R); \ell_2), \qquad 0 < p < 1. \tag{I.5}$$
Thus the Gel'fand $n$-widths either exactly or nearly equal the
value of optimal information. Ultimately, the bracketing with
constant $2^{1/p}$ will be for us just as good as equality, owing to
the unspecified constant factors in (I.3). We will typically only
be interested in near-optimal performance, i.e., in obtaining $E_n$
to within constant factors.
It is relatively rare to see the Gel'fand $n$-widths studied
directly [8]; more commonly, one sees results about the
Kolmogorov $n$-widths.
Definition 1.2: Let $\mathcal{X} \subset \mathbf{R}^m$ be a bounded set. The
Kolmogorov $n$-width of $\mathcal{X}$ with respect to the $\ell_2$ norm is defined
as

$$d_n(\mathcal{X}; \ell_2) = \inf_{V_n} \sup_{x \in \mathcal{X}} \inf_{y \in V_n} \|x - y\|_2$$

where the infimum is over $n$-dimensional linear subspaces $V_n$
of $\mathbf{R}^m$.

In words, $d_n$ measures the quality of approximation of $\mathcal{X}$
possible by $n$-dimensional subspaces $V_n$.
In the case $p = 1$, there is an important duality relationship
between Kolmogorov widths and Gel'fand widths which allows
us to infer properties of $d^n$ from published results on $d_n$. To
state it, let $d_n(\mathcal{X}; \ell_q)$ be defined in the obvious way, based on
approximation in $\ell_q$ rather than $\ell_2$ norm. Also, for given $p \ge 1$
and $q \ge 1$, let $p'$ and $q'$ be the standard dual indices
$p' = p/(p-1)$, $q' = q/(q-1)$. Also, let $b_{p,m}$ denote the standard unit
ball of $\ell_p^m$. Then [8]

$$d^n(b_{p,m}; \ell_q) = d_n(b_{q',m}; \ell_{p'}). \tag{I.6}$$

In particular

$$d^n(b_{1,m}; \ell_2) = d_n(b_{2,m}; \ell_{\infty}).$$
The asymptotic properties of the left-hand side have been de-
termined by Garnaev and Gluskin [9]. This follows major work
by Kashin [10], who developed a slightly weaker version of this
result in the course of determining the Kolmogorov -widths of
Sobolev spaces. See the original papers, or Pinkus’s book [8]
for more details.
Theorem 4: (Kashin, Garnaev, and Gluskin (KGG)) For
all $0 < n < m$

$$d^n(b_{1,m}; \ell_2) \asymp \min\!\left(1, \ \sqrt{\frac{\log(m/n) + 1}{n}}\,\right).$$
Theorem 1 now follows in the case $p = 1$ by applying KGG
with the duality formula (I.6) and the equivalence formula (I.4).
The case $p < 1$ of Theorem 1 does not allow use of duality
and the whole range $0 < p \le 1$ is approached differently in this
paper.
E. Mysteries …
Because of the indirect manner by which the KGG result im-
plies Theorem 1, we really do not learn much about the phenom-
enon of interest in this way. The arguments of Kashin, Garnaev,
and Gluskin show that there exist near-optimal $n$-dimensional
subspaces for the Kolmogorov widths; they arise as the null
spaces of certain matrices with entries $\pm 1$ which are known to
exist by counting the number of matrices lacking certain properties,
the total number of matrices with $\pm 1$ entries, and comparing.
The interpretability of this approach is limited.
The implicitness of the information operator is matched
by the abstractness of the reconstruction algorithm. Based on
OR/IBC theory we know that the so-called central algorithm
is optimal. This “algorithm” asks us to consider, for given
information $y_n = I_n(x)$, the collection of all objects $x \in \mathcal{X}$ which
could have given rise to the data $y_n$:

$$\hat{\mathcal{X}}_{y_n} = \{ x \in \mathcal{X} : I_n(x) = y_n \}.$$

Defining now the center of a set $S$,

$$\mathrm{center}(S) = \underset{c}{\arg\min}\ \sup_{x \in S} \|x - c\|_2,$$

the central algorithm is

$$\hat{x}_n^{c} = \mathrm{center}\big(\hat{\mathcal{X}}_{y_n}\big)$$

and it obeys, when the information $I_n$ is optimal,

$$\sup_{x \in \mathcal{X}} \|x - \hat{x}_n^{c}\|_2 = E_n(\mathcal{X});$$

see Section III below.
This abstract viewpoint unfortunately does not translate into
a practical approach (at least in the case of the $\mathcal{X}_{p,m}(R)$,
$0 < p \le 1$). The set $\hat{\mathcal{X}}_{y_n}$ is a section of the $\ell_p$ ball
$\mathcal{X}_{p,m}(R)$, and finding the center of this section does not correspond
to a standard tractable computational problem. Moreover,
this assumes we know $p$ and $R$, which would typically not be
the case.
F. Results
Our paper develops two main types of results.
• Near-Optimal Information. We directly consider the
problem of near-optimal subspaces for Gel'fand $n$-widths
of $\mathcal{X}_{p,m}(R)$, and introduce three structural conditions
(CS1–CS3) on an $n$-by-$m$ matrix which imply that its
nullspace is near-optimal. We show that the vast majority
of $n$-subspaces of $\mathbf{R}^m$ are near-optimal, and random
sampling yields near-optimal information operators with
overwhelmingly high probability.
• Near-Optimal Algorithm. We study a simple nonlinear reconstruction
algorithm: simply minimize the $\ell_1$ norm of
the coefficients subject to satisfying the measurements.
This has been studied in the signal processing literature
under the name Basis Pursuit; it can be computed by linear
programming. We show that this method gives near-optimal
results for all $0 < p \le 1$.
In short, we provide a large supply of near-optimal infor-
mation operators and a near-optimal reconstruction procedure
based on linear programming, which, perhaps unexpectedly,
works even for the nonconvex case $0 < p < 1$.
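For completeness, we recall the textbook reduction of $\ell_1$ minimization to a linear program (standard material, not specific to the construction of this paper): splitting the coefficient vector into nonnegative parts $\theta = u - v$, $u, v \ge 0$, gives

$$\min_{\theta} \|\theta\|_1 \ \text{ subject to } \ A\theta = y \quad\Longleftrightarrow\quad \min_{u, v \ge 0} \mathbf{1}^T (u + v) \ \text{ subject to } \ A(u - v) = y$$

since at any optimum $u$ and $v$ have disjoint supports, so that $\mathbf{1}^T(u+v) = \|u - v\|_1$.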
For a taste of the type of result we obtain, consider a specific
information/algorithm combination.
• CS Information. Let $\Phi$ be an $n \times m$ matrix generated by
randomly sampling the columns, with different columns
independent and identically distributed (i.i.d.) random
uniform on the sphere $S^{n-1}$. With overwhelming probability for
large $n$, $\Phi$ has properties CS1–CS3 discussed in detail
in Section II-A below; assume we have achieved such a
favorable draw. Let $\Psi$ be the $m \times m$ basis matrix with
basis vector $\psi_i$ as the $i$th column. The CS Information
operator is the $n \times m$ matrix $\Phi \Psi^T$.

• $\ell_1$-minimization. To reconstruct from CS Information, we
solve the convex optimization problem

$$(L_1) \qquad \min \|\Psi^T x\|_1 \ \text{ subject to } \ y_n = \Phi \Psi^T x.$$

In words, we look for the object $\hat{x}_1$ having coefficients
with smallest $\ell_1$ norm that is consistent with the information
$y_n$. (An end-to-end numerical sketch follows this list.)
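The following Python sketch (our own illustration; the dimensions are hypothetical and we take $\Psi = I$ so that $\theta = x$) draws a matrix with columns i.i.d. uniform on the sphere $S^{n-1}$ and solves $(L_1)$ via the linear-programming reformulation recalled above:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, k = 80, 256, 8

# Columns i.i.d. uniform on the sphere S^{n-1}: normalize Gaussian vectors.
Phi = rng.standard_normal((n, m))
Phi /= np.linalg.norm(Phi, axis=0)

# A k-sparse test object (Psi = I, so the coefficients theta equal x).
theta = np.zeros(m)
support = rng.choice(m, size=k, replace=False)
theta[support] = rng.standard_normal(k)
y = Phi @ theta                      # the n measurements

# Basis Pursuit: min ||theta||_1 s.t. Phi theta = y, as an LP in (u, v) >= 0
# with theta = u - v and objective 1^T (u + v).
c = np.ones(2 * m)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
theta_hat = res.x[:m] - res.x[m:]
print(np.linalg.norm(theta_hat - theta))   # near 0: recovery up to tolerance
```

With these proportions ($n = 80$ measurements, $k = 8$ nonzeros, $m = 256$ unknowns), the recovery is typically exact up to solver tolerance.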
To evaluate the quality of an information operator $I_n$, set

$$E_n(I_n, \mathcal{X}) = \inf_{A_n} \sup_{x \in \mathcal{X}} \|x - A_n(I_n(x))\|_2.$$

To evaluate the quality of a combined algorithm/information
pair $(A_n, I_n)$, set

$$E_n(A_n, I_n, \mathcal{X}) = \sup_{x \in \mathcal{X}} \|x - A_n(I_n(x))\|_2.$$
Theorem 5: Let $(n, m_n)$ be a sequence of problem sizes
obeying $n < m_n \sim A n^{\gamma}$, $A > 0$, $\gamma \ge 1$; and let $I_n$ be a
corresponding sequence of operators deriving from CS matrices
with underlying parameters $\eta$ and $\rho$ (see Section II below). Let
$0 < p \le 1$. There exists $C = C(p, A, \gamma) > 0$ so that $I_n$
is near-optimal:

$$E_n(I_n, \mathcal{X}_{p,m}(R)) \le C \cdot E_n(\mathcal{X}_{p,m}(R))$$

for $R > 0$, $n > n_0$. Moreover, the algorithm $A_{1,n}$ delivering
the solution to $(L_1)$ is near-optimal:

$$E_n(A_{1,n}, I_n, \mathcal{X}_{p,m}(R)) \le C \cdot E_n(\mathcal{X}_{p,m}(R))$$

for $R > 0$, $n > n_0$.
Thus, for large $n$, we have a simple description of near-optimal
information and a tractable near-optimal reconstruction
algorithm.
G. Potential Applications
To see the potential implications, recall first the Bump Algebra
model for spectra. In this context, our result says that,
for a spectrometer based on the information operator $I_n$ in Theorem
5, it is really only necessary to take $n = O(\sqrt{m}\,\log(m))$
measurements to get an accurate reconstruction of such spectra,
rather than the nominal $m$ measurements. Ho