To appear in the ACM SIGGRAPH conference proceedings
Content-Preserving Warps for 3D Video Stabilization
Feng Liu
Michael Gleicher
University of Wisconsin-Madison
Hailin Jin
Aseem Agarwala
Adobe Systems, Inc.
Abstract
We describe a technique that transforms a video from a hand-held
video camera so that it appears as if it were taken with a directed
camera motion. Our method adjusts the video to appear as if it were
taken from nearby viewpoints, allowing 3D camera movements to
be simulated. By aiming only for perceptual plausibility, rather than
accurate reconstruction, we are able to develop algorithms that can
effectively recreate dynamic scenes from a single source video. Our
technique first recovers the original 3D camera motion and a sparse
set of 3D, static scene points using an off-the-shelf structure-from-
motion system. Then, a desired camera path is computed either
automatically (e.g., by fitting a linear or quadratic path) or inter-
actively. Finally, our technique performs a least-squares optimiza-
tion that computes a spatially-varying warp from each input video
frame into an output frame. The warp is computed to both follow the
sparse displacements suggested by the recovered 3D structure, and
avoid deforming the content in the video frame. Our experiments
on stabilizing challenging videos of dynamic scenes demonstrate
the effectiveness of our technique.
1 Introduction
While digital still photography has progressed to the point where
most amateurs can easily capture high-quality images, the quality
gap between professional and amateur-level video remains remark-
ably wide. One of the biggest components of this gap is camera
motion. Most casual video is shot hand-held, yielding footage that
is difficult to watch, even if video stabilization is used to remove
high-frequency jitter. In contrast, some
of the most striking camera movements in professional produc-
tions are “tracking shots” [Kawin 1992], where cameras are moved
along smooth, simple paths. Professionals achieve such motion
with sophisticated equipment, such as cameras mounted on rails
or steadicams, that are too cumbersome or expensive for amateurs.
In this paper, we describe a technique that allows users to transform
their hand-held videos to have the appearance of an idealized
camera motion, such as a tracking shot, as a post-processing step.
Given a video sequence from a single video camera, our algorithm
can simulate any camera motion that is reasonably close to the cap-
tured one. We focus on creating canonical camera motions, such as
linear or parabolic paths, because such paths have a striking effect
and are difficult to create without extensive equipment. Our method
can also perform stabilization using low-pass filtering of the origi-
nal camera motion to give the appearance of a steadicam. Given a
desired output camera path, our method then automatically warps
the input sequence so that it appears to have been captured along
the specified path.
1 http://www.cs.wisc.edu/graphics/Gallery/WarpFor3DStabilization/
While existing video stabilization algorithms are successful at re-
moving small camera jitters, they typically cannot produce the more
aggressive changes required to synthesize idealized camera mo-
tions. Most existing methods operate purely in 2D; they apply full-
frame 2D warps (e.g., affine or projective) to each image that best
remove jitter from the trajectory of features in the video. These 2D
methods are fundamentally limited in two ways: first, a full-frame
warp cannot model the parallax that is induced by a translational
shift in viewpoint; second, there is no connection between the 2D
warp and a 3D camera motion, making it impossible to describe
desired camera paths in 3D. We therefore consider a 3D approach.
Image-based rendering methods can be used to perform video sta-
bilization in 3D by rendering what a camera would have seen along
the desired camera path [Buehler et al. 2001a]. However, these tech-
niques are currently limited to static scenes, since they render a
novel viewpoint by combining content from multiple video frames,
and therefore multiple moments in time.
Our work is the first technique that can perform 3D video stabi-
lization for dynamic scenes. In our method, dynamic content and
other temporal properties of video are preserved because each out-
put frame is rendered as a warp of a single input frame. This con-
straint implies that we must perform accurate novel view interpo-
lation from a single image, which is extremely challenging [Hoiem
et al. 2005]. Performing this task for a non-rigid dynamic scene
captured by a single camera while maintaining temporal coherence
is even harder; in fact, to the best of our knowledge it has never
been attempted. An accurate solution would require solving several
challenging computer vision problems, such as video layer separa-
tion [Chuang et al. 2002], non-rigid 3D tracking [Torresani et al.
2008], and video hole-filling [Wexler et al. 2004]. In this paper
we provide a technique for novel view interpolation that avoids
these challenging vision problems by relaxing the constraint of a
physically-correct reconstruction. For our application, a perceptu-
ally plausible result is sufficient: we simply want to provide the
illusion that the camera moves along a new but nearby path. In prac-
tice, we find our technique is effective for video stabilization even
though our novel views are not physically accurate and would not
match the ground truth.
Our method takes advantage of recent advances in two areas of re-
search: shape-preserving image deformation [Igarashi et al. 2005],
which deforms images according to user-specified handles while
minimizing the distortion of local shape; and content-aware im-
age resizing [Avidan and Shamir 2007; Wolf et al. 2007], which
changes the size of images while preserving salient image content.
Both of these methods minimize perceivable image distortion by
optimally distributing the deformation induced by user-controlled
edits across the 2D domain. We apply this same principle to image
warps for 3D video stabilization, though in our case we optimally
distribute the distortion induced by a 3D viewpoint change rather
than user-controlled deformation. Since the change in viewpoint re-
quired by video stabilization is typically small, we have found that
this not-physically-correct approach to novel view interpolation is
sufficient even for challenging videos of dynamic scenes.
Our method consists of three stages. First, it recovers the 3D camera
motion and a sparse set of 3D, static scene points using an off-the-
shelf structure-from-motion (SFM) system. Second, the user inter-
actively specifies a desired camera path, or chooses one of three
camera path options: linear, parabolic, or a smoothed version of
the original; our algorithm then automatically fits a camera path to
the input. Finally, our technique performs a least-squares optimiza-
tion that computes a spatially-varying warp from each input video
frame into an output frame. The warp is computed to both follow the
sparse displacements suggested by the recovered 3D structure, and
minimize distortion of local shape and content in the video frames.
The result is not accurate, in the sense that it will not reveal the dis-
occlusions or non-Lambertian effects that an actual viewpoint shift
should yield; however, for the purposes of video stabilization, we
have found that these inaccuracies are difficult to notice in casual
viewing. As we show in our results, our method is able to con-
vincingly render a range of output camera paths that are reasonably
close to the input path, even for highly dynamic scenes.
2 Related Work
Two-dimensional video stabilization techniques have matured to the
point that they are commonly implemented in on-camera hardware
and run in real time [Morimoto and Chellappa 1997]. This
approach can be sufficient if the user only wishes to damp unde-
sired camera shake, if the input camera motion consists mostly of
rotation with very little translation, or if the scene is planar or very
distant. However, in the common case of a camera moving through
a three-dimensional scene, there is typically a large gap between 2D
video stabilization and professional-quality camera paths.
The idea of transforming hand-held videos to appear as if they were
taken as a proper tracking shot was first realized by Gleicher and
Liu [2008]. Their approach segments videos and applies idealized
camera movements to each. However, this approach is based on
full-frame 2D warping, and therefore suffers (as all 2D approaches)
from two fundamental limitations: it cannot reason about the move-
ment of the physical camera in 3D, and it is limited in the amount
of viewpoint change for scenes with non-trivial depth complexity.
The 3D approach to video stabilization was first described by
Buehler et al. [2001a]. In 3D video stabilization, the 3D camera
motion is tracked using structure-from-motion [Hartley and Zisser-
man 2000], and a desired 3D camera path is fit to the hand-held
input path. With this setup, video stabilization can be reduced to
the classic image-based rendering (IBR) problem of novel view in-
terpolation: given a collection of input video frames, synthesize the
images which would have been seen from viewpoints along the de-
sired camera path. Though the novel viewpoint interpolation prob-
lem is challenging and ill-posed, recent sophisticated techniques
have demonstrated high-quality video stabilization results [Fitzgib-
bon et al. 2005; Bhat et al. 2007]. However, the limitation to static
scenes renders these approaches impractical, since most of us shoot
video of dynamic content, e.g., people.
Image warping and deformation techniques have a long his-
tory [Gomes et al. 1998]. Recent efforts have focused on defor-
mation controlled by a user who pulls on various handles [Igarashi
et al. 2005; Schaefer et al. 2006] while minimizing distortion of
local shape, as measured by the local deviation from conformal or
rigid transformations. These methods, which build on earlier work
in as-rigid-as-possible shape interpolation [Alexa et al. 2000], are
able to minimize perceivable distortion much more effectively than
traditional space-warp methods [Beier and Neely 1992] or standard
scattered data interpolation [Bookstein 1989]. Our method applies
this principle in computing spatially-varying warps induced by the
recovered 3D scene structure. A related image deformation problem
is to change the size or aspect ratio of an image without distorting
salient image structure. Seam Carving [Avidan and Shamir 2007]
exploited the fact that less perceptually salient regions in an image
can be deformed more freely than salient regions, and was later ex-
tended to video [Rubinstein et al. 2008]. However, the discrete algo-
rithm behind Seam Carving requires removing one pixel from each
image row or column, which limits its application to general image
warping. Others have explored more continuous formulations [Gal
et al. 2006; Wolf et al. 2007; Wang et al. 2008], which deform a
quad mesh placed on the image according to the salience (or user-
marked importance) found within each quad; we take this approach
in designing our deformation technique.
A limitation of our approach is that it requires successful computa-
tion of video structure-from-motion. However, this step has become
commonplace in the visual effects industry, and commercial 3D
camera trackers like Boujou2 and Syntheyes3 are widely used. We
use the free and publicly available Voodoo camera tracker4, which
has been used in a number of recent research systems [van den
Hengel et al. 2007; Thormählen and Seidel 2008]. Finally, there are a
number of orthogonal issues in video stabilization that we do not
address [Matsushita et al. 2006], such as removing motion blur, and
full-frame video stabilization that avoids the loss of information at
the video boundaries via hole-filling (we simply crop our output).
These techniques could be combined with our method to yield a
complete video stabilization solution.
3 Traditional video stabilization
We begin by describing the current technical approaches to video
stabilization in more detail, and showing their results on the exam-
ple sequence in Video Figure 1 (since many of the issues we discuss
can only be understood in an animated form, we will refer to a set
of video figures that are included as supplemental materials and on
the project web site1).
3.1 2D stabilization
Traditional 2D video stabilization proceeds in three steps. First, a
2D motion model, such as an affine or projective transformation,
is estimated between consecutive frames. Second, the parameters
of this motion model are low-pass filtered across time. Third, full-
frame warps computed between the original and filtered motion
models are applied to remove high-frequency camera shake. Video
Figures 2 and 3 show two results of this approach, created using our
implementation of Matsushita et al. [2006] (we do not perform the
inpainting or deblurring steps, and the two videos contain different
degrees of motion smoothing).
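The three-step pipeline above can be sketched concretely. The following toy example (our own construction, not the authors' code; all names are hypothetical) restricts the motion model to per-frame 2D translations, smooths them with a Gaussian low-pass filter, and takes the corrective warp for each frame as the difference between the smoothed and original parameters:

```python
import numpy as np

def smooth_path(params, sigma=2.0):
    """Low-pass filter per-frame motion parameters with a Gaussian kernel."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-x**2 / (2 * sigma**2))
    kernel /= kernel.sum()
    # Replicate the endpoints so the filtered path is not pulled toward zero.
    padded = np.pad(params, ((radius, radius), (0, 0)), mode="edge")
    return np.stack([np.convolve(padded[:, i], kernel, mode="valid")
                     for i in range(params.shape[1])], axis=1)

def stabilizing_warps(params):
    """Corrective translation per frame: smoothed minus original."""
    return smooth_path(params) - params

# Toy hand-held trajectory: a slow pan plus high-frequency jitter.
rng = np.random.default_rng(0)
frames = 100
pan = np.stack([np.linspace(0, 50, frames), np.zeros(frames)], axis=1)
jitter = rng.normal(scale=3.0, size=(frames, 2))
corrections = stabilizing_warps(pan + jitter)
```

Applying each correction as a full-frame translation damps the jitter but, as discussed above, cannot model parallax; practical 2D stabilizers fit affine or projective models rather than pure translations.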
While 2D stabilization can significantly reduce camera shake, it
cannot simulate an idealized camera path similar to what can be
found in professional tracking shots. Since the 2D method has no
knowledge of the 3D trajectory of the input camera, it cannot rea-
son in 3D about what the output camera path should be, and what
the scene would have looked like from this path. Instead, it must
make do with fitting projective transformations (which are poor ap-
proximations for motion through a 3D scene) and low-pass filtering
them. Strong low-pass filtering (Video Figure 3) can lead to visible
distortions of the video content, while weak filtering (Video Figure
2) only damps shake; neither can simulate directed camera motions.
3.2 3D stabilization
The 3D approach to video stabilization is more powerful, though
also more computationally complex. Here, the actual 3D trajectory
of the original camera is first estimated using standard structure-
from-motion [Hartley and Zisserman 2000]; this step also results in
2 http://www.2d3.com
3 http://ssontech.com
4 http://www.digilab.uni-hannover.de
Figure 1: A crop of a video frame created using novel view inter-
polation. While the static portions of the scene appear normal, the
moving people suffer from ghosting.
Figure 2: A crop of a video frame created using generic sparse
data interpolation. The result does not contain ghosting, but distorts
structures such as the window and pole highlighted with red arrows.
a sparse 3D point cloud describing the 3D geometry of the scene.
Second, a desired camera path is fit to the original trajectory (we de-
scribe several approaches to computing such a path in Section 4.3).
Finally, an output video is created by rendering the scene as it would
have been seen from the new, desired camera trajectory.
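For the path-fitting step, a minimal sketch (our own, with hypothetical names; the paper's actual path options are described in Section 4.3) fits each coordinate of the recovered camera centers to a low-degree polynomial in the frame index, so degree 1 yields a linear path and degree 2 a parabolic one:

```python
import numpy as np

def fit_camera_path(positions, degree=1):
    """Fit each coordinate of the recovered camera centers (N x 3) to a
    polynomial in the frame index; degree 1 = linear, 2 = parabolic."""
    t = np.arange(len(positions))
    return np.column_stack([
        np.polyval(np.polyfit(t, positions[:, i], degree), t)
        for i in range(positions.shape[1])])

# Toy recovered trajectory: a straight dolly move plus hand-held jitter.
rng = np.random.default_rng(1)
s = np.linspace(0.0, 1.0, 60)
truth = np.column_stack([s, 0.2 * s, np.zeros_like(s)])
noisy = truth + rng.normal(scale=0.02, size=truth.shape)
path = fit_camera_path(noisy, degree=1)  # idealized linear camera path
```

A smoothed (rather than idealized) path would instead low-pass filter the recovered centers, analogous to the steadicam-style option mentioned in Section 1.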
There are a number of techniques for rendering novel views of a
scene; in Video Figure 4 we show a video stabilization result cre-
ated using the well-known unstructured lumigraph rendering algo-
rithm [Buehler et al. 2001b]. The result is remarkably stable. How-
ever, like all novel view interpolation algorithms, each output frame
is rendered as a blend of multiple input video frames. Therefore, dy-
namic scene content suffers from ghosting (we show a still frame
example of this ghosting in Figure 1).
One approach to handling dynamic scene content would be to iden-
tify the dynamic objects, matte them out, use novel view interpo-
lation to synthesize the background, re-composite, and fill any re-
maining holes. However, each of these steps is a challenging prob-
lem, and the probability that all would complete successfully is low.
Therefore, in the next section we introduce the constraint that each
output video frame be rendered only from the content in its corre-
sponding input video frame.
4 Our approach
Our approach begins similarly to the 3D stabilization technique just
described; we recover the original 3D camera motion and sparse
3D point cloud using structure-from-motion, and specify a desired
output camera motion in 3D (in this section we assume the output
path is given; our approach for computing one is described in Sec-
tion 4.3). Then, rather than synthesize novel views using multiple
input video frames, we use both the sparse 3D point cloud and the
content of the video frames as a guide in warping each input video
frame into its corresponding output video frame.
More specifically, we compute an output video sequence from the
input video such that each output video frame I_t is a warp of its
corresponding input frame Î_t (since we treat each frame independently,
we will omit the t subscript from now on). As guidance we
have a sparse 3D point cloud which we can project into both the
input and output cameras, yielding two sets of corresponding 2D
points: P in the output, and P̂ in the input. The k-th pair of projected
points yields a 2D displacement P_k − P̂_k that can guide the
warp from input to output. The problem remaining is to create a
dense warp guided by this sparse set of displacements. This warp,
which can use the displacements as either soft or hard constraints,
should maintain the illusion of a natural video by maintaining tem-
poral coherence and not distorting scene content. We first consider
two simple warping solutions, the first of which is not successful,
and the second of which is moderately successful.
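The projected displacements can be made concrete with a small sketch (camera matrices and scene points are hypothetical, chosen by us). Note that points at different depths receive different displacements; this parallax is exactly what a single full-frame warp cannot reproduce:

```python
import numpy as np

def project(P, X):
    """Project N x 3 world points X through a 3x4 camera matrix P."""
    Xh = np.hstack([X, np.ones((len(X), 1))])   # homogeneous coordinates
    x = Xh @ P.T
    return x[:, :2] / x[:, 2:3]                 # perspective divide

# Shared intrinsics; the output camera is the input camera translated
# slightly along x, i.e., a nearby stabilized viewpoint.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])
P_in = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P_out = K @ np.hstack([np.eye(3), np.array([[-0.1], [0.0], [0.0]])])

# Three static scene points at depths 5, 10, and 4.
X = np.array([[0.0, 0.0, 5.0], [1.0, -0.5, 10.0], [-2.0, 1.0, 4.0]])
displacements = project(P_out, X) - project(P_in, X)  # P_k - P̂_k per point
```

Here the horizontal displacement of each point is inversely proportional to its depth, so near and far points disagree on how the frame should move.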
The first solution is to use generic sparse data interpolation to yield
a dense warp from the sparse input. In Video Figure 5 we show a
result computed by simply triangulating the sparse points and us-
ing barycentric coordinates to interpolate the displacements inside
the triangles; the displacements are therefore treated as hard con-
straints. The result has a number of problems. Most significantly,
important scene structures are distorted (we show a still example in
Figure 2). These distortions typically occur near occlusions, which
are the most challenging areas for novel view interpolation. Also,
problems occur near the frame boundaries because extrapolation
outside the hull of the points is challenging (for this example, we
do not perform extrapolation). Finally, treating the displacements
as hard constraints leads to temporal incoherence since the recon-
structed 3D points are not typically visible for the entire video. Pop-
ping and jittering occur when the corresponding displacements ap-
pear and disappear over time. In this example, we use a very short
segment of video and only include points that last over the entire du-
ration of the video; however, the problem is unavoidable in longer
sequences. Our approach for preserving temporal coherence, which
is only applicable if displacements are used as soft constraints, is
described in Section 4.1.4.
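The triangulation-and-barycentric scheme just described can be sketched for a single triangle (pure numpy; the points and displacements are illustrative values of our own). Because vertex displacements are reproduced exactly, they act as hard constraints:

```python
import numpy as np

def barycentric(p, tri):
    """Barycentric coordinates of 2D point p inside triangle tri (3 x 2)."""
    a, b, c = tri
    l1, l2 = np.linalg.solve(np.column_stack([b - a, c - a]), p - a)
    return np.array([1.0 - l1 - l2, l1, l2])

def interpolate_displacement(p, tri_pts, tri_disp):
    """Blend the three vertex displacements with barycentric weights."""
    return barycentric(p, tri_pts) @ tri_disp

# One triangle of projected scene points and their displacements P_k - P̂_k.
tri_pts = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
tri_disp = np.array([[1.0, 0.0], [3.0, 0.0], [1.0, 2.0]])
d = interpolate_displacement(np.array([5.0, 5.0]), tri_pts, tri_disp)
```

When a tracked point disappears, the triangles incident on it (and thus the dense warp around them) change discontinuously, which is the popping artifact described above.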
The second alternative is to fit a full-frame warp to the sparse dis-
placements, such as a homography (thereby treating the displace-
ments as a soft constraint). We show a somewhat successful result
of this technique in Video Figure 6. This method can achieve good
results if the depth variation in the scene is not large, or if the de-
sired camera path is very close to the original. We show a less suc-
cessful result of this technique in Video Figure 7. In the general
case, a homography is too constrained a model to sufficiently fit
the desired displacements. This deficiency can result in undesired
distortion (we show an individual frame example in Figure 3), and
temporal wobbling. However, this approach is the best of the
alternatives we have considered up to now.
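Fitting a homography to the sparse displacements can be sketched with the standard direct linear transform (DLT), solved in least squares via the SVD (a generic sketch, not the authors' implementation):

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography via the direct linear transform (DLT):
    the right singular vector of smallest singular value minimizes |Ah|."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]

def apply_homography(H, pts):
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

# Projected points P̂ in the input frame and their displaced positions P in
# the output frame; here the displacements form a consistent translation,
# so a single homography can satisfy them exactly.
src = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 3.0]])
dst = src + np.array([2.0, -1.0])
H = fit_homography(src, dst)
```

With depth-dependent (parallax) displacements the least-squares fit instead leaves residuals at the points, which appear as the distortion and wobble described above.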
The first solution described above is too flexible; it exactly sat-
isfies the sparse displacements, but does n