ACM Reference Format
Pantaleoni, J., Fascione, L., Hill, M., Aila, T. 2010. PantaRay: Fast Ray-traced Occlusion Caching of Massive
Scenes. ACM Trans. Graph. 29, 4, Article 37 (July 2010), 10 pages. DOI = 10.1145/1778765.1778774
http://doi.acm.org/10.1145/1778765.1778774.
PantaRay: Fast Ray-traced Occlusion Caching of Massive Scenes
Jacopo Pantaleoni∗
NVIDIA Research
Luca Fascione†
Weta Digital
Martin Hill†
Weta Digital
Timo Aila∗
NVIDIA Research
Figure 1: The geometric complexity of scenes rendered in the movie Avatar often exceeds a billion polygons and varies widely: distant
rocks and vegetation are tessellated to a level of meters and centimeters, while the faces of even distant characters are modeled to over
40,000 polygons from forehead to chin. The spatial resolution of occlusion caches precomputed by our system also spans several orders
of magnitude.
Abstract
We describe the architecture of a novel system for precomputing
sparse directional occlusion caches. These caches are used for ac-
celerating a fast cinematic lighting pipeline that works in the spher-
ical harmonics domain. The system was used as a primary light-
ing technology in the movie Avatar, and is able to efficiently han-
dle massive scenes of unprecedented complexity through the use of
a flexible, stream-based geometry processing architecture, a novel
out-of-core algorithm for creating efficient ray tracing acceleration
structures, and a novel out-of-core GPU ray tracing algorithm for
the computation of directional occlusion and spherical integrals at
arbitrary points.
CR Categories: I.3.2 [Graphics Systems (C.2.1, C.2.4, C.3)]:
Stand-alone systems; I.3.7 [Three-Dimensional Graphics and
Realism]: Color, shading, shadowing, and texture—Ray tracing;
Keywords: global illumination, precomputed radiance transfer,
caching, out of core
∗e-mail:{jpantaleoni,taila}@nvidia.com
†e-mail:{lukes,martinh}@wetafx.co.nz
1 Introduction
The movie Avatar featured unprecedented geometric complexity
(Figure 1), with production shots containing anywhere from ten
million to over one billion polygons.
To make the rendering of such complex scenes manageable while
satisfying the need to provide fast lighting iterations for lighting
artists and the director, modern relighting methods based on spheri-
cal harmonics (SH) [Ramamoorthi and Hanrahan 2001] and image-
based lighting [Debevec 1998] were used. These methods can
speed up the lighting iterations significantly, but unfortunately re-
quire an extremely compute- and resource-intensive precomputation
of directional occlusion information. Directional occlusion encodes
the visibility term used for lighting modulation as a function of di-
rection, and is typically computed using ray tracing.
We describe PantaRay1, a system designed to make this precompu-
tation practical by leveraging the development of modern ray trac-
ing algorithms for massively parallel GPU architectures [Aila and
Laine 2009] and combining them with new out-of-core and level of
detail rendering techniques.
The PantaRay engine is an out-of-core, massively parallel ray tracer
designed to handle scenes that are roughly an order of magni-
tude bigger than available system memory, and that require baking
spherical harmonics-encoded directional occlusion (SH occlusion)
and indirect lighting information for billions of points with highly
varying spatial density.
Our key contributions are the introduction of a flexible, stream-
based geometry processing architecture, a novel out-of-core algo-
rithm for constructing efficient ray tracing acceleration structures,
and a novel out-of-core GPU ray tracing algorithm for the compu-
tation of directional occlusion and spherical integrals. These are
1A twist on the Greek aphorism panta rei, i.e., "everything flows."
ACM Transactions on Graphics, Vol. 29, No. 4, Article 37, Publication date: July 2010.
[Figure 2 diagram: scene geometry flows through PRMan tessellation into micropolygons; PantaRay's vislocal pass augments these into the vislocal cache, alongside other PRMan caching passes feeding other caches; PRMan's final render consumes the caches to produce the beauty image.]
Figure 2: A visual representation of the rendering pipeline used for the movie Avatar showing the various passes, the data flow among
them, and the role played by our system.
combined into a new precomputation system designed to efficiently
handle very high levels of geometric complexity.
Our system has been integrated into the production pipeline of Weta
Digital and is showcased in the movie Avatar, but the algorithmic
contributions and design decisions discussed in this paper could be
usefully applied in other domains, such as large-scale scientific vi-
sualization, which would benefit from rich lighting of extremely
complex geometric datasets.
2 Related Work
Much research has addressed the topic of massive model rendering
and visualization. Here we compare our system to some of the most
relevant work.
There is a vast amount of literature on the topic of direct visual-
ization of massive triangle meshes. Most such methods, includ-
ing [Borgeat et al. 2005] and [Cignoni et al. 2004], subdivide the
models into cells or patches and create multiple or progressive LOD
representations of those elements through mesh simplification. As
the goal of our system is not direct visualization but rather the com-
putation of low-frequency directional occlusion information, these
accurate simplification methods are not needed and we resort to
much cruder representations. Moreover, as we target ray tracing,
our out-of-core spatial index construction had the additional re-
quirement of targeting high ray tracing efficiency, employing parti-
tioning and subdivision methods based on the surface area heuristic
(SAH) [Havran 2000].
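As a concrete illustration of the SAH cost that drives such partitioning decisions, here is a minimal sketch in Python (the traversal and intersection cost constants are illustrative defaults, not values from the paper):

```python
def aabb_surface_area(lo, hi):
    """Surface area of an axis-aligned box given its min/max corners."""
    dx, dy, dz = (hi[i] - lo[i] for i in range(3))
    return 2.0 * (dx * dy + dy * dz + dz * dx)

def sah_cost(parent, left, right, n_left, n_right,
             c_trav=1.0, c_isect=1.0):
    """SAH cost of splitting `parent` into `left` and `right` (each a
    (min, max) corner pair): the probability that a ray traversing the
    parent also hits a child is proportional to the child's surface area."""
    sa_p = aabb_surface_area(*parent)
    sa_l = aabb_surface_area(*left)
    sa_r = aabb_surface_area(*right)
    return (c_trav
            + (sa_l / sa_p) * n_left * c_isect
            + (sa_r / sa_p) * n_right * c_isect)
```

A builder evaluates this cost over candidate split planes, keeps the minimum, and compares it against the flat cost of making a leaf.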
Wald et al. [2005] and Yoon et al. [2006] introduced two sys-
tems based on level of detail (LOD) for ray tracing large triangle
meshes. Unlike our approach, their systems relied on OS-level
memory mapping functionality and targeted moderately parallel
systems such as commodity multi-CPU systems, performing LOD
selection in each thread independently. This strategy would not be
portable to modern massively parallel GPU architectures. More-
over, no special effort was taken to speed up the out-of-core con-
struction of the acceleration structure, which in the case of [Wald
et al. 2005] took up to a day for a model containing 350M triangles.
Crassin et al. [2009] and Gobbetti et al. [2008] introduced two sys-
tems to render large volumetric datasets. These systems perform
direct visualization of geometry represented as voxel grids, rather
than computing complex visibility queries. Like our system, both
approaches decompose computation into a CPU-based LOD selec-
tion phase and a GPU-based rendering phase. Their systems per-
form these steps to visualize the entire model from a single point of
view at each frame, while we do it to compute directional occlusion
from large batches of nearby points at the same time.
Christensen et al. [2003] presented a ray tracing system using ray
differentials to perform LOD selection for high order surfaces. The
described system is able to efficiently handle very large tessellations
of the base meshes, but does not provide a level of detail scheme to
handle base meshes which do not fit in main memory. This was
essential for our approach, which needed to handle base meshes
with hundreds of millions or billions of control polygons.
Budge et al. [2009] presented an out-of-core data management layer
for path tracing on heterogeneous architectures. The system builds
on a dataflow network of kernel queues and a rendering-agnostic
task scheduler that prioritizes the execution of kernels based on data
availability, queue size and other criteria. The path tracer exploits
this generic framework by using a two-level acceleration structure,
where each second level out-of-core hierarchy is bound to a distinct
processing queue, extending the work of [Pharr et al. 1997]. The
resulting algorithm shows good scalability and thus satisfies one of
our main requirements. Unlike their work, we focus on developing
highly efficient special-purpose algorithms for the computation of
directional occlusion, minimizing I/O through careful LOD selec-
tion, and on the problem of efficient construction of high quality
out-of-core acceleration structures.
Ragan-Kelley et al. [2007] introduced Lightspeed as an interac-
tive lighting preview system that can greatly accelerate relighting
with local light sources and shadow maps in the presence of pro-
grammable shaders. Unlike their work, we focus on the efficient
computation of complex visibility for fast image based lighting in
massive scenes.
3 System Overview
Lighting of the movie Avatar was performed with a spherical har-
monics lighting pipeline based on the work of Ramamoorthi and
Hanrahan [2001], in which light transport is decomposed into a
multiple product integral:
L_o(x, \omega_o) = \int_{\Omega^+} L_i(x, \omega)\, \rho(x, \omega, \omega_o)\, V(x, \omega)\, \langle \omega, \hat{n} \rangle \, d\omega \qquad (1)

where Lo is the exitant radiance, x is the point of interest, ωo is the
outgoing direction, Ω+ is the hemisphere above x, Li is incident ra-
diance, ω is the incident direction, ρ is the BRDF, V is the visibility
function, n̂ is the normalized surface normal, and ⟨·, ·⟩ denotes the
scalar product operator.
In this framework, directional visibility is precomputed at sparse
locations in the scene and stored in a spherical harmonics basis.
Building on the work of Kautz et al. [2002], Ng et al. [2004], and
Snyder [2006], this directional visibility can then be reused over
many lighting cycles by performing a simple dot-product with the
less expensive terms of the equation, which are computed at render
time. Our system was built to efficiently perform this precomputa-
tion on massive scenes of unprecedented complexity.
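The relighting dot product described above can be illustrated with a minimal numerical sketch in Python, restricted to the first two SH bands and using Monte Carlo projection (all function names are ours, purely illustrative):

```python
import math
import random

def sh_basis(d):
    """First four real spherical harmonics at unit direction d = (x, y, z)."""
    x, y, z = d
    return (0.282095,        # Y_0^0
            0.488603 * y,    # Y_1^-1
            0.488603 * z,    # Y_1^0
            0.488603 * x)    # Y_1^1

def uniform_sphere(rng):
    """Uniformly distributed unit direction."""
    z = 1.0 - 2.0 * rng.random()
    phi = 2.0 * math.pi * rng.random()
    r = math.sqrt(max(0.0, 1.0 - z * z))
    return (r * math.cos(phi), r * math.sin(phi), z)

def project(f, n=50_000, seed=1):
    """Monte Carlo projection of a spherical function onto the SH basis:
    c_i = integral of f * Y_i over the sphere (uniform pdf = 1 / 4pi)."""
    rng = random.Random(seed)
    coeffs = [0.0] * 4
    for _ in range(n):
        d = uniform_sphere(rng)
        fv = f(d)
        for i, y in enumerate(sh_basis(d)):
            coeffs[i] += fv * y
    scale = 4.0 * math.pi / n
    return [c * scale for c in coeffs]

def relight(light_coeffs, transfer_coeffs):
    """Integral of light * transfer over the sphere, as a dot product
    of their SH coefficient vectors."""
    return sum(a * b for a, b in zip(light_coeffs, transfer_coeffs))
```

The point of the precomputation is that the transfer coefficients (visibility times cosine, per shading point) are cached once, so each lighting iteration only recomputes the light coefficients and the cheap dot product.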
The overall pipeline is divided into several computation passes
as depicted in Figure 2. During preparation, the scene geometry
is tessellated and divided into microgrids according to a camera-
based metric, using a custom point cloud output driver in Photo-
Realistic RenderMan (PRMan). We store these microgrids on disk
in a stream representation which allows vertices to be associated
Figure 3: Zooming into scene 6 shows the various levels of tessellation.
with arbitrary user data, much like the primitive variable mecha-
nism in PRMan [Upstill 1990] or the vertex attribute machinery in
OpenGL [Segal and Akeley 1999]. In order to include occluding
geometries not directly visible to the camera, assets outside of the
viewing frustum are also tessellated, either using a relatively large
overscan or according to a world-based metric. Figure 3 shows an
example of the various tessellation densities encountered in a typical
production scene.
The vislocal pass invokes our PantaRay engine to augment the mi-
crogrid stream with directional occlusion data encoded in the spher-
ical harmonics basis and other precomputed quantities such as area
light visibility, blurred reflections and occasionally one-bounce in-
direct lighting. All these properties are generated by programmable
shaders using the ray tracing capabilities of our engine.
In the end the result of the PantaRay precomputation is used in
PRMan to render the final images in what is called the beauty pass.
In this pass, the lighting, BRDF and visibility fields are composed
at render time at a very low cost, to the point where the lighting
iterations can happen inside the beauty pipeline at final quality.
While the vislocal datasets can be reused for many lighting itera-
tions, which greatly offsets their computation cost, computing vis-
local remains an extremely resource-intensive process, and is a nat-
ural point to start looking for optimizations.
To illustrate the targeted complexity, the movie Avatar required
baking scenes with tens of thousands of different plants modeled as
subdivision surfaces at a resolution of 100K to 1M control polygons
each, and hundreds of characters modeled at a resolution of 1-2M
control polygons. Since occlusion is a global effect, out-of-camera
objects must be kept during the computation. Similarly, translu-
cence and subsurface scattering require processing geometry that is
not directly visible from the camera. Rather than tracing full reso-
lution models, lower resolution proxies could have been developed
and used for far away assets. While our pipeline used stochastic
simplification to reduce the complexity of vegetation before ras-
terization [Cook et al. 2007], we did not explore the possibility of
performing any additional simplification to the ray tracing assets
before they entered our system: we chose instead to construct a
fully automated system capable of directly handling the raw model
complexity rather than create a semi-automatic pipeline for proxy
generation.
The highly variable spatial resolution of the PantaRay output pre-
sented another challenge: many shots in these scenes required a
spatially varying baking resolution ranging from a few points per
meter on distant geometry such as terrains, to several points per
millimeter, for example to accurately represent the lighting on and
under the characters’ fingernails.
The speed and memory limitations of existing general purpose ray
tracing technology, and the reduced flexibility and programmability
in other special purpose baking tools, such as ptfilter [Christensen
2008], did not scale to these production needs. In practice, our goal
was to raise the tractability limit of shots in the movie Avatar by
roughly 2 orders of magnitude in terms of both speed and scene
size while keeping a reasonable degree of programmability.
4 Architecture
Handling the necessary complexity inside a flexible ray tracing sys-
tem requires efficient out-of-core and streaming techniques. To
support the use of such methods throughout the entire software
pipeline, we designed the system around the concept of microgrid
streams, which are opaque sources of microgrids (that is, micropoly-
gon grids as in [Cook et al. 1987]). Microgrid streams can be
read into main memory and eventually rewound, or restarted from
the beginning. Such streams can represent either geometry stored
on disk or procedural geometry. Each microgrid is essentially a
small indexed mesh with up to 256 vertices forming micropolygons,
where each micropolygon can have one, two, three or four vertices
(to represent points, lines, triangles and quads). Vertices are repre-
sented by their position, a normal, a radius and any attached user
data. We decided to disallow any form of random access for two
reasons: first, geometry files are typically compressed to save disk
space and potentially achieve higher I/O bandwidth; second, input
streams could be procedurally generated, and the procedural gener-
ation function might not allow for individual primitive generation
(as for example in some L-systems).
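A hypothetical sketch of this stream abstraction in Python (the interface and names are our own illustration, not PantaRay's actual API):

```python
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

@dataclass
class Microgrid:
    """A small indexed mesh: up to 256 vertices, micropolygons of
    1-4 vertex indices (points, lines, triangles, quads), plus a
    normal, a radius, and optional user data per vertex."""
    positions: List[Tuple[float, float, float]]
    normals:   List[Tuple[float, float, float]]
    radii:     List[float]
    polygons:  List[Tuple[int, ...]]          # 1..4 indices each
    user_data: Dict[str, list] = field(default_factory=dict)

    def __post_init__(self):
        assert len(self.positions) <= 256
        assert all(1 <= len(p) <= 4 for p in self.polygons)

class MicrogridStream:
    """Sequential-only source of microgrids: it can be iterated and
    rewound, but never accessed at random, so it can wrap compressed
    files and procedural generators equally well."""
    def __init__(self, generator_fn):
        self._generator_fn = generator_fn   # re-invoked on each pass
    def __iter__(self) -> Iterator[Microgrid]:
        return self._generator_fn()
    def rewind(self) -> "MicrogridStream":
        return self
```

Because the only operations are "iterate" and "rewind", the same interface covers both a compressed file reader and an L-system that can only emit its primitives in generation order.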
The input to PantaRay is an XML scene description, containing
a list of shaders, a list of geometries and their associated binding
relationships.
A geometry is a microgrid stream, which can specify both an oc-
cluder and a collection of bake sets. A bake set represents the
central PantaRay unit of work, and specifies that the input stream
should be cloned to a corresponding output stream and further dec-
orated with a given list of shader output attributes. Geometries can
further be instanced through a user-defined transformation, poten-
tially specifying a procedural displacement shader.
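A hypothetical example of such a scene description, parsed with the Python standard library; the element and attribute names are our invention, as the actual PantaRay schema is not specified in the paper:

```python
import xml.etree.ElementTree as ET

# Illustrative scene file only: the real PantaRay schema is unpublished.
SCENE = """
<scene>
  <shader name="sh_occlusion" output="dirocc_sh"/>
  <geometry stream="forest.grids" occluder="true">
    <bakeset shaders="sh_occlusion"/>
  </geometry>
  <geometry stream="character.grids" occluder="true"
            displacement="bark_displace">
    <bakeset shaders="sh_occlusion"/>
  </geometry>
</scene>
"""

def parse_scene(text):
    """Extract the shader list and the geometry/bake-set bindings."""
    root = ET.fromstring(text)
    shaders = [s.get("name") for s in root.findall("shader")]
    geoms = []
    for g in root.findall("geometry"):
        geoms.append({
            "stream": g.get("stream"),
            "occluder": g.get("occluder") == "true",
            "bakesets": [b.get("shaders") for b in g.findall("bakeset")],
        })
    return shaders, geoms
```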
Shaders are programmable units responsible for computing some
required information at the vertices of each microgrid in a bake set.
The first task that PantaRay performs after parsing the scene file is
building an out of core acceleration structure (AS) for the input oc-
cluder geometry. After the AS is built, PantaRay processes the bake
sets and begins shader execution. The following sections describe
these processes in detail.
4.1 Acceleration Structure Generation
The main bottleneck in building an out-of-core acceleration struc-
ture can easily be I/O speed, as typical bounding volume hierar-
chies (BVH) or k-d tree building strategies require touching all the
objects multiple times. Even taking into account the performance
of state-of-the-art storage technologies, the system had to assume
that tens of thousands of concurrent processes would be using the
same storage, requiring all non-local I/O to be modeled as a high
latency, high bandwidth device.
Hence we developed a general purpose stream-based builder which
tried to minimize the number of times the stream is rewound.
The first component of this builder is a streaming bucketing pass
designed to handle hundreds of millions of microgrids. The buck-
eter uses a simple binning approach: it constructs a regular 3D grid
by first streaming the geometry once to count how many microgrids
Figure 4: Out-of-core spatial index construction. Microgrids stream from disk into a regular grid of buckets (a). Buckets are coalesced
and split into chunks (b) of up to 64KB. A BVH inside and among chunks (c) is broken into bricks (d) of up to 256 nodes. Each brick is
contiguous on disk.
fall in each bucket, and then streaming it a second time to populate
those buckets on local disk.
The first streaming pass reserves the correct amount of disk space
for each bucket and creates an index, but also keeps statistics about
the number of microgrids, micropolygons, vertices and byte size for
each of them.
The second pass of the algorithm loops through each microgrid to
find out all the buckets in which the microgrid falls, and records the
microgrid-bucket pairs into an in-memory cache with a few million
entries. Once the cache is full, the pairs are sorted by bucket index
and written to disk into their corresponding slots, requiring at most
one seek per bucket per cache flush.
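The two streaming passes can be sketched as follows in Python, with in-memory lists standing in for the on-disk bucket slots (`buckets_of` is an assumed callback yielding the bucket indices a microgrid overlaps):

```python
from collections import defaultdict

def bucket_stream(stream_fn, buckets_of, cache_capacity=4_000_000):
    """Two-pass streaming bucketer.
    Pass 1: count microgrids per bucket (used to reserve disk space
    and build an index).  Pass 2: accumulate (bucket, microgrid)
    pairs in a bounded cache, sort each full cache by bucket index,
    and flush, so every flush writes each bucket's slot in one
    contiguous run."""
    # Pass 1: statistics.
    counts = defaultdict(int)
    for mg in stream_fn():
        for b in buckets_of(mg):
            counts[b] += 1

    # "Disk": one slot per bucket, pre-sized from the counts.
    slots = {b: [] for b in counts}

    # Pass 2: populate through a sorted cache.
    cache = []
    def flush():
        cache.sort(key=lambda pair: pair[0])   # group by bucket index
        for b, mg in cache:
            slots[b].append(mg)
        cache.clear()

    for mg in stream_fn():
        for b in buckets_of(mg):
            cache.append((b, mg))
            if len(cache) >= cache_capacity:
                flush()
    flush()
    return counts, slots
```

Note that `stream_fn` is invoked once per pass, mirroring the rewind-only access pattern of microgrid streams.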
The purpose of this bucketing pass is to create manageable units
of work which could fit in memory. However, the resulting uniform
grid is very coarse and often imbalanced, which makes it unsuitable
for direct ray tracing. With extremely large scenes it frequently hap-
pens that a large portion of the buckets are empty or very sparsely
populated, while a few remain too densely populated.
For these reasons, after the bucketing is done, we perform a chunk-
ing pass, whose purpose is to build a second disk-based spatial in-
dex with more uniform distribution of geometry, aggregating low-
complexity buckets and splitting high-complexity ones until each oc-
cupies roughly 64KB of memory. We consider an implicit k-d tree
over the uniform grid of buckets. First, we perform a bottom-up
propagation of statistics from the leaves to the parents, so that for
each node it is possible to compute a rough estimate of the aggre-
ga