旋翼机空气动力学下载_在线阅读_10

is_532078

暂无简介

旋翼机空气动力学 AIAA JOURNAL Vol. 40, No. 5, May 2002 Parallelization of Rotorcraft Aerodynamics Navier–Stokes Codes Kivanc Ekici¤ and Anastasios S. Lyrintzis† Purdue University, West Lafayette, Indiana 47907-1282 The modi cation of unsteady three-dimensional Navier–Stokes cod...

AIAA JOURNAL Vol. 40, No. 5, May 2002 Parallelization of Rotorcraft Aerodynamics Navier–Stokes Codes Kivanc Ekici¤ and Anastasios S. Lyrintzis† Purdue University, West Lafayette, Indiana 47907-1282 The modi cation of unsteady three-dimensional Navier–Stokes codes for application on massively parallel and distributed computing environments is investigated. Previously, the Eulermode of the Navier–Stokes code TURNS has been parallelized. For the ef cient implementationof theNavier–Stokesmode of TURNS onparallel computing systems, several algorithmic changes should be made. The main modi cation is done on the implicit operator, lower–upper symmetric Gauss–Seidel. Two new implicit operators are used because of convergence problems of traditional operators with high cell aspect ratio grids needed for viscous calculations. Results for Navier–Stokes cases are presented for various operators. The message passing interface protocol is used because of its portability to various parallel architectures. Nomenclature a = speed of sound e = total energy p = pressure Pr = Prandtl number Re = Reynolds number u; v; w = Cartesian velocities x; y; z = Cartesian coordinates ° = speci c heat ratio ¹ = dynamic viscosity »; ´; ³ = generalized coordinates ½ = density 9 = rotor azimuth angle Introduction R OTORCRAFT aerodynamicsare very complex becausethree-dimensional unsteady transonic viscous aerodynamic phe- nomena are involved. The ability to predict the ow around heli- copter rotors is vital for the control of high-speed losses, vibration, and noise. The advent of tilt-rotor aircraft in the past few years further complicates rotor aerodynamics. Computational uiddynamics(CFD)hasprovento be aneffective tool for predictionof rotorcraftaerodynamicsand noise. Traditional CFD methods used for this purpose have included transonic small disturbancepotentialmodels and full-potentialmodels.1;2 Whereas potentialmethodscan generatea good solutionat relativelylowcost, theyareunableto resolvesome owfeaturesthat canhavea large im- pact on the aerodynamic and aeroacoustic properties. For instance, the vorticalwake producedby the rotor blades plays a dominant role in unsteady load uctuation, affecting aerodynamic performance. Also, shocks produced by transonic ow near the blade tip, often encountered in forward ight, have a large effect on the high-speed impulsive noise generated by the blades and consequentlymust be determined accurately.For these reasons,many more recent design analyses3 use Euler/Navier–Stokes methods. However,Navier–Stokesmethods tend to be computationallyde- manding,whichhasdelayedtheirwidespreadindustrialuse.Parallel Received 28 February 2001; revision received 24 September 2001; ac- cepted for publication 8 October 2001. Copyright c° 2001 by Kivanc Ekici and Anastasios S. Lyrintzis. Published by the American Institute of Aero- nautics and Astronautics, Inc., with permission. Copies of this paper may be made for personal or internal use, on condition that the copier pay the $10.00 per-copy fee to the CopyrightClearance Center, Inc., 222Rosewood Drive, Danvers, MA 01923; include the code 0001-1452/02 $10.00 in cor- respondence with the CCC. ¤Research Assistant, School of Aeronautics and Astronautics; currently Research Associate, Department of Mechanical Engineering and Materi- als Science, Duke University, Durham, NC 27708-0300. Student Member AIAA. †Associate Professor, School of Aeronautics and Astronautics. Associate Fellow AIAA. computation offers the potential for cheaper and faster computa- tions. Several vendors have released machines that utilize com- modity reduced instruction set computer (RISC) processors that are the same as those used in high-end workstations, for example, IBM SP2 and SGI Origin. These machines generally attain bet- ter price/performance than vector processors and have peak execu- tion rates that are several times faster than vector parallelmachines such as the Cray C-90. It is also possible to assemble a collec- tion of off-the-shelf hardware components and software packages and achieve high performance at very low prices.4;5 Several com- panies are beginning to utilize parallel processing in the form of networked workstations, which sit idle during off hours, to attain supercomputerperformance.6 Unfortunately,many traditionalCFD algorithmswere developedwith serial or vectorizedcomputationin mind, and consequentlyrequiresigni cantmodi cation for ef cient parallel execution.Also, a certain amount of code rewriting, for ex- ample, adding message passing calls, is necessary, which can be a signi cant time investment for long and well-established codes. These factors have been deterrents to more widespread industrial adoption of parallel processing. The current study focuses on efforts to improve the ef ciency of the TURNS code developedby Srinivasanet al.7 and Srinivasanand Baeder.8 TURNS is chosenbecauseit providesa well-known indus- try standardfor rotorcraftaerodynamics.The codeuses a third-order accurate,upwindbiasedscheme thatprovidesshockcapturing.Flux limiters are appliedso that the scheme is total variationdiminishing. In TURNS themain bottleneckfor the paralleloptimizationis the lower–upper symmetric Gauss–Seidel (LU-SGS) implicit operator, currently used for both steady and unsteady calculations with the code. Candler et al.9 developed an ef cient algorithm called data parallel lower–upper relaxation (DP-LUR) to parallelize LU-SGS for a data parallel environment.Wissink et al.10¡12 developeda new hybrid domain decomposition implementation of the LU-SGS and DP-LUR operators and applied it to the Euler mode of TURNS successfully. This method can be extended to the Navier–Stokes equations. However, the LU-SGS algorithm exhibits poor conver- gence for the high-cell-aspect-ratio (CAR) grids needed to resolve theboundarylayer of highReynolds ows, as shown, for example,in Refs. 13 and 14. Wright et al.15 proposed a full matrix modi cation of the LU relaxation method that works well for the high-aspect- ratio grids. In addition,Wright et al.16 also developed another type of fullmatrixmethod, data parallel line relaxation(DPLR), which is even better for the high-aspect-ratiogrids.Neither of thesemethods have been tested for three-dimensionalunsteady ows. The focus of this study is to develop a new hybridmethod similar to that of Wissink using the full matrix approach and to apply this new algorithm togetherwith DPLR to TURNS for the ef cient par- allelizationof rotorcraftaerodynamicscodes.Note that the LU-SGS algorithmhas been used in a number of well-knownCFD codes, for example, INS3D17 and OVERFLOW,18 primarily for its stability properties with larger time steps. Therefore, our proposed paral- lelizationmethodologywill be applicable to several Navier–Stokes 887 旋翼机空气动力学：www.tech-domain.com 888 EKICI AND LYRINTZIS codes. Message passing interface (MPI) will be used because of its portability to different parallel architectures. Governing Equations The nondimensional conservation law form of the three- dimensional, thin-layer Navier–Stokes equations in curvilinear space »; ´; ³ , and ¿ can be written as19;20 @¿q C @» E C @´F C @³G D Re¡1@³ S (1) where q D 1 J 266664 ½ ½u ½v ½w e 377775 ; E D 1J 266664 ½U ½uU C »x p ½vU C »y p ½wU C »z p U .eC p/¡ »t p 377775 F D 1 J 266664 ½V ½uV C ´x p ½vV C ´y p ½wV C ´z p V .eC p/¡ ´t p 377775 ; G D 1J 266664 ½W ½uW C ³x p ½vW C ³y p ½wW C ³z p W .eC p/¡ ³t p 377775 (2) The contravariant velocities in the preceding equations, namely, U;V ; and W , are de ned as U D »t C »xu C »yv C »zw V D ´t C ´xu C ´yv C ´zw W D ³t C ³xu C ³yv C ³zw (3) The quantitiessuch as »x ; »y; and »z are themetrics of the coordinate transformation, and J is the Jacobian of the transformation. The viscous ux vector for the thin layer approximation S is given by S D ¹ J 266666664 0¡ ³ 2 x C ³ 2y C ³ 2z ¢ u³ C 1=3.³xu³ C ³yv³ C ³zw³ /³x¡ ³ 2 x C ³ 2y C ³ 2z ¢ v³ C 1=3.³xu³ C ³yv³ C ³zw³ /³y¡ ³ 2 x C ³ 2y C ³ 2z ¢ w³ C 1=3.³xu³ C ³yv³ C ³zw³ /³z¡ ³ 2 x C ³ 2y C ³ 2z ¢ m C 1=3.³xu³ C ³yv³ C ³zw³ /n 377777775 (4) where m D 0:5.u2 C v2 Cw2/³ C [1=Pr.° ¡ 1/].a2/³ n D .³xu C ³yv C ³zw/ For the turbulent viscous ows, the viscosity coef cient ¹ is com- puted as the sum of the laminar viscosity coef cient ¹l estimated using Sutherland’s law and the turbulent viscosity¹t estimated us- ing the Baldwin–Lomax algebraic turbulencemodel. The pressure can be obtained from the equation of state given as p D .° ¡ 1/fe ¡ .½=2/.u2 C v2 C w2/g (5) Numerical Method The rst-order implicitEuler integrationin time gives the follow- ing form of the equations:£ I C1¿ ¡±» An C ±´Bn C ±³Cn ¡ Re¡1±2³ Mn¢¤1qn D ¡1¿ f .qn/ (6) where f .q n / D ¡±» En C ±´Fn C ±³Gn ¡ Re¡1±³ Sn¢ (7) is the ux evaluated using the upwind-biased scheme of Roe21 at time leveln; A, B, andC are the inviscidJacobiansgivenby Pulliam and Steger20; and M is the viscous Jacobian given by Tysinger and Caughey.22 The monotone upstream-centered scheme of Anderson et al.23 for the conservativelaws (MUSCL) is used to get high-order accuracy with ux limiters. First, a ux vector splitting is applied to the ux Jacobians A, B, and C , of the inviscid ux vectors E , F , and G . In the » direction, A is split into its positive and negative eigenvalueparts, AD AC C A¡, and the positivematrix is backward differenced,whereasthe negativematrix is forwarddifferenced.The same technique is applied to the other inviscid Jacobians in the cor- responding directions. The viscous JacobianM , on the other hand, is simply central differenced in ³ direction. When the backward differencedC terms in L and the forward differenced¡ terms in U are collected, the systemcan be divided into lower and upper block- tridiagonal factors with a diagonal factor given by D D I C1¿.AC ¡ A¡ C BC ¡ B¡ C CC ¡ C¡ C 2M/ j;k;l L D D ¡1¿ ¡ACj ¡ 1 C BCk¡ 1 C CCl¡ 1 C Ml¡ 1¢ U D D C 1¿¡A¡j C 1 C B¡k C 1 C C¡l C 1 ¡ Ml C 1¢ (8) Second, to avoid costly computation times for the inversion of the 5£ 5 D matrix, an ef cient spectral radius approximationproposed by Yoon and Kwak14 is applied to the ux Jacobians: A§ D 12 .A § ½A I / (9) where ½A is the spectral radius of A in the » direction.This approx- imation results in a very simple form of D: D D I C1¿.½A I C ½B I C ½C I C 2½M I / j;k;l (10) for which the inversion is just a scalar operation. TheLU-SGS schemeproposedbyYoon andJameson24 is used for the time integration of the implicit equations. The resulting system of equations is solved by applying a two-step symmetric Gauss– Seidel algorithm as follows: L1q ¤ D ¡1¿ f .qn/; U1qn D D1q¤ (11) InEq. (11),1q¤ is the intermediatesolutionand1qn is the updateto the vector of dependent variables at time step n, qn C 1 D qn C1qn . Note that the approximationsare made only to the left-hand side of the system to improve the convergence to the steady-state solution; hence, they will not effect the accuracy of the scheme. Time Accuracy The implicitalgorithmimplementedinTURNS is usedfor steady- state as well as time-accurate solutions. For steady-state problems, a rst-order scheme in time is used, and the convergence to the steady state is acceleratedusing large time steps.On the other hand, for time-accurate problems, the equations are integrated through time using a second-orderscheme starting from a meaningful initial condition, usually the steady-state solution. In this case, the time step is chosen according to the timescale of the problem. LU-SGS has some factorizationerror associatedwith the approx- imations made on the left-hand side of the algorithm. This destroys the good convergencecharacteristicsin large time steps for steady- state solutions and puts a limit to the size of the time steps used. In addition, for time-accurate solutions, the factorization error still exists, but it can be reduced by using inner relaxation steps. For steady-state problems one can vary 1¿ in space, which can be interpreted as an attempt to use a uniform Courant–Friedrichs– Lewy number throughoutthe ow eld. The space-varyingtime step approach is especially effective for grids that have very ne to very coarse spacings in the domain. The Navier–Stokes grids used in this study are good examples of such wide variety length scale grids. In TURNS, a purely geometric variation of the time-step approach is used25;26 in the following form: 1¿ D 1¿ jref 1C " p J 1CpJ (12) to accelerate the steady-stateconvergence.In Eq. (12),1¿ jref is the xed reference time step, " is a small constant, for example, 0.002, 旋翼机空气动力学：www.tech-domain.com EKICI AND LYRINTZIS 889 and J is the Jacobian of the transformation.On the other hand, for unsteady solutions, the time step used in the calculations is a xed value. Boundary Conditions The boundaryconditionsare appliedexplicitly,whichmeans that the values of the conserved variables are updated at the end of each iteration. The Cartesian velocities u; v, and w are calculated from the extrapolationof the contravariantvelocities.On the surface, the tangency condition is imposed for the Euler calculations, whereas a no-slip condition is imposed for the Navier–Stokes calculations. These two conditionsrequire the contravariantvelocitiesW D 0 and U D V DW D 0, respectively. The density and the pressure on the surface are calculated by zeroth-order extrapolations, whereas the total energy is determined from the equation of state. Parallelization Previously, the parallelizationof the Euler mode of TURNS was done by Wissink11 in a domain decomposition fashion using MPI routines.The parallelizationof the code in multiple instructionmul- tiple data (MIMD) fashion was chosen for two main reasons. First, code rewriting from FORTRAN 77 to FORTRAN 90, which is re- quired by single instruction multiple data, is avoided. Second, the MIMDapproachmakes theprogrammoreportableto differentkinds of parallel architectures. A simple domain decomposition strategy, which uses the values from the previous iteration of the current time step on processor boundaries, is chosen. This strategy was used before by Wissink et al.,27 and no signi cant degradationof performancewas observed for up to 456 processors. The alternative, of course, is to do a for- mal domain decomposition as explained in Ref. 28. However, this approach is time consuming to implement in a legacy code, and because the rst, simple strategy has worked well previously, it was used in this study. The computational grid is decomposed to equally spaced parti- tions to prevent load imbalance, and a copy of the code is run on each processor.On the boundariesof each partition, an overlapping strategy is used to calculate the conserved variables. For example, the conserved variables on the left boundaries of each partition is calculated as the last interior points near the right boundary of the previous partition. Likewise, the conserved variables on the right boundaries of each partition is calculated as the rst interior points near the left boundary of the next partition. This single overlap strategy and a typical two-dimensional grid partitioning are given in Figs. 1 and 2, respectively. After all of the processors are done with their solutions, the val- ues on the boundaries are sent to and received from the neighboring processors using MPI SEND and MPI RECEIVE subroutines, and Fig. 1 Single overlap strategy. Fig. 2 Partitions of a typical two-dimensional grid. the calculationsfor the next iterationcan be started. For small num- ber of points per processor, we expect to see some performance degradation because the boundary values are less accurate. The partitioningof the grid is allowed only in the streamwise and radial directions,as shown in Fig. 3, to eliminate the load imbalance when applying the boundary conditions. In addition, the grid collapses to a singular plane at outboard the tip of the blade and across the wake. At these locations, to satisfy the continuityof the owquantities,valuesare interpolatedbetween the processorson each side of the singularplane. Partitioning in the normal directionwould make some of the processors sit idle while the otherswould perform the interpolation,which would cause load imbalance. Therefore, partitioning of the computational domain is done in the streamwise and radial directions only. Diagonal Operators In this section, two parallel implicit operators that use the spec- tral approximation of LU-SGS scheme previously implemented in TURNS are described.The useof spectralapproximationretains the ease of computation, as well as the advantage of using large time steps for the steady-state solution. LU-SGS Operator The LU-SGS algorithm is the basis of the parallel implicit oper- ators implemented in TURNS. In the two-step symmetric Gauss– Seidel method (see Ref. 11), there are off-diagonal terms coming from L and U on the left-hand side of both equations. If we move these off-diagonal terms to the right-handside, the resulting system becomes diagonal, and the updates of the conserved quantities can easily be obtained.The algorithmfor the LU-SGS implicit operator can be written as follows. Algorithm 1(LU-SGS): Do j; k; l D 1; : : : ; Jmax; Kmax; Lmax 1q ¤ j;k;l D D¡11¿ £¡ f .qn/C ACj ¡ 11q¤j ¡ 1 C BCk ¡ 11q¤k¡ 1 CCCl¡ 11q¤l ¡ 1 C Ml¡ 11q¤l ¡ 1 ¤ EndDo Do j; k; l D Jmax; Kmax; Lmax; : : : ; 1 1q n j;k;l D 1q¤j;k;l ¡ D¡11¿ £ A¡j C 11q n j C 1 C B¡k C 11qnk C 1 CC¡lC 11qnl C 1 ¡ MlC 11qnl C 1 ¤ EndDo 旋翼机空气动力学：www.tech-domain.com 890 EKICI AND LYRINTZIS Fig. 3 Partitioning strategy of the computational grid. Fig. 4 Schematic diagram of LU- SGS sweeps. In the rst step of the solution, the intermediatesolutionvector1q¤ is obtained by sweeps that start form the lower left corner and pro- ceed to the upper right corner of the computational grid. The nal solution is then obtained by sweeps in the reverse direction. The direction of sweeps in the computationaldomain is shown in Fig. 4. The LU-SGS method uses hyperplanes of different sizes through- out the grid. Although this approach can easily be vectorized, it is dif cult to implement the method on the distributed-memory computers because of the load balancing problems caused by the variable-size hyperplanes and the large amount of communication due to the recursion between planes. For example, if we look at the forward sweep, the point . j; k; l/ can begin the computation only when points . j ¡ 1; k; l/, . j; k¡ 1; l/, and . j; k; l ¡ 1/ have completed the step. Therefore, this data dependency causes load balancingproblem, and not all of the processors can be kept busy at the same time. Overall, the LU-SGS algorithmin its original form is not very suitable for distributed-memoryparallelization,and hence, some algorithmic changes should be made. DP-LUR Operator A very ef cient point-relaxationimplementation of the LU-SGS method for the data-parallel environment, DP-LUR, has been in- troduced by Candler et al.9 for solving hypersonic ow problems. Basically, this method replaces the two symmetric Gauss–Seidel sweeps with Jacobi sweeps and moves all of the off-diagonal terms to the right-handside. The procedure for this method can be written as follows. Algorithm 2 (DP-LUR): 1q .0/ j;k;l D ¡D¡1 ¢1¿ f .qn/ Do i D 1; : : : ; isweep Do j; k; l D 1; : : : ; Jmaxlocal ; Kmaxlocal ; Lmaxlocal 1q .i/ j;k;l D D¡11¿ £¡ f .qn/C ACj ¡ 11q .i ¡ 1/j ¡ 1 C BCk ¡ 11q.i ¡ 1/k¡ 1 CCCl¡ 11q .i ¡ 1/l ¡ 1 C Ml ¡ 11q.i ¡ 1/l ¡ 1 ¡ A¡j C 11q.i ¡ 1/j C 1 ¡ B¡kC 11q .i ¡ 1/k C 1 ¡ C¡l C 11q.i ¡ 1/l C 1 C Ml C 11q.i ¡ 1/l C 1 ¤ EndDo EndDo 1q n j;k;l D 1q.isweep/ DP-LUR uses only the nearest neighbor data due to the point- relaxation strategy and, therefore, allows simultaneous computa- tions on multiple processors with each processor holding the local data. The data that lie on the borders are communicated after each

本文档为【旋翼机空气动力学】，请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑，图片更改请在作品中右键图片并更换，文字修改请直接点击文字进行修改，也可以新增和删除文档中的内容。

旋翼机空气动力学

热门搜索

历史搜索