Crystal structure solution with SHELXD

Uploaded 2010-01-25; 24 pages; PDF, 70 KB

SHELXD and SHELXE

The structure solution program SHELXD (called XM in the Bruker SHELXTL system, but identical to SHELXD except in the logo) is able to solve larger ab initio problems than SHELXS-97, and is also useful for locating the heavy atoms or anomalous scatterers from SIR, SAD, SIRAS or MAD data. From January 2002 SHELXD is available as source and as precompiled binaries for common operating systems as part of the SHELX-97 system. XM is available from Bruker Nonius as part of the SHELXTL system, which includes the whole of SHELX plus programs not in the public domain, such as the interactive molecular graphics program XP and the reflection data manipulation program XPREP. In this documentation both XM and SHELXD will be referred to as SHELXD.

For the MAD, SAD, SIR etc. applications of SHELXD the location of the heavy atom sites is only one step in the structure solution. The new program SHELXE (XE in the Bruker SHELXTL) can read the .res file containing the heavy atom sites written by SHELXD and estimate the native phases and the corresponding weights (figures of merit). SHELXE outputs the phases in an XtalView format .phs file so that a map can be viewed using interactive graphics or the phases can be improved by density modification using programs such as DM, SOLOMON, RESOLVE etc. SHELXE is robust, fast and simple to use, but it must be emphasized that the resulting phases may be inferior to those produced by much more sophisticated maximum likelihood programs such as SHARP, SOLVE or MLPHARE. However, in favorable cases it may even prove possible to autotrace the maps from SHELXE directly, e.g. using wARP.

For SIR and SAD problems SHELXE starts with the centroid phases from the Harker construction (Harker, 1956); for MAD and SIRAS an unambiguous phase can be assigned. Sigma-A weights (Read, 1986) are used throughout.
In the case of SAD and SIR a single density truncation cycle retaining only about 7% of the density is applied to resolve the twofold ambiguity for appropriate reflections; this is similar to the low density elimination used by Woolfson et al. ( ) and to a density modification procedure proposed by Giacovazzo & Siliqi (1997).

The crude density modification performed by SHELXE may be termed the sphere of influence method. A sphere of radius 2.42 Å (a typical 1,3-distance in virtually all organic and macromolecular structures) is constructed around each pixel of the map, and the variance of the electron density around a given pixel is calculated using 92 (or 272) optimally distributed pixels that lie close to this sphere. The variances are sorted, but instead of using them to define a sharp solvent boundary, a fuzzy boundary is generated: pixels with very high sphere variances are entirely in the 'protein' region, those with very low variances are entirely in the 'solvent' region, and the ones in between are assigned probabilities between 0 and 100% that they are in the solvent region. In the protein region the negative density is reset to zero and in the solvent region it is 'flipped' (Abrahams & Leslie, 1996). A pixel that has been assigned a 60% probability of being in the solvent region is assigned a 60:40 weighted average of the densities resulting from the solvent and protein treatments.

It was anticipated that by using a little chemical knowledge (the 1,3-distance) it would be possible to improve maps given very high resolution data, but in practice the method still works well with 3 Å data provided that the solvent content is relatively high. For very high resolution data (better than 1.5 Å) or very high solvent content (>60%) the SHELXE phases can have rather high map correlation coefficients (>0.9) with the phases from the final refinement.
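The fuzzy-boundary weighting described above can be illustrated for a single pixel. The following Python function is only a minimal sketch under stated assumptions: the function name, the `flip_factor` parameter and the modelling of 'flipping' as a plain sign inversion about zero are illustrative choices, not details taken from SHELXE itself.

```python
def sphere_of_influence_blend(density, p_solvent, flip_factor=1.0):
    """Blend the 'protein' and 'solvent' treatments of one map pixel.

    density     -- electron density at the pixel (arbitrary units)
    p_solvent   -- probability (0..1) that the pixel lies in the solvent region
    flip_factor -- ASSUMPTION: flipping is modelled as multiplying by -flip_factor
    """
    if not 0.0 <= p_solvent <= 1.0:
        raise ValueError("p_solvent must lie between 0 and 1")
    # Protein treatment: negative density is reset to zero.
    protein_value = max(density, 0.0)
    # Solvent treatment: the density is 'flipped' (assumed form, see docstring).
    solvent_value = -flip_factor * density
    # Weighted average, e.g. 60:40 for a pixel with p_solvent = 0.6.
    return p_solvent * solvent_value + (1.0 - p_solvent) * protein_value
```

With these assumptions, a pixel of density -0.5 and a 60% solvent probability receives 0.6 x 0.5 + 0.4 x 0 = 0.3, which is the 60:40 weighted average mentioned in the text.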
In less favorable cases it may well be possible to improve the phases further using other more sophisticated density modification programs, especially if non-crystallographic symmetry (NCS) can be exploited. An attempt is made to estimate realistic weights (foms) in SHELXE so that further phase refinement using other programs is facilitated.

SHELXE is currently a beta-test that is being made available in precompiled form without extra license fees etc. but with an expiry date (1/1/03) to registered SHELX and SHELXTL users. If it proves successful it will be incorporated in future SHELX and SHELXTL releases that will have to be licensed separately.

Introduction to SHELXD

SHELXD is a stand-alone executable and does not require any other program, initialization files or environment variables etc. The input to SHELXD consists of two files, name.ins and name.hkl, both of which can conveniently be created using the Bruker Nonius XPREP program. The .hkl file has the standard SHELX format and, with the exception of two or three instructions, the .ins file is very similar to the input for SHELXS.

SHELXD expects ONE and only one source of starting atoms. This can take the form:

A: Input atoms in normal SHELX format for expansion using PLOP
B: PATS for Patterson seeding of the dual-space direct methods
C: GROP and a PDB-format model for fragment seeding
D: Random atoms (used if none of the above apply)

For substructure solution using MAD data etc. option B (PATS + FIND but no PLOP) is recommended. In each case the action is specified in the .ins file that also contains crystal data in the usual SHELX form. The reflection data consist of an .hkl file containing F² (HKLF 4) or F-values (HKLF 3). These may correspond to either native data for ab initio structure solution or structure expansion, or MAD, SAD, SIR or SIRAS FA or ΔF values for heavy or anomalous atom location.
Dual-space recycling (Miller et al., 1993; Miller et al., 1994; Sheldrick et al., 2001) using the largest E-values (FIND) is followed by peaklist optimization (PLOP; Sheldrick & Gould, 1995); one or both of these commands must be present. In the case of structure expansion only PLOP can be used and the program then stops. When the starting atoms are generated randomly or by PATS or GROP, the calculations are repeated with new sets of starting atoms each time. The total number of such tries may be specified with NTRY, otherwise the program runs forever (unless interrupted by a name.fin file). When the final correlation coefficient CC (after PLOP) for an atomic resolution ab initio run of SHELXD is 65% or greater, the structure is almost certainly solved.

SHELXD writes the best solution so far to a SHELX format file name.res and a PDB format file name.pdb. The former can be examined with the interactive graphics program XP that is part of the Bruker SHELXTL system. If XP is not available the PDB file may be displayed with RASMOL (use the ball and stick display mode). Note that this may be done before stopping SHELXD. If the structure is clearly solved, SHELXD may be terminated cleanly by creating a file name.fin in the working directory.

Examples of ab initio structure solution with SHELXD

To illustrate full structure solution by ab initio methods, a test example is provided (in the egs subdirectory on the SHELX ftp site) in the form of the files pn1a.ins and pn1a.hkl. Four different ways of solving the structure are included in the .ins file; in order to run the various tests it will be necessary to comment out some lines (by putting a space character at the beginning of the line). The file is read only as far as the first HKLF instruction. This test structure was kindly provided by Jenny Martin, University of Queensland, Australia.
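The conventions just described for the test .ins file (lines starting with a space are commented out, and the file is read only as far as the first HKLF instruction) can be sketched in Python as follows. This is an illustrative helper, not part of SHELX; the function name is hypothetical, and it also assumes that everything after a '!' sign is a comment, as is usual in SHELX .ins files.

```python
def active_instructions(ins_text):
    """Return the instruction lines of a SHELX-style .ins file that would be acted on.

    ASSUMPTIONS (illustrative only): a line beginning with a space is treated
    as commented out, text after '!' is a comment, and reading stops at the
    first HKLF instruction (which is itself included in the result).
    """
    active = []
    for raw in ins_text.splitlines():
        # Skip blank lines and lines commented out with a leading space.
        if not raw.strip() or raw[0] == " ":
            continue
        # Remove a trailing '!' comment, if any.
        line = raw.split("!", 1)[0].rstrip()
        if not line:
            continue
        active.append(line)
        # Nothing after the first HKLF instruction is read.
        if line.upper().startswith("HKLF"):
            break
    return active
```

Such a helper could be used to check which of the four alternative solution strategies in a file like pn1a.ins is currently enabled before starting a run.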
The test structure consists of (GCCSLPPCAANNPDYC), a linear polypeptide with two disulfide bridges, giving 110 non-hydrogen peptide atoms plus 12 solvent atoms. The space group is P2₁ and the resolution of the data 1.1 Å. For further details see Hu et al. (1996). In the following examples, TITL...UNIT in the normal SHELX format is assumed at the start of the .ins file and HKLF 4 (or HKLF 3) followed by END at the end of the file. The cell contents defined by SFAC and UNIT are only used by PLOP; in the FIND stage the atoms are assumed to be of the same type but with occupancies proportional to the square root of the peak height, unless occupancy refinement is used (TANG with a negative first parameter).

    FIND 80
    PLOP 120 140 160
    NTRY 50

This will search (FIND) for 80 atoms in the dual-space stage; it is usually more efficient to search for ca. 25% less than the total number of non-solvent atoms, especially when - as here - some heavier atoms such as sulfur are present. In the PLOP stage on the other hand one should specify more than the expected number of atoms because this procedure involves the elimination of the 'wrong' atoms. One can leave NTRY out, in which case the job will run forever (unless aborted or stopped more gently by creating a name.fin file in the same directory).

An alternative approach is to use Patterson seeding instead of random starting atoms. One can then look for say 80 atoms as above with FIND, or alternatively first optimize the sulfur substructure (in this case four atoms) with FIND and expand to the full structure with PLOP. The Patterson seeding may be performed for example with a randomly oriented fixed length vector (for a disulfide bond). Everything after a '!' sign in a SHELX .ins file is treated as a comment.

    PATS -2.06 ! S-S distance
    PSMF -4 ! supersharp Patterson
    FIND 4 5
    MIND -1.8 ! S-S > 1.8A, calc. PATFOM
    TEST 10 5
    PLOP 50 80 120 160 160
    NTRY 20

Alternatively the Patterson seeding may use the highest Patterson peaks as translation search vectors:

    PATS
    PSMF -4
    FIND 4 5
    MIND -1.8
    TEST 10 5
    PLOP 50 80 120 160 160
    NTRY 20

Patterson or fragment seeding does not have to go through the FIND stage to optimize the atomic positions, though this is strongly recommended and has the advantage that all four sulfurs can be used. It is also possible to go into structure expansion with PLOP directly, and this facility can be tested using the two-atom disulfide fragment as follows. It should be noted that two sulfur atoms are quite adequate for PLOP to expand to the full structure, but the CC threshold (the first TEST parameter) for entering the PLOP stage needs to be reduced a little (in the above tests, it had the default of 45% for FIND 80 and was set to 10 for FIND 4).

    GROP
    TEST 8 5
    PLOP 30 50 80 120 160 160
    NTRY 20
    ATOM 1 S CYS 1 0.000 0.000 0.000 1.000 10.00
    ATOM 2 S CYS 1 0.000 0.000 2.060 1.000 10.00

The two sulfur atoms are given in fixed PDB format. As a further example (not provided as test files) of seeding based on an initial fragment search, for a cyclodextrin structure with four beta-cyclodextrins in the asymmetric unit and with data barely to atomic resolution, the following could be tried:

    GROP
    FIND 240
    PLOP 320 400
    ATOM 1 C41 MOL 1 -3.859 4.863 7.904 1.000 10.00
    ATOM 2 C31 MOL 1 -5.081 4.209 8.524 1.000 10.00
    ATOM 3 C21 MOL 1 -5.211 2.740 8.155 1.000 10.00
    ... diglucose fragment in PDB format ...
    ATOM 21 C52 MOL 1 -0.292 4.714 7.025 1.000 10.00
    ATOM 22 O52 MOL 1 -0.642 5.837 6.253 1.000 10.00

A major new facility in SHELXD for small molecules is the ability to solve merohedrally twinned structures by ab initio methods; all that is required is to input the SHELXL instruction TWIN and an estimated BASF parameter (which is held at a fixed value throughout). XPREP can be used to find the TWIN matrix and estimate the BASF parameter value.
TWIN and BASF are only applied at the PLOP stage, and are ignored by PATS, GROP and FIND.

Macromolecular phasing using SHELXD and SHELXE

SHELXE is intended to be run immediately after SHELXD. It picks up the .res file containing the best substructure solution (so far) from SHELXD. Since very few parameters are required for SHELXE they are all given on the command line. When the correlation coefficients indicate that SHELXD has 'solved' the substructure, it can be terminated (by writing a dummy name.fin file into the working directory - under UNIX the touch instruction can be used for this) and TWO SHELXE jobs started. Two jobs are almost always necessary because the heavy atom substructure and where appropriate the space group may have to be inverted; there is a 50% chance that the heavy atom enantiomorph will be wrong! The command lines for these two jobs are identical except that one contains the -i switch. These two jobs may be run simultaneously because the files do not clash; the -i job adds '_i' to the end of the first part of the filename for the output files. Often it will become clear from the console output which heavy atom enantiomorph is correct (see examples below) and the other job can be killed.

Before phasing with SHELXD and SHELXE it is necessary to prepare three input files: name-df.ins, name-df.hkl and name.hkl. The first two are read by SHELXD, the last two by SHELXE, which also reads the file name-df.res written by SHELXD. Up to the period, the filename can be freely chosen but must be the same for the first two files; see the examples below. All three files can conveniently be set up using the Bruker XPREP program, but the information below should enable other sources to be used. Note that Bruker Nonius are often willing to provide a free demo version of XPREP (fully featured but with an expiry date); anyone interested should contact sbyram@bruker-axs.com, trixie.wagner@bruker-axs.de or anita.coetzee@nonius.nl.
The name-df.ins file contains (at least) the following instructions in the order given:

    TITL (followed by any title on the same line)
    CELL λ a b c α β γ (in Å and degrees; λ is ignored but is standard for SHELX)
    LATT and SYMM (to define the space group, see examples and the SHELX manual)
    SFAC Se (or any other single element, even if there are several heavy atom types)
    UNIT M (approximate number of heavy atoms per cell multiplied by 4)
    SHEL 999 d (where d is the resolution at which to truncate the data)
    PATS (Patterson seeding)
    FIND N (number of sites to search for, should be within 20% for best results)
    MIND -3.5 (minimum allowed distance between sites)
    HKLF 3 (to read F rather than F²)
    END

The critical parameters are d, the resolution at which to truncate the data, and N, the number of atoms to be searched for; it may be worth trying different values of these two parameters in difficult cases. The optimal value of d may be estimated using XPREP, either from the mean ratio of ΔF to its esd (assuming that the data have been processed so that the esds are on an absolute scale, i.e. χ² is close to one), or from the correlation coefficient between the signed anomalous differences for two datasets (different MAD wavelengths or in the case of SAD different crystals). It should be noted that there is almost always an optimal value of d and that it should be numerically larger than the resolution limit of the diffraction pattern, i.e. the data should be truncated to lower resolution. Often 3 Å to 3.5 Å gives good results for MAD phasing. If XPREP is not available then a good rule of thumb is to truncate the data at a resolution 0.5 Å lower than the diffraction limit.

At the end of the dual-space direct methods SHELXD refines the site occupancies assuming that all atoms are of the same type. This provides an adequate approximation in the case where different anomalous scatterers are present (e.g. Ca2+ and S in the trypsin example discussed below).
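For readers who are not using XPREP, the instruction order listed above can be assembled with a few lines of Python. This is a minimal sketch rather than an official tool: the function name, its arguments and all numerical values in the usage note are hypothetical, and only the instructions named in the list above are written.

```python
def build_substructure_ins(title, cell, latt, symm_ops, n_sites,
                           sites_per_cell, d_min, element="Se", mind=-3.5):
    """Assemble the text of a name-df.ins file in the order given in the text.

    cell           -- seven numbers: wavelength, a, b, c, alpha, beta, gamma
    latt, symm_ops -- LATT code and SYMM operator strings supplied by the user
    n_sites        -- N for FIND (number of sites to search for)
    sites_per_cell -- approximate number of heavy atoms per cell (UNIT gets 4x this)
    d_min          -- resolution d at which to truncate the data (SHEL 999 d)
    """
    if len(cell) != 7:
        raise ValueError("cell needs wavelength, a, b, c, alpha, beta, gamma")
    lines = [
        f"TITL {title}",
        "CELL " + " ".join(f"{value:g}" for value in cell),
        f"LATT {latt}",
    ]
    lines += [f"SYMM {op}" for op in symm_ops]
    lines += [
        f"SFAC {element}",
        f"UNIT {4 * sites_per_cell}",  # heavy atoms per cell multiplied by 4
        f"SHEL 999 {d_min:g}",
        "PATS",
        f"FIND {n_sites}",
        f"MIND {mind:g}",
        "HKLF 3",                      # read F rather than F squared
        "END",
    ]
    return "\n".join(lines) + "\n"
```

A call such as `build_substructure_ins("demo", (0.98, 50, 60, 70, 90, 90, 90), -1, ["0.5-X, -Y, 0.5+Z", "-X, 0.5+Y, 0.5-Z", "0.5+X, 0.5-Y, -Z"], 2, 8, 3.0)` returns text that could be saved as name-df.ins; the cell and site counts here are invented purely for illustration.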
The occupancy refinement also shows when the actual number of sites is different from the value input on the FIND instruction; for a selenomethionine MAD experiment there should be a clear drop in occupancy after the last site. For halide soaks on the other hand there is often a continuous descent to the noise level reflecting the variable occupancies of the sites. The occupancy refinement is switched on by a negative first TANG parameter; this is the default if there is no PLOP instruction.

The cell contents (SFAC/UNIT) should be specified correctly when SHELXD is used for full ab initio structure solution, but for substructures a single element type should be specified and the number of sites expected per cell multiplied by about four so that the probabilities are calculated correctly for the minimal function and Ralpha figures of merit. Since these are only printed as information - the correlation coefficient alone is used to decide which solution is 'best' - the SFAC/UNIT parameters are not important for substructure solution.

For large selenomethionine substructures (which behave more like equal atom ab initio structure solution of small molecules) it may be worth increasing the number of Patterson peaks used for the Patterson seeding (e.g. PATS 200; the default is 100) and adding the instructions WEED 0.3 (random omit maps) and SKIP 0.5 (uranium atom removal). The latter two are the defaults when PLOP is present but are switched off by default if PLOP is absent. When PATS is used, WEED produces a much smaller additional improvement in the hit ratio than when PATS is absent. For small substructures (<10 sites), WEED and SKIP can do more harm than good by eliminating too many correct sites at once. The minus sign for the first MIND parameter specifies that the PATFOM figure of merit and crossword table should be calculated.
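The 'clear drop in occupancy after the last site' mentioned above for selenomethionine data can be located automatically in a list of refined occupancies. The Python sketch below is a simple heuristic written for illustration only; it is not how SHELXD itself decides anything, and the function name is hypothetical. It would not be meaningful for halide soaks, where the occupancies descend continuously.

```python
def estimate_site_count(occupancies):
    """Guess the number of real sites from the largest drop in occupancy.

    The occupancies are sorted in descending order and the position of the
    biggest step between neighbours is taken as the end of the real sites.
    ASSUMPTION: a single clear drop exists (e.g. selenomethionine MAD data).
    """
    occ = sorted(occupancies, reverse=True)
    if len(occ) < 2:
        return len(occ)
    # Difference between each occupancy and the next lower one.
    drops = [occ[i] - occ[i + 1] for i in range(len(occ) - 1)]
    # Index of the largest drop; the sites before it are counted as real.
    biggest = max(range(len(drops)), key=drops.__getitem__)
    return biggest + 1
```

If the estimate differs markedly from the N given on the FIND instruction, it may be worth rerunning with a different N, in line with the advice to try different values in difficult cases.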
For phasing using the anomalous scattering of sulfur, a minimum distance (first MIND parameter) of about 1.7 Å is required if the resolution of the ΔF data (as truncated using SHEL) permits the sulfur atoms in disulfide bridges to be resolved from each other (see trypsin example below). The default option in the FIND stage of SHELXD is to ignore all sites on special positions; to include possible sites on special positions, set the second MIND parameter to -0.1. Sites on special positions can occur for halide soaks etc. but this setting is not required for the two examples below (selenomethionine cannot lie on a special position, and there are no special positions in P2₁2₁2₁). It may also be worth adding NTRY 100 or NTRY 1000, otherwise the SHELXD job will never finish. Alternatively NTRY can be left out and the job terminated by creating a name-df.fin file.

The file name-df.hkl consists of one line per reflection, terminated by the end of the file or by a line with all numbers zero. It is read using the FORTRAN format 3I4,2F8.2,I4; as normal when reading floating point numbers with FORTRAN, the number of figures after the decimal point may be varied but the numbers must be contained within the 8 character fields and the decimal point must be present in the number. Each line consists of h, k, l, [ΔF or FA], [σ(ΔF) or σ(FA)] and α, where α is the estimated phase shift in degrees that has to be added to the heavy atom phase to give the native protein phase. ΔF or FA are always given as positive numbers. In the SIR case, α is zero if the derivative F is greater than the native F and 180 if the opposite is true; for SAD, α is 90 if F+ > F- and 270 if F+ < F-. For MAD or SIRAS data, α may be anywhere in the range 0 to 360. α is only read by SHELXE, not by SHELXD.

The file name.hkl contains h, k, l, F² and σ(F²) in format 3I4,2F8.2 for the native data and is terminated by the end of the file or by a line with all numbers zero.
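The record layout 3I4,2F8.2,I4 and the rules for the phase shift in the SIR and SAD cases can be written down in a few lines of Python. This is a sketch for users preparing name-df.hkl without XPREP; the function names are hypothetical, and the treatment of exactly equal amplitudes (which the text does not specify) is an arbitrary assumption.

```python
def sir_phase_shift(f_derivative, f_native):
    """Phase shift for SIR: 0 if derivative F > native F, otherwise 180.
    ASSUMPTION: equal values are arbitrarily assigned 180."""
    return 0 if f_derivative > f_native else 180


def sad_phase_shift(f_plus, f_minus):
    """Phase shift for SAD: 90 if F+ > F-, otherwise 270.
    ASSUMPTION: equal values are arbitrarily assigned 270."""
    return 90 if f_plus > f_minus else 270


def format_df_record(h, k, l, delta_f, sigma, alpha):
    """Format one reflection as FORTRAN 3I4,2F8.2,I4 (32 characters)."""
    if delta_f < 0:
        raise ValueError("delta F or FA must be given as a positive number")
    record = f"{h:4d}{k:4d}{l:4d}{delta_f:8.2f}{sigma:8.2f}{alpha:4d}"
    # A value too large for its field would silently widen the line.
    if len(record) != 32:
        raise ValueError("a value does not fit into its fixed-width field")
    return record
```

For a SAD reflection with F+ = 105.0 and F- = 92.5, the anomalous difference is 12.5 and `sad_phase_shift` gives 90, so `format_df_record(1, -2, 3, 12.5, 1.25, 90)` produces one 32-character line for name-df.hkl; a final record of all zeros can be written with `format_df_record(0, 0, 0, 0.0, 0.0, 0)`.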
In a selenomethionine MAD experiment the native data in name.hkl could either be from a remote wavelength or (as in the example below) from the native (methionine) crystal if that diffracted to higher resolution. Usually the same data will be used for the final refinement of the structure.

After starting SHELXD (with the command line shelxd name-df) the program first prints a summary of all parameters used, then calculates and stores the Patterson and the phase relations for the tangent formula. The solution with the best CC (correlation coefficient) so far is written to the name-df.res file. One should wait until there are one or more solutions with CC and CC(weak) of at least 30 and 15 respectively, well separated from the rest, but in practice it is worth waiting a few minutes longer in case there is an even better solution. When it appears (from the CC v