Published as: G.J. Kleywegt and T.A. Jones, Model-building and refinement practice, Methods in Enzymology (R.M. Sweet and C.W. Carter Jr., eds.), 277, 208-230 (1997).

© Academic Press, 1997

Good Model-building and Refinement Practice

Gerard J. Kleywegt * and T. Alwyn Jones

Department of Molecular Biology,

Biomedical Centre,

Uppsala University,

Box 590,

S-751 24 Uppsala,

SWEDEN.

* Author to whom correspondence should be addressed.

Running title: Good Model-building and Refinement Practice




INTRODUCTION

An initial model built into an experimental map, or into a poorly phased molecular replacement map, will usually contain many errors. In order to produce an accurate model, it is necessary to carry out crystallographic refinement as well as rebuilding at the graphics display. These steps are carried out in a cyclic process of gradual improvement of the model. Depending on the size of the structure, the automatic (refinement) or the manual (rebuilding) part may be rate-limiting. Refinement programs change the model in order to improve the agreement between observed and calculated structure-factor amplitudes. Many different refinement programs exist (see Kleywegt & Jones [1] for some history), and most contemporary programs use reciprocal-space methods. Due to the limited resolution typically obtained in biomacromolecular crystallography, the relatively scarce experimental data is augmented by chemical information, for instance concerning bond lengths and angles. Rebuilding the model at an interactive graphics workstation is necessary to remove errors that cannot be remedied by the refinement program. Such errors often require a major re-interpretation of parts of the maps which, at present, can only be carried out by intelligent crystallographers at a 3D workstation.

Errors in structures come in different classes of gravity [1-3]. At its worst, all or part of a protein trace may be wrong. Methods have been developed to help identify such models; they make use of our knowledge of protein structure [4,5]. In less severe cases, the local main chain and side chains may be wrong. If we divide the regions making up the first model of a molecule into three classes, good, bad, and ugly, then the perfect refinement would result in a final model in which all atoms fall into a fourth category, excellent. Unfortunately, due primarily to lack of resolution, this situation rarely occurs and we must usually be satisfied with a model that lacks ugly regions, and contains a high percentage of excellent ones. In order to arrive at such a final model, both good refinement and good rebuilding practices are necessary.

In the past few years, software for refinement and rebuilding of crystallographically determined macromolecular structures has become ever more powerful and easy to use [6]. However, better software does not automatically lead to better models [1]. A powerful refinement program can be used to create a model which adequately explains the experimental observations, but it can just as easily be abused to create a model which contains errors and artefacts introduced by fitting the model to errors in the experimental data. The difference between a carefully refined model and an over-fitted model, as measured by the root-mean-square (RMS) distance between corresponding CA atoms, may well exceed 1 Å [1,7,8]. A good model is one which makes sense (e.g., with respect to stereo-chemistry, temperature factors, etc.), adequately explains the experimental data, and uses the smallest number of parameters to achieve this. Good model building and refinement practice aims to produce such a model. It requires the use of appropriate techniques and strategies during both the (re)building and the refinement stages.

Model refinement has been a personalised affair for which laboratories have their preferred strategies, programs, etc. This has resulted in models bearing distinctive features of both the groups concerned and the software used. In this paper we present our own views on how a macromolecule should be refined, and argue that present practices in the community are often far from optimal, especially in cases where only low-resolution data is available [1].

All refinement programs nowadays use empirical restraints or constraints to ensure that a reasonable structure ensues during the refinement steps. This can result in a model with good stereo-chemical properties, but also a model in which molecules related by non-crystallographic symmetry (NCS) are forced to have similar (restrained) or identical (constrained) conformations. Nevertheless, unless special precautions are taken, over-fitting the data (i.e., adjusting the model in a manner which is not warranted by the quantity and/or quality of the experimental data) is almost guaranteed to take place, resulting in a model with a low R-factor, but concomitantly low accuracy. Because of the limited resolution of the diffraction data in a typical macromolecular crystallographic study, the number of experimental observations is often similar to, or even smaller than, the number of parameters in the model, which renders the risk of over-fitting extremely high. "Popular" methods to push the conventional R-factor down include ignoring NCS (otherwise the single most powerful method to reduce the number of degrees of freedom in the model), refining individual temperature factors and modelling alternative conformations at resolutions where this is not warranted, removing data (using resolution and F/sigma(F) cut-offs) and including spurious entities (such as solvent molecules). These methods either reduce the number of experimental observations or increase the number of parameters in the model, and therefore invite the refinement program to fit error terms; sometimes this over-fitting may even mask gross errors in the model [7,8].

In rebuilding, the experimental map (if available) should always be kept, and at each stage one should try to re-interpret it in the light of the current model, and using the current 2Fo-Fc, Fo-Fc and other maps. One should keep in mind the kind of errors that might still be present in the model, and try to locate places in the map that could be the result of such errors. While rebuilding a model, the accumulated knowledge concerning macromolecular structures should be used to locate places in the current model that deviate from our expectations and previous experience (as pertaining to quality of the fit to the map, stereo-chemistry, preferred conformations and environments of residues); such deviations could indicate local errors.

The aim of model-building and refinement should be to construct a model which adequately explains the experimental observations, while making physical, chemical and biological sense. It is a fact of life that low-resolution data can only yield low-resolution models. The refinement process, in particular, should therefore always be tailored for each problem individually, keeping in mind the amount, resolution and quality of the data. This means that at low resolution individual temperature factors and occupancies should not be refined, and that NCS-related copies of the molecule(s) have to be assumed to be identical. Even though it is evident that NCS-related molecules will usually display small differences, one simply does not have the data to prove this at low resolution. A model which enforces strict NCS may be less precise (e.g., with respect to surface side chain details), but due to the much larger ratio of experimental observations to adjustable model parameters, it is likely to be a more accurate description. The distinction between precision and accuracy is an important one, but unfortunately the two concepts are often confused. Precision is related to level of detail, accuracy to how close to the "truth" something is. For instance, the number 4.987453637 is a very precise, but not very accurate, approximation of the number PI; the number 3.14, on the other hand, is a not very precise, but much more accurate estimate. In the case of multiple protein models derived by solution nuclear magnetic resonance (NMR) techniques, a tight clustering means that the structures are precise, but it says nothing about whether or not they are close to the real structure [9,10]. In the case of protein crystallography, a 3 Å structure with individual temperature factors, unrestrained NCS and hundreds of water molecules built in may seem very precise, but it is doubtful whether the atoms on average are even within 1 Å of their actual positions. Similarly, a hydrogen-bonding distance reported as 2.83 Å for a 3 Å structure is quite precise, but not necessarily accurate. Low-resolution data precludes the production of a precise model; it does not, however, prevent one from producing a model which adequately describes the data. In other words, even at low resolution one can build models that are as accurate as the data allows, but only high-resolution data may yield a model which is both accurate and precise. Even high-resolution data alone is no guarantee of an accurate model [1,8]; in addition one must use sensible refinement and rebuilding procedures, and monitor the quality of the model throughout.

Refinement should always start with a small number of degrees of freedom (the "null-hypothesis"). This means, for example, that if NCS is present, it should initially be constrained; the addition of water molecules should be postponed until the protein model is essentially complete and correct; the initial temperature-factor model should be conservative (e.g., by refining only one or two temperature factors per residue).

The model that results from a refinement round should be checked carefully against our expectations based on what we have learned about macromolecules ("quality control"). This process should not be carried out once, prior to writing the paper, but in every cycle. In that way, local errors can be detected more easily, and remedied as they occur. Standard checks should include the ideality of the geometry, the geometry of the main chain (Ramachandran plot), the adherence of the structure to database structures (e.g., the side chain conformations and the peptide oxygen orientations), and of course the fit of model and map (e.g., the component-based real-space R-factor). In addition, one should verify that the structure satisfies various common-sense rules of thumb (such as "NCS-related molecules are very similar", "bonded atoms and NCS-related atoms have similar temperature factors", "peptide bonds are planar", "arginine residues are involved in salt links or hydrogen bonds", etc.). Also, when one builds a new bit of structure, databases can and should be used to construct the main chain, and side chains should be added in one of their preferred rotamer conformations. While rebuilding, high-resolution data may and will reveal a few places in the structure where a side chain does not have a rotamer conformation, or where the peptide oxygen points in a different direction than would be expected on the basis of a comparison with a database. These may be regions of potential biological interest, but liberties should be taken with the model only if the crystallographic data permits it.

In the case of higher resolution data, the number of degrees of freedom can be increased gradually (and with restraint(s)) once the constrained refinement has converged. This means that one may use NCS restraints instead of constraints, add water molecules, refine individual (but restrained) temperature factors, etc. The most useful indicator available today of whether or not the inclusion of additional degrees of freedom actually leads to a better model (i.e., a better description of the experimental data) is Brünger's free R-factor [11-14], the value of which is highly correlated with the mean absolute phase error of the model.



Figure 1. Overview of the structure-building process for crystallographically determined macromolecular structures. The items inside the boxed area are collectively known as one macro-cycle.


Figure 1 shows a schematic view of the refinement and rebuilding process (with NCS present). The "loop" inside the boxed area we refer to as a macro-cycle (i.e., one cycle of map calculations, quality control, rebuilding and refinement). Ten years ago, one would typically go through dozens of such cycles to produce the "final" model. In our experience, good models can now be produced in a fraction of the time (typically, 5 to 10 macro-cycles). For small proteins (up to ~200 residues) at a typical resolution of 2-2.5 Å, one macro-cycle need not take more than one day if adequate computer facilities (access to a 3D graphics workstation and a fast number-crunching workstation) are available. This means that within one to two weeks after the initial model becomes available, either built in a multiple isomorphous replacement (MIR) map or as the result of solving a molecular replacement (MR) problem, the structure can be completely refined. Of course, in the case of larger structures or considerably higher resolution, more time will be required to arrive at the final model.

In the following discussion, use of X-PLOR [15] and O [16] for refinement and rebuilding, respectively, is assumed. Both programs provide powerful tools, available today, to help build good models. We shall discuss these in the sections that follow.


INITIAL MODEL AND DATA QUALITY

Elsewhere in this volume [3] the process of building an initial model into an MIR map using the tools available in O [16] is described.

In the case of an MR problem, one will often need to "mutate" some of the residues (with the Mutate commands in O), build loops with insertions or deletions (using the Baton and Lego commands), and select different side-chain conformations. Since this is mostly a rebuilding problem, most of the tools discussed in the section about this topic can also be used to generate an initial model for refinement.

The success of the refinement and, ultimately, the quality of the final model are critically dependent upon the quality of the crystallographic data, and a number of points should be kept in mind while processing, scaling and merging the data.

Unfortunately, there is no agreed standard in the literature for presenting the quality of the diffraction data and, hence, the effective resolution of the study [21]. Bart Hazes has suggested [personal communication, 1995] the use of an "effective resolution", defined as the resolution at which the actual number of observed, unique reflections would have constituted a 100 % complete dataset. In more than one case, we have found the calculation of this number a sobering experience.
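The arithmetic behind this number is simple. Below is a minimal sketch in Python (our own illustration, not part of any of the programs discussed here), assuming the standard approximation that the number of unique reflections out to resolution d scales with the unit-cell volume, and ignoring systematic absences; all function and variable names are ours:

    from math import pi

    def effective_resolution(n_obs_unique, cell_volume, n_sym):
        # Resolution (in Angstrom) at which n_obs_unique reflections would
        # constitute a 100 % complete dataset. Inverts the approximation
        #   N(d) = 4*pi*V / (3 * 2 * n_sym * d**3)
        # (the factor 2 accounts for Friedel pairs).
        return (2.0 * pi * cell_volume / (3.0 * n_sym * n_obs_unique)) ** (1.0 / 3.0)

    # Example: 20,000 unique reflections, a 300,000 A**3 cell and 4 symmetry
    # operators give an effective resolution of ~2.0 A.
    print(round(effective_resolution(20000, 3.0e5, 4), 2))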

To decide at which shell to cut off the resolution, we nowadays tend to use the following criteria for the highest shell: completeness > 80 %, multiplicity > 2, more than 60 % of the reflections with I > 3 sigma(I), and Rmerge < 40 %. In our opinion, it is better to have a good 1.8 Å structure than a poor 1.637 Å structure. Moreover, over-estimating the resolution of the data is bound to lead to problems with the refinement later (for an example of this, see the Rfree section). In the case of complexes which are isomorphous to a previously solved structure, proper data processing may make the difference between observing fragmented blobs of density or nice, well-connected density for the ligand, substrate or inhibitor. A recent data-processing error in our laboratory illustrates this. Cellobiohydrolase I [22] (CBHI) was crystallised in the presence of a beta-blocker in the hope of obtaining a complex between the two [23]. Crystals were isomorphous to those obtained earlier, and data were collected on our new in-house R-AXIS IIc image plate system, and processed with the R-AXIS software [24] to a resolution of 1.8 Å. It was noted that the data were rather incomplete in the medium resolution shells. Nevertheless, simulated-annealing (SA) refinement using Rfree was carried out and the refinement progressed well. Unfortunately, in the active site only isolated blobs of density were found, and this left at least four possibilities for fitting the ligand. SA refinement did not yield better density for any of these possibilities. The original image plate data were then more carefully and more critically re-processed with DENZO [19] and scaled and merged with programs in the CCP4 package [20]. Reprocessing increased the overall completeness from 78 % to 99 %, and the completeness in the shells between 7.5 and 2.5 Å from ~65-70 % to ~96-100 %. Most important, however, was the appearance of beautiful, well-connected density in the active site. Unfortunately, the density showed unambiguously that not the beta-blocker, but rather beta-octylglucoside (part of the crystallisation solution), had been bound by the protein. In the case of complexes, high-quality data is very important, since one basically wants to obtain structural information about a small molecule using protein-crystallographic techniques. Another example of this is discussed in the section about maps.
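Purely as an illustration of the cut-off criteria quoted at the start of the preceding paragraph, the following Python sketch applies them to per-shell statistics (the shell numbers are invented for the example; in practice they come from the scaling program):

    def shell_acceptable(completeness, multiplicity, frac_i_gt_3sigma, rmerge):
        # The highest-shell criteria quoted in the text.
        return (completeness > 0.80 and multiplicity > 2.0
                and frac_i_gt_3sigma > 0.60 and rmerge < 0.40)

    # (d_min, completeness, multiplicity, fraction I > 3 sigma(I), Rmerge)
    shells = [(2.2, 0.95, 3.5, 0.75, 0.08),
              (2.0, 0.88, 2.8, 0.66, 0.21),
              (1.8, 0.72, 1.9, 0.41, 0.45)]
    d_cut = min(d for d, c, m, f, r in shells if shell_acceptable(c, m, f, r))
    print("suggested high-resolution cut-off: %.1f A" % d_cut)  # -> 2.0 A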


REFINEMENT

In this discussion, we will focus on two aspects. The first is the use of the free R-factor to (a) monitor the success or failure of a refinement step, (b) test alternative hypotheses (e.g., when deciding on the most appropriate temperature-factor model), and (c) detect gross errors. The second aspect is the use of SA refinement, with which we have had very good experiences. Subsequently, we will briefly discuss force fields and dictionaries, and several other issues related to macromolecular structure refinement.

Rfree. Recently, Brünger introduced a cross-validation scheme based on the so-called free R-factor, or Rfree [11-14]. The idea is to set aside a small fraction of the data (the "test set") which is not used in the refinement, but for which an R-factor is nevertheless calculated all the time. Comparing the values of the conventional and free R-factors tells one something about the extent to which the data have been over-fitted, as well as about the quality of model and data. The refinement program will use any degree of freedom it is given to reduce the discrepancy between observed and calculated structure factors. However, since the data is afflicted by error, and since the model is not an exhaustive description of all scattering matter in the crystal (space- and time-averaged), this easily leads to a situation in which the errors are compensated by erroneous changes to the model. Even today, many people have a fixation on low R-factors; only slowly is it beginning to dawn that conventional R-factors can be made arbitrarily low by including more and more degrees of freedom in the model [1]. In the case of photo-active yellow protein, the grossly incorrect initial model of this protein [25] was said to result in part from the power of SA refinement in reducing the conventional R-factor without actually improving the model [26]. In fact, plenty of structures have been refined in the past with a model that contained more adjustable parameters than there were experimental observations [27]. If one uses Rfree to monitor the refinement, however, such over-fitting of the data can be detected: if a refinement step in which more parameters are adjusted than in the previous cycle does not lead to a significant drop in Rfree, one has over-modelled the data.
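For readers unfamiliar with the bookkeeping, a minimal Python sketch of the two statistics and of a random test-set selection follows (using the 5-10 %, at most ~2,000 reflections, partitioning recommended later in this chapter); the names are ours, not those used by X-PLOR:

    import random

    def r_factor(f_obs, f_calc, scale=1.0):
        # R = sum | |Fobs| - k*|Fcalc| | / sum |Fobs| over a set of
        # reflections: the work set gives R, the test set gives Rfree.
        return (sum(abs(fo - scale * fc) for fo, fc in zip(f_obs, f_calc))
                / sum(f_obs))

    def split_test_set(n_refl, fraction=0.05, max_test=2000, seed=42):
        # Set aside a random test set; these reflections are never used in
        # the refinement itself, only for calculating Rfree.
        random.seed(seed)
        n_test = min(int(fraction * n_refl), max_test)
        test = sorted(random.sample(range(n_refl), n_test))
        work = sorted(set(range(n_refl)) - set(test))
        return work, test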

A particularly striking demonstration of the poor performance of the conventional R-factor as an indicator of the correctness of a model (at least at low resolution) was recently given by intentionally tracing the structure of cellular retinoic-acid-binding protein (CRABP) type II (previously solved at 1.8 Å resolution [28]) backward, and refining this "model" using data to only 3 Å resolution [1,8]. Using an "established" refinement protocol, the conventional R-factor came down as low as 0.214, and this model had what is usually termed "excellent stereo-chemistry". The free R-factor, on the other hand, could not be fooled; it ended up at a value of 0.617, slightly worse than the value expected for a random set of scatterers. A consequence of the fact that the conventional R-factor is not correlated with the accuracy of a model (unless the data-to-parameter ratio is high) is that coordinate error estimates derived from conventional Luzzati plots [29] are meaningless. We therefore proposed that this quantity be estimated from an Rfree Luzzati plot [28] instead. In the case of the backward-traced CRABP model, the estimated coordinate error based on a Luzzati plot using the conventional R-factor is ~0.35 Å, whereas that based on the free R-factor is "infinite", which, at least in spirit, is more accurate. There are other indications that, at least at low resolution, an Rfree-based Luzzati plot gives a more accurate estimate of coordinate error than one based on conventional R-factors [11].

The free R-factor can be used to tune the refinement protocol for each individual case. For example, to find out if refinement of individual temperature factors is warranted, one can do the refinement both with grouped and with individual temperature factors. If the model with individual temperature factors does not have a considerably lower Rfree, one can conclude that with the current model and the present dataset, temperature factors are best modelled by group. We have done this experiment in the case of alpha-2u-globulin [30] (A2U), a structure with four-fold NCS for which we had collected a 2.5 Å dataset. Using a near-final model, grouped temperature-factor refinement (GTFR) yielded an Rfree of 0.272, and individual temperature-factor refinement an Rfree of 0.275. On the other hand, in the case of cellular retinol-binding protein [31], GTFR using 2.1 Å data yielded an Rfree of 0.256, and individual temperature-factor refinement an Rfree of 0.248. In fact, what we do here is use Rfree to test the validity of various alternative hypotheses regarding a model. Another example of this involves the structure of P2 myelin protein [31,32], which has three molecules in the asymmetric unit. The hypothesis was that one of the three molecules in the asymmetric unit (molecule "C") has a higher overall temperature factor than the other two. To test the hypothesis, the structure was re-refined with strict NCS and grouped temperature factors [T.A.J., unpublished results, 1995] and, subsequently, an overall temperature-factor shift was refined for each of the three molecules. The refinement of only three extra parameters resulted in a drop in both R (from 0.266 to 0.260) and Rfree (from 0.317 to 0.309; no ligand or water molecules were included at this stage). The temperature-factor shift was -2.2 Å2 for molecule "A", -4.2 Å2 for molecule "B" and +10.6 Å2 for molecule "C", supporting the hypothesis.

Analogously, one can check whether replacing NCS constraints by restraints yields a significantly better model for the data. This test we have also carried out with A2U; the results are shown in Table I. Clearly, Rfree indicates that the data at this resolution is best modelled by assuming the four monomers to be identical. Note that if no NCS restraints at all are used, the RMS difference between the monomers goes up to more than 1 Å, a value often seen for low-resolution structures which have been refined without the use of NCS or guidance by Rfree [1]. Also note that even this model has "excellent stereo-chemistry" as judged by the deviations from ideal bond lengths and angles (which are often the only "quality indicators" included in papers, in particular, though not exclusively, in the more prestigious journals). However, since the model contains four times as many adjustable parameters and still leads to an increase in Rfree, this is a clear-cut case of over-fitting: the unrestrained model is not a good description of the data.



Figure 2. Example of the behaviour of R and Rfree during an unsuccessful refinement cycle. The example is the result of one of many SA protocols tried out while refining the complex of human immunoglobulin IgG and the C2 domain of protein G [A.E. Eriksson, G.J. Kleywegt, M. Uhlén, and T.A. Jones, Structure 3, 265 (1995)]. The refinement included rigid-body refinement, energy minimisation, a slow cool from 4,000 K, and more energy minimisation. The solid line shows the behaviour of Rfree; the dotted line that of the conventional R-factor. The RMS difference between both R-factors was 0.063 and their correlation coefficient only 0.274. Refer to the text for details.


If only low-resolution data is available, SA refinement is often not successful. Use of Rfree has helped us in one case to probe the limits of SA refinement. When we refined the structure of the complex between the Fc fragment of human immunoglobulin IgG and the C2 domain of protein G [33], we only had a dataset available with an effective resolution of ~3.5 Å. Using Rfree as a guide, we found that none of the many SA protocols we tried yielded an improved model: Rfree remained constant or even increased, even though the conventional R-factor easily dropped by 0.1 (see Figure 2 for an example). There is no general rule as to which resolution is the limit for SA refinement; for every structure, viability of SA refinement should be investigated by inspection of the behaviour of Rfree. This point is driven home by the refinement of T. reesei endoglucanase I [34] (EGI), initially at 4.0 Å. This structure was solved with MR techniques. Not surprisingly, initial attempts to apply SA refinement to a rough homology model failed miserably (Rfree refused to drop below 0.50). Then a map was calculated using a poly-alanine model of one of the probe molecules. This map was poor, but after 15 cycles of two-fold NCS averaging a spectacularly improved map was obtained. Using this map, ~75 % of the sequence could be assigned to the model, yielding a starting value of ~0.45 for both R and Rfree. After a 4,000 K Cartesian slow cool, Rfree had dropped to 0.39 (R to 0.28), and in the resulting averaged map another ~15 % of the model (60 residues) could be traced and built, indicating that the SA refinement had genuinely improved the model.



Figure 3. Over-fitting of the data monitored by the behaviour of the conventional and free R-factors during refinement of the 2.9 Å structure of holo CRABP type I without the use of NCS and with individual isotropic temperature factors [G.J. Kleywegt, T. Bergfors, H. Senn, P. Le Motte, B. Gsell, K. Shudo, and T.A. Jones, Structure 2, 1241 (1994)]. Despite the large drop in the conventional R-factor, the resulting model is poorer and does not explain the data any better than the conservative model, since Rfree remains constant.


In yet another case, Rfree helped us to identify a problem with a dataset. While refining the structure of holo CRABP type I [28], we used a dataset that had been processed to 2.5 Å resolution. However, the refinement got stuck at an Rfree value of ~0.35, no matter what we tried. Since we trusted the model more than the data, we re-examined the original image plate data. It then turned out that the resolution limit of the data had been grossly over-estimated; more careful reprocessing yielded a dataset with a nominal resolution of only 2.9 Å, with relatively weak and incomplete data in the highest shells (effective resolution ~3.2 Å). However, with the use of the two-fold NCS and careful refinement, the model readily refined against the reprocessed dataset. When we submitted the paper, one of the referees wondered if a structure with an R-factor of ~0.25 constituted a refined model (referring to the "25 % R-factor threshold" suggested by Brändén and Jones [2]). The same referee also suggested that the two molecules in the asymmetric unit might be different. Our initial response was that the R-factor was a result of proper refinement, rather than over-refinement, and that releasing the NCS would definitely yield different molecules, but not a more accurate model. In order to test this, we subjected our final model to two subsequent high-temperature SA cycles. In the spirit of the referee's comments, we did not employ the NCS, refined individual temperature factors, used a more restricted low-resolution cut-off and used full weight for the crystallographic pseudo-energy term. The "progress" of the refinement is shown in Figure 3; the results, listed in Table II, are in complete agreement with our expectations. The conventional R-factor could easily be brought down to a more "traditionally observed" level, but Rfree did not decrease at all. This indicates that our conservative model adequately explains the data and that all additional model parameters introduce nothing but artefacts. Of course, in reality the two molecules will be slightly different, but at 2.9 Å resolution a structure which assumes them to be identical is superior to one which tries to model the differences. The "observed" differences between the two molecules at this resolution are a direct result of noise-fitting (i.e., "modelling" the errors). This example also stresses the fact that the "25 % R-factor threshold" [2] should be rephrased in terms of Rfree. However, this will have to wait until more (correct and wrong) structures are available which have been refined with Rfree (we estimate that the Rfree threshold will be ~0.35). Analysis of the Protein Data Bank [35] (PDB) in May 1995 showed that only 62 (out of almost 3,000) X-ray structures included a free R-factor. The conventional and free R-factors of these structures are shown as a function of resolution in Figure 4. In the resolution range between 1.5 and ~3 Å, the conventional R-factors are more or less identical for all structures (~0.20), whereas the free R-factors (and, hence, the difference between the two) increase almost linearly with resolution.



Figure 4. Plot of the distribution of conventional (a) and free R-factors (b) for the 62 X-ray structures in the May 1995 version of the PDB [F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer, M.D. Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi, J. Mol. Biol. 112, 535 (1977)] for which both values were reported.


Rfree can also be of help in deciding on a proper weight to use for the crystallographic pseudo-energy term [13]. The value calculated by X-PLOR in a so-called CHECK run tends to be too high (i.e., it weighs the X-ray term too heavily, leading to over-fitting and poorer geometry). Again, there is no ideal value, but running identical SA jobs with different scales for this weight is a good method of finding a proper weight. We tend to run three slow cool SA calculations from 4,000 K for every new protein structure we start to refine, one with full weight, one with half weight and one with 1/3 weight. The scale factor which yields the lowest Rfree is then used in subsequent refinement rounds.
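In outline, the procedure is no more than the following sketch, where run_slow_cool() is a hypothetical helper standing in for whatever launches the actual X-PLOR job and returns its statistics:

    def pick_xray_weight(run_slow_cool, base_weight):
        # Run otherwise-identical slow cools with full, half and one-third
        # weight on the crystallographic pseudo-energy term, and keep the
        # scale factor that yields the lowest Rfree.
        results = {s: run_slow_cool(xray_weight=s * base_weight)
                   for s in (1.0, 0.5, 1.0 / 3.0)}
        best = min(results, key=lambda s: results[s]["r_free"])
        return best * base_weight, results[best]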

We have noticed that some people don't like using Rfree or even claim that "it doesn't work with 3 Å data". This is understandable, since people are used to getting cosmetically pleasing (but not necessarily meaningful) low R-factors, but it is also nonsense. It is exactly at low resolution that Rfree is most useful: at low resolution one has relatively little data (often also weak and incomplete), so that one is very close to a data-to-parameter ratio of one (and often below one). In these cases, the danger of over-fitting is obviously at its greatest, and this may even lead to masking of gross errors in the model [7]. We have also noticed that some people include Rfree calculations, but fail to "listen" to what Rfree is telling them. These are the cases where R and Rfree values are reported which differ by 0.1-0.15 (this has even led to the claim that "Rfree is not a sensitive indicator of model quality"). Since the reflections used for the calculation of Rfree are not used in the refinement, one always expects to get lower conventional than free R-factors. However, the hallmark of good diffraction data and careful refinement is a small difference between the two. If the data is of high quality, and if the structure adequately models the data, then the structure factors used in refinement must be highly correlated to those not used, i.e. R and Rfree must have similar values. In our experience, with very good datasets we are able to obtain differences between R and Rfree of ~0.02; if the data is of poorer quality or if the resolution is very low, the difference may be as high as 0.05-0.08. In general, large differences indicate over-fitting or poor quality of the data. Naturally, these two phenomena are somewhat correlated: the poorer the data, the larger the errors in the observed structure-factor amplitudes, and the more room there is for the refinement program to fit these errors.

Another way of putting this: the conventional R-factor is extremely sensitive to changes in the precision of the model (the level of detail in which the model is described, i.e. the number of parameters in the model). Rfree, on the other hand, is a good measure for the accuracy of the model, i.e. the extent to which the model adequately explains the experimental observations. Over-fitting, then, is extending the precision of the model without improving its accuracy ("modelling the noise"). The "best" model (most adequate explanation of the data) is the one with the lowest possible value of Rfree (highest attainable accuracy; smallest phase errors), and the smallest possible difference between R and Rfree (the level of precision is warranted by the accuracy).

From the previous discussion it will be clear that we do not recommend so-called a posteriori evaluation of Rfree (i.e., performing one slow cool SA calculation with the final model, just to obtain a value for Rfree). Although a 4,000 K SA run will yield a value of Rfree close to that obtained if one had used Rfree throughout the refinement (see below), the resulting value of Rfree tends to be used only to complete a table in a publication, and not to evaluate the refinement strategy, or to assess the accuracy and the degree of over-fitting. The free R-factor should be used every step of the way as the single most important indicator available today of model accuracy and model improvement.

There are still several "undecided" issues with respect to the free R-factor (see also the discussions in Brünger [11] and Dodson et al. [36]). It is clear that in the case of NCS most of the reflections in the test set will be related to some of the reflections in the work set (unless special care is taken in the selection of the test set, e.g. by selecting them in thin resolution shells [1], or as small spheres or cones related by the G-function [P. Metcalf, personal communication, 1995; R. Read, personal communication, 1995]). This means that, in the worst case, serious errors and over-fitting could go undetected. In the best case, NCS may cosmetically reduce the difference between R and Rfree (on the other hand, the effect is probably small, and it also occurs when no NCS is present, due to the fact that bulk solvent introduces relations between neighbouring reflections). For example, the structure of the MS2-RNA complex [37], with ten-fold NCS, has an R-factor of 0.192 and an Rfree value of 0.209. The fact that the difference between R and Rfree is influenced by the degree of over-fitting, the presence and extent of NCS, and the quality of the data makes it difficult to give precise guidelines for acceptable magnitudes of this difference, but if the difference is large one should be cautious. In order to investigate whether the relations between the reflections could be strong enough to make gross tracing errors undetectable, we have carried out some experiments with A2U, since this structure has four-fold NCS. Again, we intentionally traced the structure backwards and refined it against 6-3 Å data using different NCS models and different ways to select the test reflections; the results are shown in Table III. Although the free R-factor did not go up to the level it attained in the case of the backward-traced CRABP type II structure (0.62), it was never lower than 0.46 (with constrained NCS; the maximum value was 0.55 with unrestrained NCS). Moreover, in agreement with the discussion about the difference between R and Rfree above, if the NCS was constrained or restrained, even the conventional R-factor could not be reduced to a satisfactory level (~0.35), whereas if the NCS was not restrained the R-factor easily dropped to a near-respectable level of ~0.27. Probably, in the latter case, inclusion of water molecules and further refinement (as was done in the case of the backward-traced CRABP type II structure) could have brought the conventional R-factor below the 25 % threshold.
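A sketch of the thin-shell selection mentioned above (our own illustration; a real implementation would of course operate on the measured reflection list):

    def thin_shell_test_set(reflections, n_shells=50, every=10):
        # reflections: list of (hkl, d_spacing) tuples. Divide reciprocal
        # space into thin shells of equal volume (equal width in 1/d**3)
        # and assign every tenth shell (~10 % of the data) to the test set,
        # so that NCS-related reflections, which lie at essentially the
        # same resolution, end up in the same set.
        s3 = [1.0 / d ** 3 for _, d in reflections]
        lo, hi = min(s3), max(s3)
        width = (hi - lo) / n_shells or 1.0
        test = []
        for (hkl, _d), s in zip(reflections, s3):
            shell = min(int((s - lo) / width), n_shells - 1)
            if shell % every == 0:
                test.append(hkl)
        return test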

Spacegroup errors are unlikely to be detected by Rfree, since most of the test set reflections will have a symmetry-related reflection in the work set. Indeed, in the case of the original chloromuconate cycloisomerase structure (which was solved in spacegroup I4, but actually belongs to spacegroup I422 [7,38]), a posteriori Rfree calculations with different starting temperatures yield free R-factor values of 0.32-0.34, virtually independent of the starting temperature of the SA calculation.

On the other hand, the free R-factor does seem to be able to pick up largely or completely wrong structures [1]. For example, when the backwards-traced CRABP type II structure is refined with all data and then subjected to a posteriori Rfree calculations, even a starting temperature of 500 K yields an Rfree value of 0.45. However, one needs to start the SA calculation at 4,000 K in order to approach the "real" Rfree value of 0.62. Several other important issues with respect to the free R-factor also remain unresolved.

As an alternative to the use of Rfree, one could contemplate using the statistical significance tests on the weighted R-factor described by Hamilton [39]. His idea was to test whether a decrease of the R-factor is significant (often confused with "large"), given the number of observations and the increase in the number of adjustable parameters in the model. However, as far as we know, this test has been used only once in protein crystallography [V. Lamzin, personal communication, 1995]. One obvious difficulty lies in the fact that structures are sometimes refined with more parameters than reflections; another is the question as to how one should weight restraints relative to the diffraction data.
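For reference, our summary of the test in [39] is as follows: with b the number of restrictions imposed by the more constrained model, n the number of observations and m the number of parameters of the less constrained model, the constrained model is rejected at significance level \alpha if the ratio of weighted R-factors exceeds the critical value

    \mathcal{R} = \frac{R_w(\mathrm{constrained})}{R_w(\mathrm{unconstrained})} > \mathcal{R}_{b,\,n-m,\,\alpha} = \left( \frac{b}{n-m}\, F_{b,\,n-m;\,\alpha} + 1 \right)^{1/2}

where F denotes the upper critical value of the F-distribution.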

One has to realise that the free R-factor is a global statistic, i.e. it relates to the mean absolute phase error [13]. This means that local errors in a model may well go undetected if only Rfree is used to assess the correctness of a model. In particular, serious out-of-register errors will be hard to detect since, usually, only a fairly small number of scatterers is involved. In the process of refining the structure of EGI [34], we detected and corrected such an error, involving a stretch of 20 residues which were out-of-register with the density by 7 residues (an insertion had not been correctly accounted for). At that stage, Rfree was already at 0.302 (R 0.245), i.e. well below our usual skepticism threshold of 0.35. In general, out-of-register errors can only be detected by a combination of experience, (SA-omit) maps, common sense (regarding the environments of residues), temperature factors, databases (to detect unusual main-chain arrangements) and alignment with homologous structures (if any are available). One should always be on the look-out for such errors. When doubting the correctness of a part of the trace, suspect residues can be temporarily omitted from the model, or cut back to alanines.

SA. Simulated Annealing refinement [40-43] is a powerful method which we use in almost every refinement cycle. The major benefit of SA refinement lies in its large radius of convergence. If the model and the data are good, an SA calculation rarely does any harm to the structure. On the contrary, we find that if an SA calculation does lead to a poorer model (in terms of Rfree or map quality), this is a strong indication that something is wrong with the input model, or that there is a problem with the data.



Figure 5. Averaged difference density for Candida antarctica lipase B in complex with Tween-80 after refinement of the structure without SA [J. Uppenberg, N. Öhrner, M. Norin, K. Hult, G.J. Kleywegt, S. Patkar, V. Waagen, T. Anthonsen, and T.A. Jones, Biochemistry, in the press]. Note that the density for the Tween-80 molecule is virtually uninterpretable. Refer to the text for details. (Figure kindly provided by Dr. J. Uppenberg.)


A good example is the structure of a complex of Candida antarctica lipase B with a non-ionic detergent called Tween-80 [44] (a long, floppy and chemically ill-characterised compound). The protein structure had been solved at 1.55 Å resolution [45], but for the Tween-80 complex only 2.5 Å data was available. The protein structure in the complex was solved by molecular replacement and refined using strict two-fold NCS. SA had not been used, for fear of ruining a well-refined structure by exposing it to low-resolution data. This had yielded a final model with a very low value for Rfree of 0.209 (R 0.187). Unfortunately, there was relatively poor density in the presumed location of the Tween-80 molecule, even after electron-density averaging, which made it impossible to build a model for the Tween-80 molecule (see Figure 5). We then carried out a slow cool from 4,000 K, using weak harmonic restraints on all atomic positions (with a force constant of 20 kcal mol-1 Å-2 for CA atoms, and 10 kcal mol-1 Å-2 for all other atoms). The rationale for this was to force the protein back into a structure very similar to the highly refined model, while leaving some room for the structure to relax (i.e., to adjust its conformation to the new data), in the hope of getting better density for the Tween-80 molecule (which was, after all, the interesting part of the structure). The slow cool was very successful and converged back to a model very similar to the starting structure, but with slightly lower values of Rfree and R. Most important, however, was the appearance of well-connected density in the difference map, into which a crude partial model of the Tween-80 could easily be built. This model was then subjected to another slow cool from 4,000 K, again using weak harmonic restraints for all atoms except those of the Tween-80 molecule: a force constant of 10 kcal mol-1 Å-2 was used for all main-chain atoms, 5 kcal mol-1 Å-2 for other atoms, but only 1 kcal mol-1 Å-2 for all residues within a 5 Å radius of the Tween-80 molecule. Again, the refinement was very successful (see Figure 6), yielding an Rfree of 0.188 (R 0.165). Density for the Tween-80 molecule in the 2Fo-Fc map after the slow cool is shown in Figure 7. Note that this case is a splendid illustration of the fact that even low-resolution structures with conservative assumptions (strict NCS, grouped temperature factors) can yield low (free) R-factors.
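The harmonic restraints used here add, for every restrained atom i, a quadratic penalty of the standard form (a sketch of the functional form only; the kcal mol-1 Å-2 force constants quoted above are the k_i):

    E_{\mathrm{harm}} = \sum_i k_i \left| \mathbf{r}_i - \mathbf{r}_i^{\mathrm{ref}} \right|^2

where the reference positions r_i^ref are those of the highly refined starting model, so that large excursions from that model are penalised heavily while small local relaxations remain cheap.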



Figure 6. Example of satisfactory behaviour of R and Rfree during a refinement cycle. The example is taken from the refinement of Candida antarctica lipase B in complex with Tween-80 [J. Uppenberg, N. Öhrner, M. Norin, K. Hult, G.J. Kleywegt, S. Patkar, V. Waagen, T. Anthonsen, and T.A. Jones, Biochemistry, in the press]. The refinement included energy minimisation, a slow cool from 4,000 K, more energy minimisation and grouped temperature factor refinement. The solid line shows the behaviour of Rfree; the dotted line that of the conventional R-factor. The RMS difference between both R-factors was only 0.017 and their correlation coefficient 0.980. Refer to the text for details.



Figure 7. Density for a Tween-80 model after refinement with SA [J. Uppenberg, N. Öhrner, M. Norin, K. Hult, G.J. Kleywegt, S. Patkar, V. Waagen, T. Anthonsen, and T.A. Jones, Biochemistry, in the press]. One refinement cycle yielded interpretable difference density for a part of the Tween-80 molecule. A crude model was built and refined in a second SA-refinement cycle (see Figure 6). The density for the Tween-80 after this second cycle is shown (compare to Figure 5). Refer to the text for details.


If sufficient computer resources are available, we recommend experimenting with parallel and serial slow cools. We often use parallel slow cools to try different refinement protocols (e.g., different weights for the crystallographic term, different temperature-factor models, various uses of NCS), and select the one that yields the lowest value of Rfree (and, in case of a tie, the one with the smallest difference between R and Rfree). Serial slow cools often give better results than a single slow cool (an extra drop of 0.01-0.015 in Rfree) [42], provided that temperature factors were refined after the first round. As for which data to include, in our experience it is best to use data to the highest resolution limit from the start, rather than, for example, to start by refining against 3.2 Å data and gradually increase the resolution limit in subsequent cycles.

In every refinement round (slow cool from 4,000 K, energy minimisation, temperature factor refinement) we plot the behaviour of R and Rfree as a function of "progress of refinement". The two curves should be very similar in overall shape, e.g. a sharp drop in R during temperature factor refinement should be accompanied by a similar drop in Rfree. Figure 6 shows an example of such behaviour, whereas Figure 2 demonstrates the "progress" of a dramatically unsuccessful refinement.

From release 4.0 onward, X-PLOR will contain a facility to use torsion-angle molecular dynamics (MD) [40,46], and we expect significant benefits from this. Carrying out SA refinement in torsion-angle space rather than in Cartesian coordinate space reduces the number of degrees of freedom [46] from 3N (where N is the number of atoms) to something of the order of N/3. Initial results indicate that this, plus the fact that SA refinement can now be carried out at considerably higher temperatures (up to 10,000 K), increases the radius of convergence of SA from ~1.2 Å to ~1.7 Å [46]. In addition, one obtains near-ideal bond lengths and bond angles "for free".

Of course there are still limitations to what SA refinement can do. In Cartesian coordinate space, the RMS radius of convergence is of the order of ~1.2 Å [42], although local changes of up to ~8 Å are possible [42,47]. Torsion MD protocols may extend the radius of convergence to ~1.7 Å [46]. SA refinement will not make changes that require the breaking of covalent bonds (which is often done temporarily during manual rebuilding). Nor can SA refinement optimise the fit of individual residues. For example, the side-chain oxygen and nitrogen atoms of Asn and Gln residues, and the rings of His residues, often end up flipped by 180 degrees (a careful human would take the whole hydrogen-bonding network into account to decide on the proper orientation). Also, SA refinement does not always produce rotamer-like side-chain conformations (although these could be enforced by applying strong restraints on dihedral angles). Finally, gross errors (in tracing, connectivity and sequence) cannot be fixed by a refinement program. For these reasons, manual rebuilding of protein structures is still necessary at present.

Force fields and dictionaries. Every refinement program nowadays uses geometric and other restraints to augment the X-ray data. As for geometry, the best set of bond and angle parameters available today is that developed by Engh and Huber [48], based on an analysis of small-molecule crystal structures from the Cambridge Structural Database (CSD) [49], which, through the efforts of John Priestle, are now available for the most widely used refinement and rebuilding programs [50]. When used in combination with a reduced weight for the crystallographic pseudo-energy term, protein models with good stereo-chemistry are virtually guaranteed (but note that good stereo-chemistry says absolutely nothing about how well the structure models the data; see the discussion of Table I above). As more protein structures are solved at atomic resolution, better dictionaries can be expected in the next few years [51].

However, problems remain in deriving "ideal" parameters for non-protein entities. Writing an X-PLOR topology and parameter file for a ligand, for example, is a cumbersome, time-consuming and error-prone process. What is often not realised is that, in fact, one has to specify exactly what one would like the ligand to look like (with the exception of torsions around freely rotatable bonds) in the absence of any crystallographic information. This means that every sp1 and sp2 carbon gives rise to a "flatness" restraint, that the chirality of every chiral carbon must be restrained, etc. If one is lucky, the structure of a ligand has been solved separately and can be retrieved from a database such as the CSD. In that case, we suggest deriving all ideal values from this structure and using heavy weights for the restraints. In other cases, one may be able to find the structure of a common co-factor or ligand in another structure in the PDB (we have created a collection of several hundred such small molecules). If no structure is available, one will have to use "rule-of-thumb" values, or resort to quantum-chemical or molecular-mechanics calculations. In the case of low-resolution data, it may be best to almost constrain the ligand (by using very heavy weights for bond lengths and angles) so that it effectively has only a few degrees of freedom left (freely rotatable carbon-carbon bonds, for instance). Again we must emphasise the problem with low-resolution data: if the ligand dictionary allows ring puckering, for instance, the refinement program is invited to take liberties (in this fashion even aromatic rings can easily be "refined" into a non-planar conformation).
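As a simple illustration of deriving ideal values from a reference structure, the following sketch (under our own naming conventions; this is not the program described below) extracts target bond lengths that can then be restrained with heavy weights:

    import math

    def bond_length(a, b):
        # Distance (in Angstrom) between two atoms given as (x, y, z) tuples.
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def ideal_bond_lengths(coords, bonds):
        # coords: {atom_name: (x, y, z)} taken from e.g. a CSD reference
        # structure; bonds: list of (name1, name2) pairs. Returns target
        # lengths for use as heavily weighted restraints.
        return {(i, j): round(bond_length(coords[i], coords[j]), 3)
                for (i, j) in bonds}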

To remove much of the tedium and human error from the dictionary-generation process, we have written a small program [G.J.K., unpublished results, 1994] which, given the structure of a small molecule in PDB format, automatically generates the necessary restraint information (such as X-PLOR topology and parameter files).

Prior to including the ligand in crystallographic refinement, one should always energy-minimise the ligand on its own, without the X-ray term. The result shows the structure towards which the refinement program will try to pull the ligand once the complete model is refined with inclusion of the experimental data.

Other issues.

* Data. One popular way to get lower R-factors is to manipulate the low-resolution cut-off of the data. If one does not include an explicit bulk-solvent model, a cut-off of ~8 Å seems reasonable. As for the sigma cut-off, we tend to use all observed reflections nowadays. As for the Rfree partitioning, a fraction of 5 - 10 % of the data (with a maximum of ~2,000 reflections) is usually sufficient to make meaningful use of Rfree.

* Temperature factors. As is the case with geometry, temperature factors should be refined using constraints and restraints, guided by Rfree. This may seem self-evident, but there are dozens of structures [27] in the PDB which have RMS Delta-B values for bonded or NCS-related atoms that exceed 10 Å2. After a substantial rebuild, we tend to reset temperature factors, either by resetting all values to a single, average value, or by limiting the extremes (e.g., by resetting all B values lower than 10 Å2 to 10 Å2, and those exceeding 50 Å2 to 50 Å2), as in the trivial sketch below. In the case of strict NCS, we usually end up refining grouped temperature factors at any resolution. The reason is probably that grouped temperature factors are not subject to restraints on bonded atoms, which means that high temperature factors can be assigned to just one or two residues in a loop if these residues do not obey exact NCS. In the case of restrained individual temperature factors, such high values can only be assigned by the refinement program if they are propagated through a stretch of residues, which could give a false impression of the extent of the area in which the NCS breaks down. Very high temperature factors are usually caused either by a significant deviation from the assumption of strict NCS, or by non-existent density. This often pertains to solvent entities, but occasionally even the protein may suffer from this. For example, while refining the structure of human alpha class glutathione S-transferase [52] we observed strange behaviour for residue 103. The data extended to 2.6 Å, so the structure (a pair of dimers) was refined with strict four-fold NCS and grouped temperature factors, and electron-density averaging was used prior to rebuilding. Residue 103, situated in an alpha-helix, had been built as an aspartic acid. It was situated in a fairly internal position without a salt link. Even with averaging, the density never showed any branching from the main chain. After GTFR, the main-chain atoms obtained a B value of 2 Å2 (the lowest value allowed during refinement), whereas the side chain was assigned a B value of 65 Å2. We therefore re-checked the sequence and found that this residue was actually a glycine. This reinforced our belief in the quality of the model and the accuracy of the temperature factors (not necessarily their precision). A good way to learn how to detect local errors, in particular for inexperienced crystallographers, might actually be to include a number of them deliberately. For instance, one could introduce an out-of-register error, or change a number of residues (e.g., alanine to leucine, which would lead to "include maps" as opposed to "omit maps"), and check if they are clearly identifiable during the refinement and rebuilding process. By monitoring the behaviour of the model and the maps in such areas, one would also learn to recognise unintentional errors elsewhere in the model. (The cynic will note that this practice is already widespread when it comes to water molecules, but usually not for the present purpose.)
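The temperature-factor reset described above amounts to no more than the following (thresholds as quoted in the text; a trivial sketch):

    def reset_b_factors(b_values, b_min=10.0, b_max=50.0, average=None):
        # Either reset every B value to a single average value, or clamp
        # the extremes into the range [b_min, b_max].
        if average is not None:
            return [average] * len(b_values)
        return [min(max(b, b_min), b_max) for b in b_values]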

* Extended models. An average model of a protein structure, including ligands, co-factors, tightly bound waters, etc., does not give a complete description of the observed data. The data are time-averaged and space-averaged intensities, plus error terms and deviations due to crystal defects, disorder, mobility, absorption, decay, etc. Some of these effects can be modelled. Best known is the use of alternative conformations and refinement of occupancies. These should, however, only be used at high resolution, and with Rfree as a control to check if the model actually improves [53]. Bulk solvent can be modelled in myriad ways; however, it has been shown that a simple flat solvent model (with only two adjustable parameters: a bulk density and a bulk temperature factor) gives the best results in terms of Rfree [54]. Using a bulk-solvent model, one can include data to very low resolution (~30 Å), which may improve the density at the surface of the molecule [55].
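As far as we are aware, the flat solvent model referred to here has the standard mask-based functional form (our notation; see [54] for details), the two adjustable parameters being k_sol (the bulk density) and B_sol (the bulk temperature factor):

    F_{\mathrm{total}}(\mathbf{h}) = F_{\mathrm{model}}(\mathbf{h}) + k_{\mathrm{sol}}\, e^{-B_{\mathrm{sol}} s^{2}/4}\, F_{\mathrm{mask}}(\mathbf{h}), \qquad s = 1/d

where F_mask is the structure factor calculated from the solvent mask.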

For a short period of time, time-averaged MD looked set to become a popular way of modelling dynamic and space-averaged effects [56]. However, it has been shown recently [57] that ensemble-averaged MD with only a few separate copies of a structure gives results superior to time-averaged MD with hundreds or even thousands of structures. As with non-crystallographic symmetry, however, at low resolution the "null-hypothesis" has to be that all molecules are equal. Only when one has a sufficiently high data-to-parameter ratio (i.e., high-resolution data) can one even begin to contemplate such procedures.

What is desperately needed for a better understanding of the remaining "R-factor gap" between small-molecule and large-molecule structure determinations are studies at very high resolution (better than ~1.2 Å) [58] on medium-size and large proteins, as well as inter-laboratory validation experiments (as commonly applied to techniques in analytical chemistry). This will teach us how important anisotropic motion is, how different NCS-related protein molecules really are, how frequent conformational heterogeneity is, etc., and how much of this variation can be explained by variations in refinement and rebuilding practices and programs.


QUALITY CONTROL

Quality control is now an integral part of our model-building and refinement process, i.e. it is not something that is done only once, a posteriori, for the sole purpose of filling in some tables in the publication. Quality control entails the use of our knowledge with respect to the structure of macromolecules to find places in the model which need special attention and are possibly in error. Zou and Mowbray [59] have described the benefits that can be attained by the use of empirical knowledge (as embodied in databases) during protein rebuilding and refinement.

In judging the quality of a model produced by the refinement program, we use a battery of per-residue criteria, the most important of which are enumerated below in the discussion of OOPS.

Naturally, assessing all these criteria for each and every residue during each and every rebuilding session is cumbersome. Therefore we use a program called OOPS [64] to aid in this process. The program makes use of O's ability to generate and use so-called residue and atom properties (these are explained in more detail in the accompanying chapter [3]). The idea behind OOPS is to calculate the values of some of the quality indicators in O before starting the rebuild, and to generate others on the fly from the coordinate file of the present model. OOPS gathers and integrates all this information on a per-residue basis. Criteria that can be checked include pep-flip values, RSC values, real-space (RS) fit values, suspiciously low or high temperature factors and occupancies, phi,psi values, peptide planarity and CA chirality; there is also a provision to check up to ten user-defined criteria. In addition, the present model can be compared to a previous model, and all residues which have changed considerably during refinement (in terms of RMS distance, RMS Delta-B, RMS occupancy change, RMS Delta-phi, Delta-psi and/or RMS Delta-chi1, Delta-chi2) are flagged as worthy of closer scrutiny during the subsequent rebuilding session. If a residue scores poorly for any of the criteria that the user wishes to check, or differs considerably from the previous model, a small O macro is generated which takes the user to that residue and reports what may be wrong with it.

In this way the user is taken from one bad or suspect residue to the next. Especially in the later stages of refinement, when the protein model no longer changes very much, this can save enormous amounts of time. Instead of having to look at each and every residue in turn, draw the maps, etc., only to find that in nine out of ten cases the residue is fine, the user can focus all attention on the five or ten percent of residues which may actually need to be adjusted or rebuilt.

In addition to this, OOPS produces plots of various properties as a function of residue number, statistics for all properties, and a residue-by-residue "critique" of the structure. The plots can be used to reveal areas where the structure is particularly bad, the statistics are useful to judge the overall quality and to decide which cut-off values to use, and the residue-by-residue listing is saved in an electronic notebook file which can be edited during the rebuilding session.
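The underlying flagging logic is easily sketched. In the Python sketch below, the criteria names and cut-off values are invented for the purpose of the example; OOPS itself reads O property values and writes O macros:

    # Sketch of per-residue quality flagging in the spirit of OOPS
    # (the cut-off values here are invented for the example).

    def flag_residues(props, pep_flip_max=2.5, rsc_max=1.5, rs_cc_min=0.8):
        """props: {residue: (pep_flip, rsc_fit, rs_cc)} -> suspects + reasons."""
        suspects = {}
        for res, (pep, rsc, cc) in props.items():
            reasons = []
            if pep > pep_flip_max:
                reasons.append("pep-flip")
            if rsc > rsc_max:
                reasons.append("side-chain rotamer fit")
            if cc < rs_cc_min:
                reasons.append("real-space fit")
            if reasons:
                suspects[res] = reasons
        return suspects

    model = {"A23": (3.1, 0.4, 0.62), "A24": (0.8, 0.3, 0.95)}
    for res, reasons in flag_residues(model).items():
        print(res, "->", ", ".join(reasons))   # A23 -> pep-flip, real-space fit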

It should be pointed out that there are two types of residue which give rise to violations: those that are wrong (errors), and those that actually do have an unusual conformation ("outliers-for-a-reason"). The latter type is often found in the interesting places in a structure, for example in a ligand- or substrate-binding site [65]. They can be recognised as such by very convincing density, and a tendency to return to the same conformation, even after rebuilding and SA refinement. Residues that are in error, on the other hand, almost always have problematic (poorly fitting or absent) density and will either maintain their rebuilt conformation after SA refinement (indicating that the error has been fixed), or they will end up in yet another conformation (usually, this happens for surface and loop residues which are poorly defined by the data). As a rule-of-thumb, for a well-refined, high-resolution model, one would expect <2 % "outliers-for-a-reason" in the Ramachandran plot, ~1-2 % residues with unusual peptide orientations, and ~5-10 % residues with non-rotamer side-chain conformations [8,27].

In the early stages, while the model is still crude, we prefer to use very strict criteria and to check all residues (for example, a side chain with a reasonable RSC value can often nevertheless be replaced by a rotamer that fits the density equally well or even better; the sketch below illustrates the idea). In later rounds, the criteria can be relaxed and only suspicious-looking residues need to be checked.
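The idea of trying rotamers against the density can be sketched as follows; here `density_at` stands in for map interpolation, and the candidate coordinates would in practice come from a rotamer library:

    import math

    # Sketch: score candidate side-chain conformations by real-space fit
    # (here simply the mean density at the atom positions; illustrative only).

    def rs_score(atoms, density_at):
        return sum(density_at(xyz) for xyz in atoms) / len(atoms)

    def best_rotamer(candidates, density_at):
        """candidates: {name: [(x, y, z), ...]} -> name of best-fitting one."""
        return max(candidates,
                   key=lambda name: rs_score(candidates[name], density_at))

    density = lambda xyz: math.exp(-sum(c * c for c in xyz))  # toy density peak
    candidates = {"mt (-60, 180)": [(0.1, 0.0, 0.2)],
                  "tp (180, 60)":  [(1.5, 1.2, 0.0)]}
    print(best_rotamer(candidates, density))  # mt (-60, 180)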


MAPS

For rebuilding we tend to use 3Fo-2Fc, 2Fo-Fc and Fo-Fc maps. Fo-Fc difference maps should be contoured both at positive and negative levels. In trouble areas, SA-omit maps can be calculated [66]. Standard omit maps are not a good idea, since it may be impossible to tell whether the re-appearance of density is real or due to model-bias [66]. After an SA-omit run, we calculate both 2Fo-Fc and Fo-Fc omit maps. The density in the omitted area should be very similar for both maps. During difficult refinements, we occasionally use systematic SA-omit maps. In this procedure, all residues are omitted in turn (using stretches of 5-10 residues at a time) and they are rebuilt in the resulting SA-omit map. Naturally, if an experimentally phased map is available, it should be consulted during rebuilding as well.
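Generating the batches for such a systematic SA-omit run is trivial; a sketch (with a fixed window of eight residues, within the 5-10 residue range mentioned above):

    # Sketch: residue batches for systematic SA-omit maps (window of 8).

    def omit_batches(first, last, width=8):
        """Yield (start, end) residue ranges covering first..last inclusive."""
        start = first
        while start <= last:
            yield start, min(start + width - 1, last)
            start += width

    for lo, hi in omit_batches(1, 25):
        print(f"omit residues {lo}-{hi}, run SA, inspect the resulting omit map")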

In the case of NCS, we invariably use our real-space electron-density averaging programs [67,68]. Averaging is a well-established and very powerful method for map improvement. Not only does it often improve the density in areas where the unaveraged map has no visible density at all, it also helps in the identification of regions where the NCS breaks down. In our experience, deviations from NCS are often much smaller than one would expect on the basis of published structures which were refined without making use of the NCS [27]. For example, in (NCS) disordered loops, there are often only one or two residues for which even after averaging no density is visible.
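In essence, real-space averaging evaluates rho_avg(x) = (1/N) sum_i rho(R_i x + t_i) over the N NCS operators. The following is a bare-bones sketch on a periodic grid (illustrative only; real averaging programs [67,68] work within a molecular envelope and handle the crystal-to-grid coordinate bookkeeping):

    import numpy as np
    from scipy.ndimage import map_coordinates

    def ncs_average(rho, operators):
        """rho: 3D density grid; operators: list of (R, t) in grid units,
        including the identity. Trilinear interpolation, periodic map."""
        avg = np.zeros_like(rho)
        idx = np.indices(rho.shape).reshape(3, -1).astype(float)
        for R, t in operators:
            mapped = np.asarray(R, float) @ idx \
                     + np.asarray(t, float).reshape(3, 1)
            avg += map_coordinates(rho, mapped, order=1,
                                   mode="wrap").reshape(rho.shape)
        return avg / len(operators)

    rho = np.random.rand(8, 8, 8)
    two_fold = np.diag([-1.0, -1.0, 1.0])   # a two-fold along the z grid axis
    rho_avg = ncs_average(rho, [(np.eye(3), (0, 0, 0)), (two_fold, (0, 0, 0))])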

One sometimes has multiple, non-isomorphous crystal forms, in which case cross-crystal averaging can be used to obtain better maps. For instance, in the refinement of the complex between acetylcholinesterase and fasciculin II [69] we averaged maps of two crystal forms (one at 3.0 Å and the other at 3.2 Å resolution) in order to confirm the correctness of the trace and the side-chain orientations in a crucial region of the protein-protein interface, for which there was poor density in the 3.0 Å map.


Figure 8

Figure 8. Structure of TTNPB (a) and "Compound 19" (b) [G.J. Kleywegt, T. Bergfors, H. Senn, P. Le Motte, B. Gsell, K. Shudo, and T.A. Jones, Structure 2, 1241 (1994)].


If one carries out refinement carefully, one may also confidently "listen" to the density when it insists that something is wrong, and safely assume this really to be the case, rather than an artefact introduced by over-fitting. For example, we recently solved the structure of CRABP type II in complex with all-trans-retinoic acid at 1.8 Å resolution [28]. This structure was used to solve a complex of the same protein with another ligand at 2.2 Å resolution. Due to a communication problem with the collaborators who supplied the ligand, we thought that the ligand we had used was TTNPB (see Figure 8a), and initially built this into the density, even though the fit was not perfect. After more, seemingly successful, SA refinement, the density still failed to cover the whole ligand, and three strong peaks stubbornly showed up in the difference maps: a positive one at a distance of ~1.5 Å from C6 (suggesting a methyl-like substituent), and two negative ones at the positions of atoms C22 and C23 (suggesting that these two methyl groups were not really there). An SA-omit map (leaving out the ligand) was calculated, and on the basis of that density we were convinced that the ligand was not TTNPB. After talking to our collaborators, we found out that the actual ligand was a molecule called "Compound 19" (see Figure 8b). With only the covalent structure of this compound, we could easily build a model that fitted the density (see Figure 9) and complete the refinement of the structure. It is important to realise that in the case of complex structures, where one basically uses protein-crystallographic methods to determine a small-molecule structure, the density is the only guide one has to confirm the presence of the assumed compound or to detect that something is amiss (see also the example of the CBHI complex discussed earlier). No quality indicator exists that specifically tracks down errors at this level (unless the error affects nearby protein residues, e.g. by forcing them into very unusual conformations); since the number of scatterers is usually small, even Rfree may behave seemingly normally when something is wrong. High temperature factors sometimes indicate that something is wrong, but they may also occur for a variety of other reasons (e.g., mobility, disorder, or low occupancy). In order to be able to rely on the density, good data, careful refinement and a healthy dose of skepticism are a conditio sine qua non.


Figure 9

Figure 9. SA-omit Fo-Fc map with the manually built and energy-minimised model of Compound 19 overlaid [G.J. Kleywegt, T. Bergfors, H. Senn, P. Le Motte, B. Gsell, K. Shudo, and T.A. Jones, Structure 2, 1241 (1994)].


REBUILDING

Manual rebuilding of a structure is something of a "black art", best learned by practising it on many different structures. Nevertheless, there are a number of simple questions that one has to ask oneself all the time, for every residue in turn; the answers to these questions determine what action has to be taken.


Figure 10a

Figure 10b

Figure 10. Example of a case in which a non-rotamer side-chain conformation was built in a low-resolution (3.0 Å) map, but can easily be replaced by a rotamer conformation that fits the density equally well, if not better. (a) A non-rotamer conformation for a leucine residue. The RSC-fit value, as calculated by O, is 2.09 Å, with chi1=-70 degrees and chi2=-25 degrees. (b) A better-fitting rotamer for the same residue. The RSC-fit value is 0.63 Å, with chi1=-38 degrees and chi2=175 degrees (close to the values of the most common leucine rotamer, namely chi1=-60 degrees and chi2=180 degrees).


In the following, we shall discuss violations of specific criteria, their possible causes, and possible remedies (using the tools in O). Again, all rebuilding should be followed by regularisation to restore proper stereo-chemistry (with the Refi_zone command [70]).


FINAL MODEL

When refinement and rebuilding of the structure have converged, a final refinement round can be carried out using all diffraction data, employing only energy minimisation and temperature-factor refinement. After this, a final assessment of the quality of the structure has to be carried out. The factors to be taken into account are similar to those that should be checked in every macro-cycle. In addition, one may estimate the average coordinate error, for example from a Luzzati plot [29] using Rfree rather than the conventional R-factor [28]. If NCS is present and has not been constrained, differences between NCS-related molecules should be analysed skeptically [27]. A particularly sensitive way of analysing differences between NCS-related molecules is to compare the main-chain and side-chain torsion angles of corresponding residues [1,8,27,62]. In principle, one could assess the adequacy of a model by plotting the distribution of (|Fo|-|Fc|)/sigma, which should have a mean of zero and a standard deviation of one. Unfortunately, the fact that our models are incomplete (and, consequently, that the true scale factors for the structure-factor amplitudes are unknown) precludes such an analysis.
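Such a torsion-angle comparison requires nothing more than a proper circular difference of angles; a minimal sketch (angles in degrees):

    # Sketch: per-residue torsion-angle differences between NCS-related chains.

    def delta_angle(a, b):
        """Signed difference of two torsion angles, mapped into [-180, 180)."""
        return (a - b + 180.0) % 360.0 - 180.0

    def compare_torsions(mol_a, mol_b):
        """mol_a, mol_b: lists of (phi, psi) per residue -> (dphi, dpsi) lists."""
        return [(delta_angle(pa, pb), delta_angle(sa, sb))
                for (pa, sa), (pb, sb) in zip(mol_a, mol_b)]

    a = [(-60.0, -45.0), (-120.0, 130.0)]
    b = [(-58.0, -50.0), (-125.0, 145.0)]
    print(compare_torsions(a, b))  # [(-2.0, 5.0), (5.0, -15.0)]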

One potentially useful quality check that has not yet been explored in depth is that of a "real-space free R-factor". Since the number of reflections used for Rfree calculations is usually small, calculating maps with only these reflections is not very useful. However, one can omit each and every residue, water molecule, etc. in turn (in batches of 5-10 residues), calculate an SA-omit map, and assess how well each residue's density is predicted by the rest of the structure by evaluating the real-space fit between the calculated map and the omit map. Assuming that most of the model bias has been removed by the SA calculation, this would even make it possible to quantify the extent of model bias on a per-residue basis, by comparing the real-space fit obtained with an ordinary 2Fo-Fc map to that obtained with the SA-omit 2Fo-Fc map. In the case of CRABP type II [28], the backward-traced structure at 3.0 Å has an average RS-fit correlation coefficient of 0.65 and an average RS R-factor of 0.36. Using the protocol outlined above (omitting 5 residues at a time), the average "free" RS-fit correlation coefficient goes down to 0.47, whereas the average "free" RS R-factor increases to 0.45.
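The real-space fit statistics themselves are simple. The sketch below uses one common definition of the real-space R-factor, RSR = sum|rho_o - rho_c| / sum|rho_o + rho_c|, together with the ordinary linear correlation coefficient, evaluated over the grid points covering a residue (it is not implied that this is exactly the definition used by any particular program):

    import math

    def rs_r_factor(rho_o, rho_c):
        """Real-space R-factor over paired grid values of two maps."""
        num = sum(abs(o - c) for o, c in zip(rho_o, rho_c))
        den = sum(abs(o + c) for o, c in zip(rho_o, rho_c))
        return num / den

    def rs_correlation(rho_o, rho_c):
        """Linear correlation coefficient between the two sets of grid values."""
        n = len(rho_o)
        mo, mc = sum(rho_o) / n, sum(rho_c) / n
        cov = sum((o - mo) * (c - mc) for o, c in zip(rho_o, rho_c))
        var_o = sum((o - mo) ** 2 for o in rho_o)
        var_c = sum((c - mc) ** 2 for c in rho_c)
        return cov / math.sqrt(var_o * var_c)

Comparing these statistics for an ordinary 2Fo-Fc map and for the SA-omit map of the same residue gives the per-residue model-bias estimate described above.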

How does one assess the quality of published structures? If no coordinates (and structure factors) are available, one is dependent on what is written in the publication [8]. The first thing to check is the quality of the data: what are the multiplicity, Rmerge, completeness and I/sigma(I) ratio, both overall and in the highest-resolution shells? If complete tables of these quantities are included, one may roughly assess the effective resolution of the data (as opposed to the Bragg spacing of the single highest-resolution reflection). The second most important thing is to assess the quality and strategy (if any) of the refinement. Even in the absence of details one can usually guess how the refinement was carried out. If Rfree is not mentioned, it was probably not used; if it is mentioned, check whether it was used throughout, and whether the difference between R and Rfree is small. If no mention is made of NCS constraints or restraints, one may safely assume that the NCS was not employed during refinement. Temperature factors will have been refined for individual atoms, unless specifically stated otherwise. If the Ramachandran plot is not shown or mentioned, it may have been very poor (or never produced).

If coordinates are available, many of the quality criteria can be easily checked [8]. However, for a full evaluation of the structure, observed structure-factor amplitudes are badly needed. Only with those can one check if the structure is an adequate model for the data or not, if necessary by re-doing the refinement. We therefore strongly recommend that structure factors be deposited together with coordinates.

Finally, when "validating" a model it is important to realise that any property which has been constrained or heavily restrained during refinement, and any property which has been closely monitored during rebuilding, cannot be used as an independent criterion to assess (or "prove") the quality of the model. For instance, most refinement programs operate by minimising the difference between observed and calculated structure-factor amplitudes; therefore, the value of the conventional R-factor is hardly an independent quality criterion. Similarly, most refinement programs tightly restrain bond lengths, bond angles and certain (improper) torsion angles; therefore, low RMS deviations from ideal geometry cannot be waved around as proof of the quality of the structure. Also, if side-chain conformations are monitored, and rotamers are used in the rebuilding, a low fraction of residues with non-rotamer conformations is not necessarily a hallmark of a correctly traced structure. With the widespread use of the program ProCheck and its pretty output, a standard phrase has begun to creep into papers describing protein structures: "the model has a quality better than expected for structures at this resolution". Again, this is a rather meaningless statement if the structure was refined using the Engh and Huber parameter set, since (with the exception of the Ramachandran plot) almost all criteria which ProCheck assesses have been restrained during the refinement. In fact, apart from the Ramachandran plot, both the backward-traced structure of CRABP type II [1] and that of A2U are of "better than average" quality according to ProCheck. For this reason, it is important to have one or two independent quality checks which are not applied in the refinement and rebuilding process, but only to assess the final model.


ACKNOWLEDGMENTS

This work was supported by the Swedish Natural Science Research Council and Uppsala University. The many fruitful (and sometimes heated) discussions with other crystallographers from Uppsala, with Dr. Eleanor Dodson (York) and the other members of the ESF-funded Biotech group, with the participants in the York meeting on statistical validators in protein crystallography [36], and in particular with Dr. Axel Brünger (Yale, New Haven) are gratefully acknowledged. We are also grateful to Dr. Randy Read (Edmonton) for his suggestion to investigate the effect of relationships between reflections in the case of NCS by refining a backward-traced structure with NCS. We would further like to thank Dr. Christina Divne (Uppsala) for allowing us to report her experiences in the refinement of cellobiohydrolase I complexes, and Dr. Jonas Uppenberg (Uppsala/Montpellier) for providing us with his lipase B/Tween-80 data.


REFERENCES

1. G.J. Kleywegt and T.A. Jones, Structure 3, 535 (1995).

2. C.I. Brändén and T.A. Jones, Nature 343, 687 (1990).

3. T.A. Jones and M. Kjeldgaard, this volume.

4. R. Lüthy, J.U. Bowie, and D. Eisenberg, Nature 356, 83 (1992).

5. G. Vriend and C. Sander, J. Appl. Cryst. 26, 47 (1993).

6. E.E. Lattman, Proteins Struct. Funct. Genet. 18, 103 (1994).

7. G.J. Kleywegt, H. Hoier and T.A. Jones, Acta Cryst. D, in the press.

8. G.J. Kleywegt and T.A. Jones, in "Making the Most of Your Model" (W.N. Hunter, J.M. Thornton, and S. Bailey, Eds.), p. 11, SERC Daresbury Laboratory, Daresbury, UK, 1995.

9. Y. Liu, D. Zhao, R. Altman, and O. Jardetzky, J. Biomol. NMR 2, 373 (1992).

10. D. Zhao and O. Jardetzky, J. Mol. Biol. 239, 601 (1994).

11. A.T. Brünger, this volume.

12. A.T. Brünger, Nature 355, 472 (1992).

13. A.T. Brünger, Acta Cryst. D49, 24 (1993).

14. A.T. Brünger, G.M. Clore, A.M. Gronenborn, R. Saffrich, and M. Nilges, Science 261, 328 (1993).

15. A.T. Brünger, "X-PLOR: a system for crystallography and NMR", Yale University, New Haven, CT (1990).

16. T.A. Jones, J.Y. Zou, S.W. Cowan, and M. Kjeldgaard, Acta Cryst. A47, 110 (1991).

17. T.A. Jones and M. Kjeldgaard, in "From First Map to Final Model" (S. Bailey, R. Hubbard, and D.A. Waller, Eds.), p. 1, SERC Daresbury Laboratory, Daresbury, UK, 1994.

18. T.A. Jones and S. Thirup, EMBO J. 5, 819 (1986).

19. Z. Otwinowski, DENZO and SCALEPACK, unpublished programs.

20. Collaborative Computational Project Number 4, Acta Cryst. D50, 760 (1994).

21. V. Luzzati and D. Taupin, J. Appl. Cryst. 17, 273 (1984).

22. C. Divne, J. Ståhlberg, T. Reinikainen, L. Ruohonen, G. Pettersson, J.K.C. Knowles, T.T. Teeri, and T.A. Jones, Science 265, 524 (1994).

23. C. Divne, J. Ståhlberg, and T.A. Jones, to be published.

24. M. Sato, M. Yamamoto, K. Imada, Y. Katsube, N. Tanaka, and T. Higashi, J. Appl. Cryst. 25, 348 (1992).

25. D.E. McRee, J.A. Tainer, T.E. Meyer, J. van Beeumen, M.A. Cusanovich, and E.D. Getzoff, Proc. Natl. Acad. Sci. USA 86, 6533 (1989).

26. G.E.O. Borgstahl, D.R. Williams, and E.D. Getzoff, Biochemistry 34, 6278 (1995).

27. G.J. Kleywegt, Acta Cryst. D, in the press.

28. G.J. Kleywegt, T. Bergfors, H. Senn, P. Le Motte, B. Gsell, K. Shudo, and T.A. Jones, Structure 2, 1241 (1994).

29. V. Luzzati, Acta Cryst. 5, 802 (1952).

30. G.J. Kleywegt, J. Björklund, J. Uppenberg, D. Ogg, L.D. Lehman-McKeeman, J.D. Oliver, and T.A. Jones, to be published.

31. S.W. Cowan, M.E. Newcomer, and T.A. Jones, J. Mol. Biol. 230, 1225 (1993).

32. T.A. Jones, T. Bergfors, J. Sedzik, and T. Unge, EMBO J. 7, 1597 (1988).

33. A.E. Eriksson, G.J. Kleywegt, M. Uhlén, and T.A. Jones, Structure 3, 265 (1995).

34. G.J. Kleywegt, J.Y. Zou, C. Divne, I. Sinning, J. Ståhlberg, T.T. Teeri, G. Davies, and T.A. Jones, to be published.

35. F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer, M.D. Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi, and M. Tasumi, J. Mol. Biol. 112, 535 (1977).

36. E.J. Dodson, G.J. Kleywegt, and K.S. Wilson, Acta Cryst. D, in the press.

37. K. Valegård, J.B. Murray, P.G. Stockley, N.J. Stonehouse, and L. Liljas, Nature 371, 623 (1994).

38. H. Hoier, M. Schlömann, A. Hammer, J.P. Glusker, H.L. Carrell, A. Goldman, J.J. Stezowski, and U. Heinemann, Acta Cryst. D50, 75 (1994).

39. W.C. Hamilton, Acta Cryst. 18, 502 (1965).

40. A.T. Brünger and L.M. Rice, this volume.

41. A.T. Brünger, J. Kuriyan, and M. Karplus, Science 235, 458 (1987).

42. A.T. Brünger and A. Krukowski, Acta Cryst. A46, 585 (1990).

43. A.T. Brünger, Annu. Rev. Phys. Chem. 42, 197 (1991).

44. J. Uppenberg, N. Öhrner, M. Norin, K. Hult, G.J. Kleywegt, S. Patkar, V. Waagen, T. Anthonsen, and T.A. Jones, Biochemistry, in the press.

45. J. Uppenberg, M. Trier Hansen, S. Patkar, and T.A. Jones, Structure 2, 293 (1994).

46. L.M. Rice and A.T. Brünger, Proteins Struct. Funct. Genet. 19, 277 (1994).

47. P. Gros, M. Fujinaga, B.W. Dijkstra, K.H. Kalk, and W.G.J. Hol, Acta Cryst. B45, 488 (1989).

48. R.A. Engh and R. Huber, Acta Cryst. A47, 392 (1991).

49. F.H. Allen, O. Kennard, and R. Taylor, Acc. Chem. Res. 16, 146 (1983).

50. J.P. Priestle, Structure 2, 911 (1994).

51. V.S. Lamzin, Z. Dauter, and K.S. Wilson, J. Appl. Cryst. 28, 338 (1995).

52. I. Sinning, G.J. Kleywegt, S.W. Cowan, P. Reinemer, H.W. Dirr, R. Huber, G.L. Gilliland, R.N. Armstrong, X. Ji, P.G. Board, B. Olin, B. Mannervik, and T.A. Jones, J. Mol. Biol. 232, 192 (1993).

53. G. Sheldrick and T. Schneider, this volume.

54. J.S. Jiang and A.T. Brünger, J. Mol. Biol. 243, 100 (1994).

55. D. Tronrud, this volume.

56. J.B. Clarage and G.N. Phillips, Acta Cryst. D50, 24 (1994).

57. F.T. Burling and A.T. Brünger, Israel J. Chem. 34, 165 (1994).

58. K.S. Wilson, in "From First Map to Final Model" (S. Bailey, R. Hubbard, and D.A. Waller, Eds.), p. 141, SERC Daresbury Laboratory, Daresbury, UK, 1994.

59. J.Y. Zou and S.L. Mowbray, Acta Cryst. D50, 237 (1994).

60. R.A. Laskowski, M.W. MacArthur, D.S. Moss, and J.M. Thornton, J. Appl. Cryst. 26, 283 (1993).

61. R.A. Laskowski, M.W. MacArthur, and J.M. Thornton, in "From First Map to Final Model" (S. Bailey, R. Hubbard, and D.A. Waller, Eds.), p. 149, SERC Daresbury Laboratory, Daresbury, UK, 1994.

62. A.P. Korn and D.R. Rose, Prot. Engin. 7, 961 (1994).

63. C. Ramakrishnan and G.N. Ramachandran, Biophys. J. 5, 909 (1965).

64. G.J. Kleywegt and T.A. Jones, Acta Cryst. D, in the press.

65. O. Herzberg and J. Moult, Proteins Struct. Funct. Genet. 11, 223 (1991).

66. A. Hodel, S.H. Kim, and A.T. Brünger, Acta Cryst. A48, 851 (1992).

67. T.A. Jones, in "Molecular Replacement" (E.J. Dodson, S. Glover, and W. Wolf, Eds.), p. 91, SERC Daresbury Laboratory, Daresbury, UK, 1992.

68. G.J. Kleywegt and T.A. Jones, in "From First Map to Final Model" (S. Bailey, R. Hubbard, and D.A. Waller, Eds.), p. 59, SERC Daresbury Laboratory, Daresbury, UK, 1994.

69. M. Harel, G.J. Kleywegt, R. Ravelli, I. Silman, and J. Sussman, to be published.

70. J. Hermans and J.E. McQueen, Acta Cryst. A30, 730 (1974).


TABLE I. Tests of various NCS models at low resolution. [a]


Run   Force constant   Temp. (K)   Final R   Final Rfree   RMSD (Å) [b]   RMSB (Å) [c]   RMSA (degrees) [d]

1     "infinity"       3,000       0.243     0.267         0.0            0.010          1.099
2     300              3,000       0.251     0.270         0.015          0.004          0.680
3     200              3,000       0.250     0.269         0.020          0.004          0.675
4     100              3,000       0.248     0.269         0.035          0.005          0.681
5     75               2,000       0.247     0.272         0.045          0.004          0.673
6     50               2,000       0.245     0.272         0.060          0.004          0.675
7     25               2,000       0.241     0.271         0.095          0.005          0.700
8     20               2,000       0.239     0.274         0.11           0.004          0.685
9     15               2,000       0.237     0.271         0.14           0.004          0.707
10    10               1,500       0.235     0.273         0.17           0.004          0.710
11    5                2,000       0.231     0.274         0.24           0.004          0.717
12    2                2,000       0.227     0.276         0.40           0.004          0.718
13    "0"              3,000       0.225     0.285         1.13           0.005          0.762

[a] Results of using NCS constraints, restraints or no restraints with slow-cool SA protocols (temperature steps of -50 K; followed by energy minimisation) at 2.5 Å resolution. An intermediate model of A2U [e], after energy minimisation, was used as input (initial R 0.248, Rfree 0.269). Some of the calculations crashed when started at 3,000 K and were therefore run from lower initial temperatures. The run with an NCS force constant "infinity" used strict NCS and yielded the best model (for our data, at 2.5 Å resolution); the one with a constant of "zero" used no restraints at all and yielded the poorest model.

[b] Average RMS distance between main-chain atoms in molecule A and each of the other three molecules.

[c] RMS deviations from ideality of the bond lengths. [f]

[d] RMS deviations from ideality of the bond angles. [f]

[e] G.J. Kleywegt, J. Björklund, J. Uppenberg, D. Ogg, L.D. Lehman-McKeeman, J.D. Oliver, and T.A. Jones, to be published.

[f] R.A. Engh and R. Huber, Acta Cryst. A47, 392 (1991).


TABLE II. Results of using a "traditional" refinement protocol with low-resolution data. [a]


                                    Conservative model    Over-fitted model

NCS                                 strict 2-fold         not used
Temperature-factor model            grouped               individual
Scale for X-ray term                0.5                   1
Resolution range (Å)                8.0 - 2.9             6.0 - 2.9
R / Rfree                           0.251 / 0.320         0.169 / 0.323

Number of non-H atoms               1123                  2246
Reflections with F > 2 sigma(F)     6743                  6316
Refined parameters                  3657                  8984
Data-to-parameter ratio             1.8                   0.7

Average B, protein (Å2)             49.4                  43.7
RMS Delta-B, bonded atoms (Å2)      n/a                   6.9

RMSD NCS-related CA atoms (Å)       0.0                   0.50
RMSD NCS-related all atoms (Å)      0.0                   0.99
RMS Delta-B NCS CA atoms (Å2)       0.0                   10.5
RMS Delta-B all NCS atoms (Å2)      0.0                   12.0

RMS dev. bonds (Å)                  0.009                 0.010
RMS dev. angles (degrees)           1.56                  1.63
RMS dev. dihedrals (degrees)        26.9                  27.0
RMS dev. impropers (degrees)        1.25                  1.41

Bad contacts                        0                     3
Pep-flip outliers                   3*2                   7
RSC outliers                        10*2                  32
Ramachandran outliers               1*2                   2
% Ramachandran most favoured        82                    78
Overall G-factor [b]                +0.13                 +0.06

[a] The structure of holo CRABP type I [c] was refined at 2.9 Å resolution with strict two-fold NCS and grouped temperature factors. The final model was then subjected to two 4,000 K SA calculations without any NCS constraints or restraints and with full weight for the crystallographic pseudo-energy term, and individual isotropic temperature factors were refined. The resulting structure is clearly inferior to the more conservative model.

[b] R.A. Laskowski, M.W. MacArthur, and J.M. Thornton, in "From First Map to Final Model" (S. Bailey, R. Hubbard, and D.A. Waller, Eds.), p. 149, SERC Daresbury Laboratory, Daresbury, UK, 1994.

[c] G.J. Kleywegt, T. Bergfors, H. Senn, P. Le Motte, B. Gsell, K. Shudo, and T.A. Jones, Structure 2, 1241 (1994).


TABLE III. Effect of NCS on Rfree for incorrect models. [a]


Test set        Constrained NCS    Restrained NCS    Unrestrained NCS

Random          0.365 / 0.465      0.347 / 0.484     0.268 / 0.522
Thin shells     0.360 / 0.477      0.348 / 0.485     0.266 / 0.552
Thick shells    0.361 / 0.470      0.351 / 0.531     0.267 / 0.531

[a] Results of experiments with an intentionally backward-traced model of A2U [b] to assess the effect of relationships between reflections in the case of NCS on the value of the free R-factor for grossly incorrect models. The structure was subjected to SA refinement using data with F>2sigma(F) between 6.0 and 3.0 Å. Three different methods of selecting 10 % test reflections were tried: random, in (15) thin resolution shells and in (5) thick shells. For each set of reflections, three different NCS models were tested: constrained (data-to-parameter ratio ~2.5), restrained and unrestrained (data-to-parameter ratio ~0.6). In all cases, "R-factor reducing tricks" were used (limited data, full weight of the crystallographic term and isotropic temperature-factor refinement). In all calculations, the initial R-factor was ~0.54 and the initial free R-factor ~0.55. The table shows the values of the conventional and free R-factor, respectively, for each refinement.

It is clear that no amount of over-fitting brings the free R-factor down to a "respectable" level. Also, with careful refinement (constrained or restrained NCS), not even the conventional R-factor can be "fooled". If the NCS is unrestrained, on the other hand, the conventional R-factor approaches the realm of respectability. Finally, the use of thin resolution shells of test reflections does appear to "uncouple" the conventional and free R-factor somewhat. Thick resolution shells should probably not be used, since they introduce sizeable resolution ranges which are systematically missing from the work set.

[b] G.J. Kleywegt, J. Björklund, J. Uppenberg, D. Ogg, L.D. Lehman-McKeeman, J.D. Oliver, and T.A. Jones, to be published.
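The test-set selection protocols compared above are easy to sketch. Below, the thin shells are taken as equal-width slices in 1/d, which is an assumption of the example rather than a description of the protocol actually used; the function and parameter names are likewise illustrative:

    import random

    def random_test_set(hkl, fraction=0.10, seed=0):
        """Select ~`fraction` of the reflections at random."""
        rng = random.Random(seed)
        return [h for h in hkl if rng.random() < fraction]

    def thin_shell_test_set(hkl, inv_d, n_shells=15, fraction=0.10):
        """Select n_shells thin slices of 1/d which together hold ~`fraction`
        of the data: cut the 1/d range into n_shells/fraction slices and
        keep one slice in every 1/fraction."""
        n_slices = int(round(n_shells / fraction))   # e.g. 150 slices
        step = int(round(1.0 / fraction))            # keep every 10th slice
        lo, hi = min(inv_d), max(inv_d)
        width = (hi - lo) / n_slices or 1.0
        test = []
        for h, s in zip(hkl, inv_d):
            slice_no = min(int((s - lo) / width), n_slices - 1)
            if slice_no % step == step // 2:
                test.append(h)
        return test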

