What Can We Do about Macromolecular Structures with Lousy Statistics?
Host
Biology
Description
The Protein Data Bank (PDB) has served as the repository for structures of macromolecules since the 1970s. It contains the coordinates of over 100,000 structures of proteins, DNA and RNA molecules, viruses, and supramolecular complexes. Over 90% of these structures were determined by single-crystal X-ray diffraction. Our ability to derive useful functional information from any of these structures depends largely on its quality. This quality can be characterized by the resolution value, i.e. the finest spatial detail that can be resolved within the structure, and by the agreement between the model structure and the data from which it was derived. That agreement is typically characterized by a quality index known as the Rfree value. Unsurprisingly, there is a strong correlation between resolution and Rfree across the PDB as a whole; but there are outliers, viz. structures at high resolution that have been deposited with high (poor) Rfree values. I have been examining a number of these outlier structures and will report on the following phenomena:
1) My success or failure in improving these structures' statistics;
2) The consequences of those improvements in the interpretability of the data;
3) Some generalizations about which structures have inexplicably high Rfree values in spite of being determined at high resolution.
I will discuss the incorporation of this analysis into a wider project to archive and analyze raw data from diffraction experiments.
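For readers unfamiliar with the Rfree value mentioned above: it is the conventional crystallographic R factor, sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|), evaluated only over a small "free" set of reflections that was held out of refinement. The following is a minimal Python sketch of that calculation, not the speaker's own procedure. The function names, the amplitudes, and the free-set flags are all hypothetical and chosen for illustration, and the sketch assumes the calculated amplitudes have already been scaled to the observed ones, a step that real refinement programs perform themselves.

```python
def r_factor(f_obs, f_calc):
    """Conventional R factor: sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|).

    Assumes f_calc is already on the same scale as f_obs.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    denominator = sum(abs(fo) for fo in f_obs)
    if denominator == 0:
        raise ValueError("sum of observed amplitudes is zero")
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / denominator


def r_work_and_r_free(f_obs, f_calc, free_flags):
    """Split reflections by free-set flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    free = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in free], [fc for _, fc in free])
    return r_work, r_free


# Hypothetical structure-factor amplitudes, for illustration only.
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0]
f_calc = [90.0, 85.0, 66.0, 38.0, 26.0]
free_flags = [False, False, False, True, True]  # True marks the held-out free set

r_work, r_free = r_work_and_r_free(f_obs, f_calc, free_flags)
print(f"R_work = {r_work:.4f}, R_free = {r_free:.4f}")
```

Because the free reflections are never used to fit the model, Rfree is less susceptible to overfitting than the R factor computed on the working set, which is why it serves as the headline agreement statistic.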