
X-ray Crystallography: At the Interface of Physics, Chemistry and Biology

Proteins are essential to cellular processes, from the replication of DNA to the breaking down of nutrients for the production of energy. However, the first protein structure was only solved by X-ray crystallography centuries after the first cell was observed under the microscope.

X-ray crystallography is a method that uses X-ray diffraction patterns from protein crystals to model protein structures in order to understand their function.

The growing interest in X-rays among physicists at the turn of the 20th century gave rise to this novel structure determination technique. X-ray crystallography, in turn, enabled the development of the new field of structural biology and, consequently, a rapid increase in the number of solved protein structures.

An introduction to structural biology

Structural biology, as the name suggests, is the study of protein structures to understand their biological function and how structural changes might affect their functions.

Biochemists have always been intrigued by how living organisms function at a molecular level, but protein structural information can also prove extremely insightful for protein engineering and structure-based drug design. One such example is the antiviral Tamiflu, whose design was based on the structure of the influenza virus neuraminidase protein.

Proteins are primarily made up of the 20 natural amino acids, which are connected by peptide bonds to form a polypeptide chain. Interactions between different amino acid functional groups (side chains) of the polypeptide chain allow the protein to fold into a defined three-dimensional shape, known as the tertiary structure of the protein. Some intrinsically disordered proteins (IDPs), however, have no fixed tertiary structure, making them more difficult to study using X-ray crystallography.

Different regions of the protein, known as domains, carry out different functions. Some bind to other proteins or nucleic acids, some catalyse biochemical reactions and others regulate the rate of these reactions. These different domains contribute to a protein’s overall function. Structural biology can elucidate key interactions between protein domains, shedding further light upon their biological roles.

A variety of experimental techniques are employed to understand the structure of proteins at the atomic level. The first and most widely used structure determination method is X-ray crystallography. Other major techniques include nuclear magnetic resonance (NMR) spectroscopy and, more recently, cryo-electron microscopy (cryo-EM). As structural studies become more challenging, different techniques are often used in combination, taking advantage of each technique's individual strengths to understand various aspects of protein structures.

The principles of X-ray crystallography

William and Lawrence Bragg, a father and son duo, first used X-rays to solve the structure of naturally occurring zinc sulphide crystals in 1912. Since then, a large number of increasingly complex structures have been solved by X-ray crystallography. Initially, the technique was used to study the structures of organic compounds such as penicillin and vitamin B12. As elucidating protein structures became a major research focus, X-ray crystallography was applied to proteins as well. These ranged from the “simpler” globular protein myoglobin, the first protein structure to be solved, to large complexes such as the ribosome.

The following section aims to highlight the key components of X-ray crystallography, which make studying the atomic detail of a wide range of molecules possible.

Figure 1: Timeline of structures solved by X-ray crystallography, with the structure of DNA as a reference point. As the technique develops, increasingly complex structures consisting of greater numbers of atoms can be studied. Some of the original models can be found at the Science Museum. Figure adapted from Figure 1 of Shi, 2014.


Diffraction occurs when a wave hits an obstacle and is subsequently scattered in different directions. In the case of X-ray crystallography, X-ray beams (a type of electromagnetic radiation) are diffracted by the electrons in a protein’s constituent atoms.

A phenomenon that accompanies diffraction is interference, the interaction between scattered waves. When two identical waves meet peak to peak, the amplitude of the resultant wave doubles (constructive interference); when peaks meet troughs, the waves cancel each other out (destructive interference). Interference of scattered X-rays can be recorded as a diffraction pattern, consisting of spots of various intensities.

Figure 2: Diffraction and interference of X-rays. (a) Schematic of X-rays scattered by adjacent planes in a protein crystal lattice. Bragg’s Law (2d sin θ = nλ) describes the interference of the scattered beams (dotted) in terms of the distance between adjacent planes d, the angle of diffraction θ, the beams’ wavelength λ and a whole number n. (b) Constructive interference occurs when Bragg’s Law is satisfied, i.e. the peaks of the waves are aligned (in phase). (c) Destructive interference occurs when the peak of one wave aligns with the trough of the other wave (out of phase). Figure adapted from Prof. S. Curry’s lecture at the Royal Institution, 2013.
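Bragg’s Law can be tried out numerically. The short Python sketch below is only an illustration: the function name and the example values (planes 2.0 Å apart, X-rays of wavelength 1.0 Å) are assumptions chosen for convenience, not values from a real experiment.

```python
import math

def bragg_angle_deg(d_spacing, wavelength, order=1):
    """Return the angle theta (in degrees) that satisfies 2 d sin(theta) = n lambda.

    d_spacing and wavelength must share the same unit (for example angstroms).
    Returns None when n * lambda > 2 d, because no angle can then satisfy the law.
    """
    sin_theta = order * wavelength / (2.0 * d_spacing)
    if sin_theta > 1.0:
        return None
    return math.degrees(math.asin(sin_theta))

# Assumed example values: planes 2.0 angstroms apart, 1.0 angstrom X-rays.
theta = bragg_angle_deg(d_spacing=2.0, wavelength=1.0)
print(f"theta = {theta:.2f} degrees")  # theta = 14.48 degrees
```

Note that the function returns None when the planes are closer together than half the wavelength, since no diffraction angle can then satisfy the law.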

Unlike photographs, which are 2D projections of 3D objects, diffraction patterns do not visually resemble the structure being studied. Thus, crystallographers had to use mathematical models to piece together the positions of individual atoms in a protein. Thankfully, advances in technology have led to the development of sophisticated computer programs that now make these tasks easier!

Why X-rays?

X-rays are no stranger to medical imaging due to their high penetrating power; this advantage makes them suitable for studying crystals.

Whilst visible light reflects off the crystal’s surface, X-rays are able to penetrate multiple layers of the crystal lattice without a significant reduction in their intensity. More importantly, X-rays have much shorter wavelengths, in the order of 0.1 nm (or 1 Å), which is similar to the size of atoms and the length of covalent bonds. This allows X-rays to resolve the structures of chemical and biological molecules at an atomic level, which are otherwise invisible under the light microscope.

Figure 3: The order of magnitude of proteins relative to parts of the electromagnetic spectrum. Light can only resolve objects larger than half its wavelength; hence, whilst protein crystals can be seen under a light microscope, X-rays are required to solve protein structures.
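The half-wavelength rule in the caption above can be turned into a quick comparison. This is a rough sketch rather than a rigorous optical calculation: the function name is made up for illustration, 400 nm is taken as a representative wavelength for violet light, and 0.154 nm is used as a typical carbon-carbon single bond length.

```python
def smallest_resolvable_nm(wavelength_nm):
    """Half-wavelength rule of thumb: the smallest detail a wave can resolve."""
    return wavelength_nm / 2.0

CC_BOND_NM = 0.154  # typical carbon-carbon single bond length, in nanometres

# Assumed representative wavelengths, in nanometres.
for name, wavelength_nm in [("violet light", 400.0), ("X-rays", 0.1)]:
    limit = smallest_resolvable_nm(wavelength_nm)
    can_resolve = limit <= CC_BOND_NM
    print(f"{name}: limit {limit} nm, resolves a C-C bond: {can_resolve}")
```

With these numbers, visible light falls short of bond-length detail by about three orders of magnitude, whereas X-rays are comfortably within range.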

Why crystals?

The ordered repetition of zinc and sulphur atoms throughout the crystal lattice is what allowed the Braggs to determine the first crystal structures. The packing of protein molecules in a crystal is very similar to the arrangement of atoms in a salt crystal. However, unlike single atoms, proteins are asymmetrical. Thus, several proteins are often packed into a unit cell, which is repeated throughout the crystal lattice.

Proteins do not naturally exist as crystals and it is difficult to grow crystals from molecules as complex as proteins, making protein crystallisation the bottleneck of X-ray crystallography. The zinc sulphide crystals Lawrence Bragg used in the first X-ray crystallography experiments consist of a repeating unit of only 2 atoms, whilst myoglobin, the first protein structure solved, consists of 1260 atoms!

Single proteins diffract very few X-rays, and their intensity is barely detectable. Another complication in X-ray crystallography is that proteins, like most biological materials, are prone to radiation damage. Individual proteins may denature under X-ray exposure, thereby compromising the resolution of data collected. Crystals, on the other hand, consist of trillions of identical proteins packed in a repeating lattice. This both amplifies the signal and ensures that there is sufficient sample in its native structure.

How X-ray crystallography works

Sample preparation

As with any biochemical experiment, sample preparation is crucial for downstream analysis. To obtain a high-resolution structure of the protein of interest, the sample needs to be prepared at high purity and high concentration.

In the early X-ray crystallography studies, proteins were often purified from natural sources. In 1958, Nobel laureate John Kendrew solved the first protein structure, that of myoglobin, using X-ray crystallography. Myoglobin, a distant relative of haemoglobin, is an oxygen storage molecule in muscles. Kendrew purified his sample of myoglobin from sperm whale tissue, since this deep-diving mammal has a particularly high abundance of myoglobin in its muscles.

Currently, most proteins studied by X-ray crystallography are produced by recombinant DNA methods, whereby an expression host is transformed with a plasmid carrying the gene coding for the protein. The protein is then synthesised by the host’s transcription-translation system, and harvested when the desired expression level is reached. The most commonly used expression host is E. coli, a bacterium. Sometimes, however, scientists use eukaryotic expression systems such as yeast or insect cells to produce more complex human proteins. Eukaryotic cells contain additional chaperone proteins that assist the folding of proteins; these chaperones are often not present in bacteria. Thus, eukaryotic expression systems allow human proteins to be produced in their near-native structure.

Growing crystals

Similar to salt crystals, protein crystals are grown from a supersaturated solution. However, unlike ionic salts, proteins are much larger and more irregular in shape, making it more difficult for ordered crystals to form. As a result, protein crystals are grown under carefully controlled environments. In spite of this, the protein crystals formed are still quite small in size and have to be handled under a microscope.

Crystals are commonly grown through a process called vapour diffusion. This process is carried out in a vapour diffusion chamber, where the solvent is slowly removed to allow proteins to come out of solution as regular crystals. The chamber consists of a small well for the protein solution, and a reservoir approximately 100 times the volume of the well. A precipitant solution is added to both the well and the reservoir such that the precipitant concentration is lower in the well than in the reservoir. Once the chamber is sealed, water vapour diffuses from the lower precipitant concentration in the well to the higher precipitant concentration in the reservoir. As the amount of water in the well decreases, protein crystals begin to form.

Figure 4: Side view of a vapour diffusion chamber. The protein solution is mixed with a precipitant solution in the well. The difference in precipitant concentration between the well and the reservoir allows water vapour to diffuse from the lower precipitant concentration (well) to the higher (reservoir) until the system reaches equilibrium. The gradual removal of water from the well enables proteins to crystallise. Figure adapted from Fig. 27.1, High Throughput pH Optimisation of Protein Crystallization, 2008.
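The effect of vapour diffusion on the protein solution can be estimated with a little arithmetic. The sketch below is a simplified model and not a laboratory protocol. It assumes that only water leaves the well, that the reservoir is so large its precipitant concentration stays constant, and that equilibrium is reached when the two precipitant concentrations match. The function name and example numbers are hypothetical.

```python
def equilibrate_well(volume_ul, well_precipitant, reservoir_precipitant, protein_mg_ml):
    """Estimate the well's final volume and protein concentration at equilibrium.

    Simplifying assumptions: only water leaves the well, the reservoir
    concentration does not change, and no protein has crystallised yet.
    Both precipitant concentrations must be in the same unit.
    """
    if not 0 < well_precipitant <= reservoir_precipitant:
        raise ValueError("well precipitant must be positive and no higher than the reservoir's")
    shrink = well_precipitant / reservoir_precipitant  # fraction of the volume that remains
    return volume_ul * shrink, protein_mg_ml / shrink

# Hypothetical set-up: a 2 microlitre drop at half the reservoir's precipitant concentration.
final_volume, final_protein = equilibrate_well(2.0, 0.5, 1.0, 10.0)
print(final_volume, final_protein)  # 1.0 20.0
```

In this example the drop loses half its water, so the protein concentration doubles, slowly pushing the solution towards the supersaturation needed for crystals to form.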

Regular-shaped crystals, where the unit cells are aligned in the same orientation, improve the resolution of the data collected. Hence, a large number of different conditions are tested to improve the quality of the protein crystals. This includes varying the temperature, the precipitant used, the precipitant concentration and the protein concentration. Modern labs are equipped with automated systems that can test up to 96 crystallisation conditions on a single plate!
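A 96-condition screen is simply every combination of a few variables laid out on a plate. The Python sketch below shows one way such a grid could be generated. The precipitants, the concentration levels (given as fractions of an assumed stock solution) and the protein concentrations are illustrative choices, not a recommended screen.

```python
from itertools import product
import string

# Illustrative choices only: 4 precipitants x 6 levels x 4 protein concentrations = 96.
precipitants = ["PEG 3350", "PEG 8000", "ammonium sulphate", "sodium chloride"]
precipitant_levels = [0.2, 0.3, 0.4, 0.5, 0.6, 0.7]  # fraction of an assumed stock solution
protein_mg_ml = [5, 10, 15, 20]

# Well labels of a standard 96-well plate: rows A to H, columns 1 to 12.
wells = [f"{row}{col}" for row in string.ascii_uppercase[:8] for col in range(1, 13)]

# Pair each well with one combination of conditions.
conditions = list(product(precipitants, precipitant_levels, protein_mg_ml))
plate = dict(zip(wells, conditions))

print(len(plate))   # 96
print(plate["A1"])  # ('PEG 3350', 0.2, 5)
```

Temperature is left out of the grid because a single plate is normally kept at one temperature; it would be varied by setting up duplicate plates.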

X-ray diffraction: data collection

Once the protein crystals have formed, they are placed in a strong X-ray beamline and the scattered X-rays are recorded on an electronic detector. The crystal is cooled with liquid nitrogen throughout its exposure to the X-rays to limit radiation damage and preserve the protein's native structure. Furthermore, the crystal is rotated through small angular increments to collect diffraction patterns at different angles. Collecting diffraction data at multiple angles improves the accuracy and resolution of a protein’s calculated structure.

Figure 5: Schematic of an X-ray diffraction apparatus. The incident beam (bold) is scattered by the crystal. These scattered beams (dotted) interact and form a diffraction pattern unique to the protein.
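The rotation scheme used during data collection can be sketched as a list of angular slices, one per recorded image. The total range of 180 degrees and the increment of 0.5 degrees below are assumed values for illustration; real experiments choose them to suit the crystal.

```python
def rotation_frames(total_range_deg, increment_deg):
    """Split a rotation range into (start, end) angle pairs, one per diffraction image."""
    count = round(total_range_deg / increment_deg)
    return [(i * increment_deg, (i + 1) * increment_deg) for i in range(count)]

# Assumed example: rotate the crystal through 180 degrees in 0.5 degree steps.
frames = rotation_frames(180.0, 0.5)
print(len(frames))  # 360
print(frames[0])    # (0.0, 0.5)
print(frames[-1])   # (179.5, 180.0)
```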

Detailed protein structures would not be possible without advances in X-ray diffraction technologies. In the early days of X-ray crystallography, X-rays were generated by Crookes tubes. These tubes, however, greatly limited the intensity and wavelength of the X-rays generated. Crookes tubes have since been replaced by synchrotrons, large particle accelerators that accelerate electrons in a circular path to close to the speed of light. Magnetic fields in the path of these electrons change their direction of travel. This releases energy in the form of high-intensity X-rays that travel at a tangent to the path of the electron beam. By fine-tuning the magnetic field strength, crystallographers can also vary the wavelength of the X-rays to best suit their experiments.

Model building

The Braggs used mathematical analysis to correlate spots on the diffraction patterns to positions of atoms in salt crystals. Individual proteins consist of thousands of atoms; carrying out these calculations manually would be extremely tedious. Fortunately, this process is now carried out by specially developed software programs.

Electron density maps are constructed based on the way electrons surrounding atoms diffract X-rays. These maps can then be used to trace and construct the protein backbone and amino acid residues whilst abiding by amino acid structural conformation and stereochemistry rules.
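The link between diffraction data and an electron density map can be illustrated with a one-dimensional toy crystal. The sketch below is a deliberate simplification and not how crystallographic software works in practice. It assumes two point-like atoms at made-up positions in a unit cell of length 1, and it assumes that both the amplitude and the phase of every diffracted wave are known. It computes the waves scattered by the atoms (the structure factors), then adds those waves back together to recover a density map, up to a constant scale factor, whose two highest peaks sit at the atom positions.

```python
import cmath
import math

def structure_factors(atoms, h_max):
    """Sum the waves scattered by each atom for every diffraction order h.

    atoms is a list of (fractional position, scattering strength) pairs.
    """
    return {h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
            for h in range(-h_max, h_max + 1)}

def density(factors, x):
    """Fourier synthesis: add the diffracted waves back together at position x."""
    return sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items()).real

# Hypothetical 1D unit cell holding a lighter and a heavier atom.
atoms = [(0.20, 6.0), (0.65, 8.0)]
factors = structure_factors(atoms, h_max=20)

# Sample the density on a grid across the unit cell.
grid = [i / 200 for i in range(200)]
rho = [density(factors, x) for x in grid]

# Keep the two highest local maxima of the map (the cell repeats, so index -1 wraps around).
n = len(rho)
local_maxima = [i for i in range(n) if rho[i] > rho[i - 1] and rho[i] > rho[(i + 1) % n]]
top_two = sorted(sorted(local_maxima, key=lambda i: rho[i], reverse=True)[:2])
recovered = [grid[i] for i in top_two]
print(recovered)  # [0.2, 0.65]
```

The heavier atom gives the taller peak, which mirrors how atoms with more electrons show up more strongly in a real electron density map.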

Model building is not done in a single step, but rather through iterative rounds of adjustments, called refinements. Each round gradually improves the accuracy of the model. The more accurate the model, the more information it reveals about the protein structure.
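One common way to judge whether a round of refinement has improved a model is the R-factor, which compares the diffraction amplitudes measured in the experiment with those calculated from the model; lower values mean better agreement. The sketch below applies it to a hypothetical one-dimensional crystal with two atoms. The positions, scattering strengths and trial models are invented for illustration, and real refinement programs adjust many more parameters and use further quality measures.

```python
import cmath
import math

def amplitudes(atoms, h_max=10):
    """Diffraction amplitudes |F(h)| for a 1D toy crystal, for orders 1 to h_max.

    atoms is a list of (fractional position, scattering strength) pairs.
    """
    return [abs(sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms))
            for h in range(1, h_max + 1)]

def r_factor(observed, calculated):
    """R = sum(| |F_obs| - |F_calc| |) / sum(|F_obs|); zero means perfect agreement."""
    return sum(abs(o - c) for o, c in zip(observed, calculated)) / sum(observed)

# Hypothetical "true" structure, standing in for the measured data.
true_atoms = [(0.20, 6.0), (0.65, 8.0)]
observed = amplitudes(true_atoms)

# Trial models in which the second atom is moved step by step towards its true position.
for x2 in (0.60, 0.63, 0.65):
    model = [(0.20, 6.0), (x2, 8.0)]
    print(x2, round(r_factor(observed, amplitudes(model)), 3))
```

The R-factor falls as the second atom approaches its true position and reaches zero when the model matches the structure that produced the data.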

The Protein Data Bank

Once a protein structure has been solved, it is deposited in the Protein Data Bank (PDB), a worldwide database for the three-dimensional structures of proteins, nucleic acids and other large macromolecules. The PDB allows structural biologists, and anyone else interested in structural biology, to access any known protein structure.

Using existing PDB entries, structural biologists can draw comparisons between similar proteins. This information could provide useful insights for model building and the refinement of structures that have similar functions or shared domains. Alternatively, structural homologues could help elucidate the function of novel proteins. Scientists could also use these structural homologues to trace the molecular evolution of structurally related proteins, as seen in the case of myoglobin and haemoglobin structures.

To date, X-ray crystallography, the technique that pioneered structural biology, is still the most commonly used method. It accounts for the largest number of new PDB entries.

Figure 6: PDB entries by year and structural determination techniques. X-ray crystallography is by far the most commonly used technique in structural biology. Cryo-EM, a more recent technique, is becoming more widely used in the field as its resolution continues to improve. [Data from PDB Statistics; figure made using RStudio and ggplot2 package].

A comparison between X-ray crystallography and other structural determination methods

X-ray crystallography is highly versatile in terms of the size of the molecules studied: it can be used to study anything from small organic compounds to large protein-nucleic acid complexes like the ribosome. X-ray crystallography has long been the only method that allows structures to be studied in atomic detail. However, with the rapid development of new technology, atomic resolution can now also be achieved by cryo-EM.

There are some limitations to X-ray crystallography. In order for crystals to form, protein molecules need to line up in the same conformation and orientation. This is challenging for proteins with large flexible domains. To work around this, mutations are introduced or chemical inhibitors are added to stabilise the protein. Sometimes, however, the dynamic nature of the protein needs to be preserved to fully understand its function, especially in the case of intrinsically disordered proteins. In such cases, NMR is often used to probe the conformational changes of the protein in solution.

From X-ray crystallography to solving real-world problems

Developed initially in a physics experiment, X-ray crystallography has changed the way biological molecules are studied. With the constant improvement in crystallisation methods and model building algorithms, X-ray crystallography is able to elucidate protein structures at increasingly high resolution, thus providing deeper insight into their function.

The increasing number of solved protein structures has also opened up exciting new fields. With the aid of detailed structural information, drugs with higher potency and fewer side effects can be designed for their target protein. By assessing known structures, scientists are also beginning to predict protein structures from their amino acid sequences in silico, potentially allowing more variants of disease-associated proteins to be studied.

Additional resources

The following videos by the Royal Institution explain how modern X-ray crystallography experiments are carried out:

This Royal Institution lecture gives a more in-depth explanation of the physical principles of X-ray diffraction:

Author: Amy Cheng, BSc Biochemistry
