
SARS-CoV-2: The Virus Behind the COVID-19 Pandemic [Part One]

Editor: This article is the first of a two-part series. Check out Part Two here.


In December 2019, a new viral pathogen emerged in humans. Within months, it had caused a global crisis with huge and widespread health, social and economic consequences.


COVID-19 (Coronavirus Disease 2019) is caused by a new coronavirus, initially identified as 2019-nCoV (2019 novel coronavirus) and later renamed SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) by the ICTV (International Committee on Taxonomy of Viruses). Since then, there have been over 37.3 million reported cases and over one million deaths.

The pandemic has seen countries across the globe impose strict lockdowns and quarantines in an attempt to contain the virus. At the same time, the scientific community has been learning as much as it can about the nature of this virus and its effect on humans, as well as how it can be treated and even vaccinated against.

SARS-CoV-2 is not the first coronavirus to have emerged and caused disease in humans. In Part One and Part Two of this article series, we will discuss the phylogenetic relationship of these coronaviruses and how they came to infect humans. We will also cover the basics of what is known about the new virus’s structure and genome, and finally the nature of the disease it causes and the therapeutic and preventative measures being explored.


Classification of SARS-CoV-2

Like living organisms, viruses are classified into categories, known as taxa (plural of taxon). Viral taxonomy is essential for a number of reasons. One of these is that it allows us to study the evolutionary history of a virus. For example, if a virus jumped from an animal to humans, knowing the taxonomy of the virus is useful in understanding the details of the transmission event. Viral taxonomy also has clinical significance, as it helps us understand how the virus causes disease and how the virus and host interact. It is also useful when determining which treatments could potentially be used in infected patients.

Coronaviruses were given their name because, when observed under a microscope, the virions appear to have a ‘corona’ (i.e. crown-like structure) or halo surrounding them.

All coronaviruses are members of the family Coronaviridae, which belongs to the order Nidovirales. The family is divided into two subfamilies. SARS-CoV-2 belongs to the subfamily Coronavirinae, which contains four genera. Alphacoronaviruses and betacoronaviruses infect only mammals, whilst deltacoronaviruses and gammacoronaviruses infect birds and sometimes mammals. The classification of coronaviruses is illustrated in Figure 1.

Figure 1: The classification of coronaviruses. The highly pathogenic human coronaviruses discussed in the article are shown in green. The category ‘individuum’ refers to different individuals of the same species, similar to how Rosalind Franklin and Charles Darwin are individuals of the species Homo sapiens. The figure includes several notable isolates. Civet SARS-CoVSZ3/2003 is a 2003 coronavirus isolate from a palm civet, the SARS-CoV intermediate host, whose sequence was compared to human isolate sequences. SARS-CoV-2 Wuhan-Hu-1 is an isolate taken from one of the first COVID-19 patients in Wuhan on 26th December 2019; it was the first SARS-CoV-2 isolate to be sequenced. SARSr-CoV RaTG13 is a 2013 isolate from a bat and is the closest known relative of SARS-CoV-2. Finally, SARS-CoVGZ-02 is a human isolate from the 2003 SARS outbreak. This figure was adapted from Rehman et al. (2020)’s Evolutionary Trajectory for the Emergence of Novel Coronavirus SARS-CoV-2 and CSG of the ICTV (2020)’s The species Severe acute respiratory syndrome-related coronavirus: classifying 2019-nCoV and naming it SARS-CoV-2.

Members of the Coronaviridae family have a positive sense, single-stranded RNA (ssRNA) genome. ‘Positive sense’ means the virus’s genome can be translated directly by the ribosomes as mRNA (messenger RNA), whilst ‘negative sense’ RNA must first be converted to positive sense before it can be translated (see Figure 2 for the illustration of positive and negative ssRNA viruses).

Figure 2: Single-stranded RNA virus genomes can either be positive sense or negative sense. Positive sense genomes can act as mRNA (messenger RNA) and be translated directly by the host ribosome into a polypeptide. In contrast, negative sense genomes can’t be translated directly as they need to be converted to positive sense RNA first.
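The relationship described in Figure 2 can be sketched with simple base pairing. This is a minimal illustration only: the six-base fragment is made up, and the function name is an assumption chosen for this example.

```python
# Watson-Crick base pairing for RNA
PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}

def to_positive_sense(negative_strand):
    """Return the complementary strand, written 5' to 3'."""
    # Complement each base, reading the input backwards so the result runs 5' -> 3'.
    return "".join(PAIR[base] for base in reversed(negative_strand))

negative = "UUACAU"                     # hypothetical negative sense fragment (5' -> 3')
positive = to_positive_sense(negative)
print(positive)                         # prints AUGUAA (a start codon followed by a stop codon)
```

The negative sense fragment contains no readable start codon, whereas its complement begins with AUG, which is why only the positive sense strand can serve directly as mRNA.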

Human coronaviruses (HCoVs) cause respiratory diseases, but the majority are very mild. In fact, HCoVs are responsible for 15-30% of common colds. However, three highly pathogenic coronaviruses, all members of the genus Betacoronavirus (see Figure 1), have surfaced in humans relatively recently. These viruses are zoonotic: they originate in an animal reservoir but have jumped to humans, often via an ‘intermediate’ animal, and then quickly adapt to their new host.

Many emergent viruses are zoonotic, such as the Ebola virus and HIV-1 (Human Immunodeficiency Virus 1), which evolved from SIV (Simian Immunodeficiency Virus) in chimpanzees. The origins of the three highly pathogenic HCoVs are illustrated in Figure 3.


  1. The SARS coronavirus (SARS-CoV) emerged in humans in the Guangdong Province of China in November 2002 and was identified in 2003. It is thought to have entered humans from palm civets that were infected by bats, probably in crowded live animal markets. Containment measures were quickly introduced, so the outbreak spread to only 26 countries and was declared over by July 2003.

  2. The first case of MERS (Middle East Respiratory Syndrome) was identified in Saudi Arabia in 2012. The causative agent, MERS-CoV (Middle East Respiratory Syndrome Coronavirus), was a novel zoonotic virus, like SARS-CoV. It spread to humans from bats via dromedary camels.

  3. The SARS-CoV-2 outbreak is thought to have begun in the Huanan seafood market in Wuhan, a city in China’s Hubei province. The virus is suspected to have jumped from its animal reservoir (bats) to pangolins and then to humans.

Figure 3: The zoonotic transmission routes of SARS-CoV, MERS-CoV and SARS-CoV-2. Bats are the primary animal reservoir of all three highly pathogenic novel coronaviruses. From bats, each virus jumped to a different intermediate host species, which then transmitted the virus to humans. This figure is adapted from Xu (2020)’s Unveiling the Origin and Transmission of 2019-nCoV.

These coronaviruses differ in their fatality rates (the proportion of infected individuals who die because of the disease) and transmissibility, as summarised in Table 1. Both factors contribute significantly to the nature and extent of an outbreak. For example, SARS-CoV-2 is transmitted much more easily and has a higher rate of asymptomatic transmission, so it has been able to spread to many more countries.
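As a worked example of the fatality rate definition, the sketch below computes a crude ratio from the rounded global figures quoted in the introduction (over 37.3 million reported cases and over one million deaths). Both figures are lower bounds and cover reported cases only, so this illustrates the arithmetic rather than estimating the true rate. The function name is an assumption made for this example.

```python
def crude_fatality_rate(deaths, cases):
    """Deaths as a percentage of reported cases."""
    if cases <= 0:
        raise ValueError("cases must be positive")
    return 100 * deaths / cases

# Rounded figures quoted in the introduction of this article
rate = crude_fatality_rate(deaths=1_000_000, cases=37_300_000)
print(f"{rate:.1f}%")  # prints 2.7%
```

Because many infections are never reported, a ratio based on reported cases will generally differ from the proportion of all infected individuals who die.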

Table 1: The three highly pathogenic, zoonotic coronaviruses have all emerged in humans in the 21st century. The table shows the name of each virus and the disease it causes. The viruses differ in their fatality rates (meaning the proportion of infected individuals who die as a result of the disease) and relative transmissibility. This data is correct as of 11th October 2020. ‘CoV’ stands for coronavirus, ‘SARS’ stands for severe acute respiratory syndrome and ‘MERS’ stands for Middle East respiratory syndrome.

The sources for the data in the table can be found below:


Armed with only the isolates of the virus, how do scientists go about finding the zoonotic origin of the virus? They compare the virus genome to known viruses in other animals to see if a match can be found.

We can use SARS-CoV-2 as an example, as the process was very similar when the origins of SARS-CoV and MERS-CoV were studied.

The SARS-CoV-2 genome was sequenced in January 2020 and found to be 96% identical to the sequence of a coronavirus called RaTG13. RaTG13 is a coronavirus which had been discovered in horseshoe bats in China in 2013, and whose sequence was stored and eventually shared by the Wuhan Institute of Virology. RaTG13 is the closest known relative of SARS-CoV-2, suggesting that the novel HCoV originated in bats. The 4% difference between the two genomes reflects the fact that the viruses last shared a common ancestor around 50 years ago and have diverged from each other since then. This gap also supports the theory that SARS-CoV-2 entered humans via an intermediate host. Similarities between SARS-CoV-2 and coronavirus isolates from the Sunda (also known as Malayan) pangolin point to this species as the intermediate.
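The idea of a percentage identity between two genomes can be sketched as a position-by-position comparison. This is a simplified illustration: it assumes the two sequences have already been aligned to the same length, whereas real genome comparisons rely on dedicated alignment tools. The 25-base fragments are made up and are not real viral sequence.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of matching positions between two aligned sequences of equal length."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be aligned to the same non-zero length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100 * matches / len(seq_a)

# Hypothetical 25-base fragments
isolate = "AUGGCUUUAAACUGAGCGGUUAACG"
relative = "AUGGCUUUAAACUGAGCGGUUAACA"      # differs at the final base only
print(percent_identity(isolate, relative))  # prints 96.0
```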


Pangolins are the most trafficked mammals in the world, with the Sunda pangolin being the most widely traded of the eight pangolin species. With such a high level of contact between humans and this scaled mammal, it is unsurprising that the virus was able to jump. In addition, factors such as urbanisation, climate change and human activities like hunting have increased the chances of zoonotic transmission.


SARS-CoV-2: the structure

A SARS-CoV-2 virion particle, seen in Figure 4, is around 70-90 nm in diameter. It is an enveloped virus, meaning the core of the virus is surrounded by a host-derived lipid envelope.

Embedded in this envelope are three types of surface proteins. The spike (S) protein gives the virus its ‘crown-like’ appearance. The membrane (M) protein is a glycoprotein that shapes the envelope and helps drive the assembly of new virions. The third surface protein is the envelope (E) protein.

Inside the envelope is the nucleocapsid which contains the viral RNA genome. In particular, the nucleocapsid (N) proteins coat the genome to protect it and help it get into new virion particles.

Figure 4: A SARS-CoV-2 virion particle. The nucleocapsid protein coats the RNA; and together, they form the nucleocapsid. The figure was adapted from Florindo et al. (2020)’s Immune-mediated approaches against COVID-19 and created using BioRender.

The S protein is very important for the virus, but also for us (the human host) when it comes to targeting the virus, as we will see later on with vaccinations. Its function is to allow the virus to enter the host cell. However, as a relatively large and exposed surface protein, it is also targeted by neutralising antibodies of the host immune system. These immune proteins bind to the virus and stop it from entering host cells.

Each S protein is made up of two subunits:


1. The upper domain (S1; a.k.a. the receptor-binding domain)

This subunit binds to the receptor on the host cell. This domain is highly variable between closely related spike proteins, as there is strong evolutionary pressure to mutate and evade antibodies. Mutations in the spike protein can also allow the virus to adapt to receptors in new hosts (e.g. pangolin to humans). When such a mutation is advantageous, it will be selected for, allowing the virus to infect a new host.

Despite these S protein mutations, coronaviruses have relatively low variation compared to other RNA viruses. RNA viruses, like HIV and influenza, have high mutation rates because the RNA polymerase used to replicate their genome is error-prone. This instability hinders vaccine efforts.

However, coronaviruses are unusual among RNA viruses as they have a proofreading enzyme which lowers their mutation rate. This presents a potential benefit for future vaccination strategies.


2. The lower domain (S2)

The second subunit contains the virus’s fusion machinery needed for the host and viral membrane to fuse. This allows the RNA genome to be released into the host cell.

A mutation in the spike protein, which occurred early on in the pandemic, appears to have become more dominant over time. The 614th amino acid of the spike protein was originally an aspartate (D) but was replaced by a glycine (G), so it is called the D614G mutation.

It has been suggested that the mutation increases the transmissibility of the virus, and therefore positive selection has acted on it. Positive selection is an evolutionary process where advantageous mutations increase in frequency in a population, as opposed to negative selection where harmful mutations are removed.
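The naming convention behind D614G (original amino acid, position, new amino acid) can be sketched in a few lines. This is a minimal illustration under stated assumptions: both function names are made up for this example, and the five-residue peptide is hypothetical, not part of the real spike protein.

```python
import re

def parse_substitution(name):
    """Split a name such as 'D614G' into (original residue, position, new residue)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", name)
    if match is None:
        raise ValueError(f"not a substitution name: {name!r}")
    original, position, new = match.groups()
    return original, int(position), new

def apply_substitution(protein, name):
    """Apply a substitution to a protein sequence, using 1-based positions."""
    original, position, new = parse_substitution(name)
    if protein[position - 1] != original:
        raise ValueError(f"expected {original} at position {position}")
    return protein[:position - 1] + new + protein[position:]

print(parse_substitution("D614G"))              # prints ('D', 614, 'G')
toy_protein = "MKDLV"                           # hypothetical 5-residue peptide
print(apply_substitution(toy_protein, "D3G"))   # prints MKGLV
```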


SARS-CoV-2: the genome

The SARS-CoV-2 genome (see Figure 5) contains 13 genes and can produce over 27 proteins. Sixteen of these are non-structural proteins (nsps), such as proteases and an RNA polymerase for genome replication. The rest are structural proteins (N, E, S, M) and accessory proteins.

Figure 5: The SARS-CoV-2 genome. Polyprotein 1a and polyprotein 1ab are both made up of non-structural proteins. Polyprotein 1ab is a fusion of the protein products of ORF1a (open reading frame 1a) and ORF1b, produced after a ribosomal frameshift. The virus’s RNA polymerase produces subgenomic mRNAs, which are then translated into structural and accessory proteins. ‘Nsps’ stands for non-structural proteins and ‘RdRp’ represents RNA-dependent RNA polymerase. This figure was adapted from ViralZone (2020)’s SARS coronavirus 2 /Covid-19 genome expression.


Like all viruses, SARS-CoV-2 is an obligate intracellular parasite, meaning that it depends on its host cell for survival and replication.

The virus uses various materials and cellular machinery from its host to carry out several steps in its lifecycle. For example, its positive sense single-stranded RNA genome is translated by the host ribosome (i.e. it reads the viral RNA as mRNA and, using host amino acids, synthesises viral proteins). More specifically, the SARS-CoV-2 genomic RNA has a 5’ cap and 3’ poly A tail, just like mRNA, to allow it to be directly translated.

In eukaryotic cells, a single mRNA represents one gene and is translated into a single protein. The ribosome is recruited at the 5’ cap, begins translating at the ‘start’ codon and stops at the ‘stop’ codon.

However, SARS-CoV-2 has all 13 of its genes in a single RNA so, in theory, the host ribosome will recognise the first gene but not the rest of the genes downstream.

Obviously, the virus needs all of its genes to be expressed, so it has some interesting mechanisms to overcome this obstacle...


1. Polyproteins and ribosomal frameshifting

The virus’s genome contains ORF (open reading frame) 1a and 1b, each of which is a single gene encoding multiple proteins. They are therefore translated to produce polyprotein 1a (a fusion of nsp1-11) or polyprotein 1ab (a fusion of nsp1-16). The proteases (i.e. enzymes which break down polypeptides/proteins) present in both polyproteins act in cis to cleave themselves out, then in trans to cleave the rest, leaving behind single, functional proteins.

On most occasions, polyprotein 1a is produced, as the ribosome stops at the end of ORF1a. However, ORF1a and ORF1b overlap, so sometimes the ribosome ‘slips back’ 1 base, then continues translating in a new ORF. The result of this process, known as a -1 ribosomal frameshift, is polyprotein 1ab (a fusion of the protein products of ORF1a and ORF1b). An example of a ribosomal frameshift is illustrated in Figure 6.

Figure 6: At particular sequences, known as ‘slippery sequences’, the ribosome backtracks one nucleotide and continues translating in the -1 frame. The result is that all amino acids incorporated after the -1 ribosomal frameshift will be different, as the mRNA codons have been changed. In SARS-CoV-2, the ribosome will normally translate ORF1a only to synthesise polyprotein 1a. When a -1 ribosomal frameshift occurs, both ORF1a and overlapping ORF1b will be translated, so polyprotein 1ab is produced. This figure is adapted from ViralZone (2020)’s Ribosomal frameshifting.
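The effect of a -1 frameshift can be sketched with a toy translation function. This is a simplified model under stated assumptions: the 23-base RNA is made up (it is not the viral slippery sequence), the `slip_at` parameter is an invention of this example, and the slip is forced to happen rather than occurring with some probability as it would in a cell. The codon table is the standard genetic code.

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: 64 codons in UCAG order mapped to one-letter amino acids ('*' = stop)
CODON_TABLE = dict(zip((a + b + c for a in BASES for b in BASES for c in BASES), AMINO_ACIDS))

def translate(rna, slip_at=None):
    """Translate from position 0 until a stop codon.

    If slip_at is given, the ribosome steps back one nucleotide (once)
    when it reaches that position, modelling a -1 frameshift.
    """
    if slip_at is not None and slip_at < 1:
        raise ValueError("slip_at must be at least 1")
    protein = []
    pos = 0
    slipped = False
    while pos + 3 <= len(rna):
        if slip_at is not None and pos == slip_at and not slipped:
            pos -= 1          # re-read the previous nucleotide in the new frame
            slipped = True
        amino_acid = CODON_TABLE[rna[pos:pos + 3]]
        if amino_acid == "*":
            break             # stop codon: release the polypeptide
        protein.append(amino_acid)
        pos += 3
    return "".join(protein)

toy_rna = "AUGGCUUUAAACUGAGCGGUUAA"    # hypothetical sequence
print(translate(toy_rna))               # prints MALN
print(translate(toy_rna, slip_at=12))   # prints MALNLSG
```

Without the slip, the ribosome meets a stop codon and releases the shorter product. With the slip, that stop codon is read in a different frame, so translation continues and a longer product is made, which mirrors the relationship between polyprotein 1a and polyprotein 1ab.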

2. Subgenomic RNAs

The production of polyproteins allows SARS-CoV-2 to produce all of its non-structural proteins. However, it still needs to ensure that the downstream genes, which encode structural and accessory proteins, are translated. It does this by producing a nested set of sub-genomic RNAs (sgRNAs), a shared feature of all viruses in the order Nidovirales.

sgRNAs are negative sense as they are transcribed from the positive sense RNA genome. sgRNAs are then used as a template to produce positive sense subgenomic mRNAs - these all have the same 3’ end but have different genes at their 5’ end. Take a look at Figure 7, which shows how different proteins can be translated from subgenomic mRNAs.

Figure 7: The ribosome reads and translates the first gene in an mRNA transcript. Therefore, all genes downstream of the first gene will not be synthesised. To overcome this issue, coronaviruses, and other related viruses, produce sgRNAs so each gene can be the ‘first gene’, enabling all proteins to eventually be translated. In the figure above, the four ‘subgenomic’ transcripts (which don’t contain all 5 genes) represent sgRNAs of different lengths, whilst the full genome contains genes 1-5.
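The nested arrangement in Figure 7 can be sketched as a list of transcripts that all share the same 3’ end. This is a minimal illustration: the gene names are placeholders matching the five generic genes in the figure, and the function name is an assumption made for this example.

```python
def nested_transcripts(genes):
    """Return every transcript that runs from some gene to the shared 3' end."""
    return [genes[start:] for start in range(len(genes))]

genome = ["gene1", "gene2", "gene3", "gene4", "gene5"]   # placeholder gene names

for transcript in nested_transcripts(genome):
    # Only the first gene on each transcript is translated by the ribosome
    print(f"{'+'.join(transcript):<30} -> translates {transcript[0]}")
```

The first transcript is the full genome and the remaining four are the shorter subgenomic ones. Taken together, every gene appears first on exactly one transcript, so every gene gets translated.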

SARS-CoV-2 transmission

Although SARS-CoV-2 was initially contracted from animals, the main method of transmission is between humans.


The virus is mainly spread via large ‘respiratory’ droplets containing SARS-CoV-2, which can be produced in a variety of ways, such as when an infected individual breathes, sneezes, coughs or talks.

Asymptomatic carriers, who have COVID-19 but don’t display symptoms, are capable of transmitting the virus. Those in the incubation period, known as pre-symptomatic individuals, also have the potential of infecting healthy individuals. The incubation period is the delay between an individual becoming infected and displaying symptoms; the median is 4 days, but it can last up to 14 days.


This has very important implications for controlling the pandemic, as asymptomatic and pre-symptomatic individuals can be difficult to identify and can often become ‘super-spreaders’. The virus can also be transmitted indirectly via fomites (inanimate objects/surfaces which may be contaminated). This means that if an infected individual releases respiratory droplets onto a surface, a healthy individual may become infected if they touch this surface and then their eyes, nose or mouth. However, recent studies have suggested that this method of transmission poses a relatively low risk.


SARS-CoV-2 lifecycle

A summary of the steps in the SARS-CoV-2 lifecycle is illustrated in Figure 8.

1. Entry

Viruses bind to a specific receptor on a host cell in order to infect it.


The cellular receptor targeted by a virus determines its tropism (i.e. which cells/tissues it can infect). If a cell does not express the virus’s specific receptor, it is not susceptible to infection by that virus. The receptor used by SARS-CoV and SARS-CoV-2 is ACE2 (angiotensin-converting enzyme-2), whereas MERS-CoV uses DPP4 (dipeptidyl peptidase 4). The fact that ACE2 is expressed on a variety of cells helps explain the range of symptoms associated with COVID-19. For example, lung epithelial cells express the SARS-CoV-2 receptor and therefore can be infected, resulting in respiratory illness symptoms such as pneumonia. Table 2 shows several organs and tissues expressing the ACE2 receptor and some of the symptoms associated with COVID-19 in each location.

Table 2: The expression sites of ACE2, the receptor for SARS-CoV-2, determine which tissues are susceptible to infection and therefore the potential symptoms of COVID-19. The table shows some of the affected tissues/organs and a few of the associated symptoms. Infected patients with co-morbidities (i.e. existing conditions), such as cardiovascular disease, are generally at greater risk as the affected organs are already compromised. This table is adapted from Wadman et al. (2020)’s A rampage through the body.
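The rule that receptor expression determines tropism can be sketched as a simple lookup. This is a deliberately simplified model: the virus-to-receptor pairs come from this article, but the two cell types and their expression profiles are hypothetical, and other factors that influence infection in practice are ignored.

```python
# Receptors named in this article
VIRUS_RECEPTOR = {"SARS-CoV": "ACE2", "SARS-CoV-2": "ACE2", "MERS-CoV": "DPP4"}

# Hypothetical expression profiles for two made-up cell types
CELL_RECEPTORS = {
    "cell type A": {"ACE2"},
    "cell type B": set(),
}

def is_susceptible(virus, cell_type):
    """A cell is susceptible only if it expresses the receptor the virus uses."""
    return VIRUS_RECEPTOR[virus] in CELL_RECEPTORS[cell_type]

print(is_susceptible("SARS-CoV-2", "cell type A"))  # prints True
print(is_susceptible("MERS-CoV", "cell type A"))    # prints False
print(is_susceptible("SARS-CoV-2", "cell type B"))  # prints False
```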

To enter a target host cell, SARS-CoV-2 attaches to it using the receptor binding domain (S1; the upper domain) of its S protein. A cellular protease, TMPRSS2 (transmembrane protease serine 2), then cleaves the S protein, which activates the virus’s fusion machinery (the S protein’s S2 domain). The viral and cellular (plasma) membrane can then fuse, resulting in the formation of a channel between the cell and virus interior for the genomic single-stranded RNA to enter the cell cytoplasm.


2. Viral replication

Because the SARS-CoV-2 genome is single-stranded and positive sense, the genome is used as mRNA and translated directly. First, non-structural polyproteins 1a and 1ab are produced from ORF1a and ORF1b. The polyproteins are cleaved by the virus’s two proteases into individual nsps, most of which form the ‘replicase-transcriptase complex’ (RTC). This takes place in double-membrane vesicles, derived from the host rough endoplasmic reticulum, to avoid host detection. The RTC replicates the viral genome, producing copies to package into progeny virions. It also transcribes sgRNAs, which are translated into structural proteins (S, E, M and N proteins) as well as accessory proteins (these are unique to each virus and many of their functions have yet to be elucidated).


3. Viral assembly

The structural proteins are then moved from the rough endoplasmic reticulum, where they were translated, to the endoplasmic reticulum-Golgi intermediate compartment (ERGIC). The newly replicated genome and N proteins are brought together to produce the nucleocapsid, which is also transported to the ERGIC. New virion particles are then assembled in the Golgi apparatus.


Finally, vesicles containing new virion particles are produced and released by exocytosis, so the virions can infect new cells or be breathed out. Electron microscopy has shown that infected cells grow long actin-rich protrusions called filopodia, which contain viral particles and facilitate spread of the virus to uninfected cells.


Figure 8: The SARS-CoV-2 lifecycle begins when the S protein binds to the ACE2 receptor enabling entry into the cell [1]. If TMPRSS2 is present, the S protein will be cleaved so fusion of the host membrane and viral envelope [2] occurs on the cell surface. In the absence of TMPRSS2, the virus is endocytosed and fusion [2] occurs in the endosome. Whichever fusion route is taken, the result is the release of viral RNA genome [3] into the host cell’s cytoplasm. Translation of ORF1a and ORF1b [4] is carried out using the host cellular machinery to produce polyproteins, which are then cleaved by viral proteases. One resulting protein is the viral RdRp (RNA-dependent RNA polymerase) that produces copies of the viral genome for new viral progeny and subgenomic RNA (sgRNAs) [5]. sgRNAs are translated into structural proteins [6]. The N protein coats newly produced RNA. New viral particles are assembled in the RER (rough endoplasmic reticulum) and Golgi apparatus [7], then carried to the cell surface [8] in vesicles where they are released. This figure was adapted from Shereen et al. (2020)’s COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses and created using BioRender.


The host immune response

The body’s defence against a viral infection, such as infection by SARS-CoV-2, begins with the innate immune response. This involves physical barriers and general immune defences against the virus, such as the release of cytokines (i.e. a group of proteins involved in intercellular signalling, which help coordinate the immune response).

For example, interferons, a type of cytokine, inhibit viral replication and initiate inflammation at the infection site to recruit immune cells. In some cases, the activity of the innate immune response is sufficient to end the infection. If the infection is not cleared, the slower but more specific and powerful adaptive immune response is triggered. The adaptive immune response involves both cell-mediated and antibody-mediated immunity.


Antibody-mediated immunity refers to the role of B cells in the adaptive immune response. Certain B cells, known as plasma cells, secrete antibodies (immune proteins) specific to viral antigens (i.e. markers used by the body to detect viruses and activate the immune system). Antibodies help eliminate the virus, or other pathogen, in several ways. For example, antibodies specific to the SARS-CoV-2 S protein can bind to it to neutralise the virus (prevent it infecting cells). Antibodies are also used to activate a type of immune cell called a ‘natural killer (NK) cell’, which can kill infected host cells. This process is called antibody-dependent cellular cytotoxicity (ADCC).

In contrast, the cell-mediated response is carried out by T cells. For example, T-killer cells recognise and kill virally infected cells to prevent further replication and spread of the virus. T-helper cells are essential for co-ordination of the immune response and stimulation of plasma cells to release antibodies.


Once the infection has been cleared by the adaptive immune response, most B and T cells specific to the invading virus will die. However, some will remain as ‘memory’ B or T cells - these are specific to the virus and remain in circulation, ready to carry out a faster and stronger response upon re-infection. Vaccinations rely on this principle of ‘immunological memory’.

We know that when a virus is detected by the immune system, an anti-viral response will be mounted in an attempt to clear it. This will involve the production and release of ‘pro-inflammatory’ cytokines by a range of immune cells, including NK cells, B and T cells and macrophages.


Firstly, pro-inflammatory cytokines initiate inflammation, the body’s response to potentially harmful triggers such as infection or injury, which attracts white blood cells to the affected site. Secondly, these signalling molecules stimulate more cells to release cytokines, thus amplifying the inflammation. This process takes place in response to SARS-CoV-2 infection, resulting in inflammation of the lungs. However, in some COVID-19 patients the huge increase in cytokines leads to a ‘cytokine storm’, resulting in uncontrolled inflammation and subsequently a mass influx of immune cells into the lungs. This can cause significant lung damage, multiorgan failure and, in some cases, acute respiratory distress syndrome (ARDS). COVID-19 patients with ARDS have a survival rate of around 25%, making it one of the main causes of death from the disease.


Concluding remarks

Hopefully, this article has taught you about the biology of SARS-CoV-2 on a phylogenetic, structural and genomic level. Using this knowledge, we then explored the stages each virion particle goes through when infecting cells and replicating, as well as briefly outlining the host immune response and how it goes wrong in some COVID-19 patients.

Grasping these fundamentals is essential for understanding some of the responses being taken in the fight against the COVID-19 pandemic. These include measures to limit exposure to the virus, potential therapeutics to treat severely ill patients and the development of a vaccine to hopefully prevent infection.


In the next article, we will look in some detail at the various drug options and vaccine candidates currently being researched by scientists.

Author: Ambar Khan, BSc Biological Sciences


Disclaimer: All figures created using BioRender are intended solely for educational purposes and not for profit.
