The COVID-19 Genomics UK Consortium (COG-UK) was launched in April 2020, with the aim of sequencing the genome of the COVID-19 virus and its variants and providing data to track and analyse viral transmission within the UK.
What are the main aims of COG-UK?
The overarching aim of the project is to deliver large-scale SARS-CoV-2 sequencing to public health agencies, to inform interventions and policy decisions during the current pandemic; and all the other goals of COG-UK effectively feed into that. This sequencing capacity will enable close to real-time evaluation of novel treatments, vaccines and transmission events like outbreaks or the introduction of new variants into the country. Sequencing is fundamental in identifying the emergence of new viral mutations – we cannot detect them unless we sequence them.
Since the launch of COG-UK on 1 April 2020, it has grown into a consortium of many hundreds of people which supports 16 sequencing hubs across the country, including the four public health agencies and researchers from academic partners across the UK.
How can genome sequencing help us combat the spread of COVID-19? What can genomics tell us about how the virus spreads within specific demographics or localities?
Sequencing is essentially an adjunctive tool: it does not replace hand hygiene, social distancing, masks and testing, which remain the key measures alongside public health interventions such as lockdowns and regional tiering, but it adds to our knowledge of transmission. When the SARS-CoV-2 virus first emerged back in 2019, all the examples of that virus looked identical, but over time a given virus or virus lineage accumulates around two mutations per month, and those changes build up into a kind of genetic barcode or fingerprint. The virus has around 30,000 bases in its genome, and the mutations it acquires will be in different places in different viruses; so the viruses start to differentiate from each other and form lineages, which are all variants of the same virus but with different barcodes.
We can then sequence the virus as it appears in different people and compare the genome, the barcode, from one person to another. If the genomes for two people are identical, then it is likely that they have come into contact with a common source very recently – it could be that the virus has been passed directly from Person A to Person B, or they both got it from Person C, but the forms of the virus will be very closely related; and that could help to identify transmission. If the genomes are quite different between two people, then even if the two people have been in the same place at the same time, we can exclude the possibility of a transmission event between them.
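The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the method COG-UK uses: it assumes the two genomes have already been aligned to the same length, it skips positions where a base could not be read (N) or is missing (-), and the cut-off of two differences in possibly_linked is a hypothetical value chosen purely for the example.

```python
def snp_distance(seq_a: str, seq_b: str) -> int:
    """Count positions where two aligned genomes differ.

    Assumes both sequences are aligned and of equal length. Positions where
    either sequence has an unreadable base (N) or a gap (-) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    skip = {"N", "-"}
    return sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a not in skip and b not in skip and a != b
    )


def possibly_linked(seq_a: str, seq_b: str, max_differences: int = 2) -> bool:
    """Return True if the genomes are close enough that a recent link is plausible.

    max_differences is a hypothetical threshold, for illustration only.
    """
    return snp_distance(seq_a, seq_b) <= max_differences


# Toy 12-base "genomes" (the real genome has around 30,000 bases).
person_a = "ACGTACGTACGT"
person_b = "ACGTACGTACGT"  # identical to person A
person_c = "ACGAACGTTCGA"  # differs from person A at three positions

print(snp_distance(person_a, person_b), possibly_linked(person_a, person_b))
print(snp_distance(person_a, person_c), possibly_linked(person_a, person_c))
```

A distance of zero is consistent with a recent common source but says nothing about who infected whom; a large distance argues against a direct link even when two people were in the same place at the same time.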
In a November 2020 study by Dr Estée Török et al, the researchers sequenced the virus from people in hospital; they were particularly interested in where there was nosocomial transmission within the hospital, and every week they would report back to infection control that they had detected some clusters in their sequence data. There was a particularly interesting example in their dialysis unit, where they could see viral clusters among people with renal dysfunction – and they knew that the strains of the virus within that cluster were related, but they did not know how they linked up epidemiologically. Eventually they tracked the source, not to the hospital itself, but to the transport system. People were getting into the same transport vehicle and transmitting it there, rather than on the dialysis unit; and so that stopped as soon as the transport arrangements were changed.
In a way it’s like forensic detective work, looking at who has passed what to whom – we cannot assume directionality, but we have to determine whether a given infection is part of a cluster or an outbreak or not. As another example, if we know that there are several cases clustered together in a school, the question is whether the virus has spread inside the school or been introduced independently. That investigation would have significant implications, because if it has spread in school then that may mean that the infection prevention and control measures which are in place are not working; whereas if all the cases have been acquired in the community and brought in, it would be dealt with differently.
There is a general view that the virus has been introduced from hospitals into care homes in the majority of cases, but looking at the body of literature sequencing the virus in care homes, when strains are compared between the community and hospitals, for the most part the virus is coming from the community. That also tells us that when care homes are devising infection control strategies, they need to think about the staff and visitors coming in, because that is how infection is getting in – and then once it is in, it tends to spread quite quickly across the care facility. That knowledge allows us to work out where interventions and preventive measures are not working, and where there might be missing links in the chain.
What is the significance of the newer mutations and variants of the SARS-CoV-2 virus?
Mutations arise naturally, and they are arising all the time in the genome – it is a function of biology and it is going to keep happening. In general, this is not alarming, because it is what we expect. The SARS-CoV-2 virus has accumulated many, many mutations, most of which will not actually have any impact on the overall biology of the virus. The mutations we particularly track are in the gene for the spike protein, which interacts with the human cell: the spike is the way that the virus gets into the human cell, through the ACE2 receptor.
We are particularly interested in the mutations in the part of the gene that encodes the receptor binding motif – and even in the spike protein gene, the majority of mutations will not be significant; but a very small number of mutations can change the way in which the virus interacts with humans and could potentially lead to greater infectivity or transmissibility. One key example of this is a mutation called D614G, which changes the amino acid at position 614 of the spike protein from aspartic acid (D) to glycine (G): when the virus first emerged, this mutation was not present at all; we first saw it around March 2020. It has effectively dominated the viral population – everything has this mutation now – and while it does not cause greater severity of the disease, it does make the virus more transmissible.
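The label D614G follows a standard shorthand: the one-letter code of the original amino acid, its position in the protein, and then the one-letter code of the replacement. As a small sketch of how such a label can be read programmatically, the Python below splits it into those three parts. The names Substitution, parse_substitution and describe are hypothetical helpers written for this illustration, and the lookup table covers only the two amino acids mentioned here.

```python
import re
from typing import NamedTuple

# One-letter amino acid codes needed for this example (not the full set of 20).
AMINO_ACIDS = {"D": "aspartic acid", "G": "glycine"}


class Substitution(NamedTuple):
    original: str     # one-letter code of the original amino acid
    position: int     # position in the protein
    replacement: str  # one-letter code of the new amino acid


def parse_substitution(label: str) -> Substitution:
    """Split a label such as 'D614G' into original, position and replacement."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"not a substitution label: {label!r}")
    return Substitution(match.group(1), int(match.group(2)), match.group(3))


def describe(label: str) -> str:
    """Return a plain-English reading of a substitution label."""
    sub = parse_substitution(label)
    # Fall back to the one-letter code if the amino acid is not in the table.
    original = AMINO_ACIDS.get(sub.original, sub.original)
    replacement = AMINO_ACIDS.get(sub.replacement, sub.replacement)
    return f"{original} at position {sub.position} replaced by {replacement}"


print(describe("D614G"))
```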
Some mutations can cause greater transmissibility and greater spread. Another thing that concerns us is the possibility that mutations can lead to a change in disease severity, so a specific mutation or combination of mutations could lead to more severe disease, meaning that people are more likely to die. The alternative is that mutations could lead the virus to become weaker, so it actually becomes less virulent. To date, however, we are not aware of any set of mutations that is associated with the disease becoming more or less severe. The third type of mutation is one that leads to changes in our immunological response to the virus: those are the ones that we worry about, because they could affect immune responses generated by previous infection or vaccines. That is the greatest concern, because viruses could emerge which are able to evade the immune system by changing their characteristics. Again, there is currently no evidence of these mutations: there are some signals that some changes are leading to some reduced interaction with specific monoclonal antibodies, but at the moment that situation is not alarming because there is no evidence that the vaccine will not work.
Could the work of COG-UK help preparations for another pandemic in the future?
Yes, definitely – COG-UK has completely changed the landscape of how we do pathogen sequencing. Before the pandemic, sequencing only really took place in reference laboratories; it was not available as it is now. We have sequenced over 140,000 genomes, which is an exceptionally high number. It has changed the landscape, because not only do we have much higher capacity, but that capacity is distributed across the United Kingdom; and so that network would continue to be available to understand newly emerging pathogens. That sets us up for the future, in case we need to turn our attention to new pathogens – it is important to sequence a newly emerging pathogen as soon as possible. The SARS-CoV-2 virus was sequenced in early to mid-January; and that was what enabled vaccine development to start early, because the genomic sequence was very similar to previous coronavirus sequences, so there was enough similarity there to build an understanding of how the genes work.
The other interesting thing about having such a good capacity for sequencing is that it can be turned to focus on other things: we could turn it to the study of antibiotic resistance, for example, or we could look at other pathogens – we already sequence foodborne pathogens, so that work could be done within the distributed model. The network is still going to have uses once COVID-19 is under control.
Professor Sharon Peacock
COVID-19 Genomics UK Consortium