“the CGG CGG coding […] does not appear in nature”, this and the furin cleavage site from the SARS-CoV-2 spike protein gene suggest the virus wasn’t transmitted from animals to humans
Factually inaccurate: The genetic sequence CGG CGG, which codes for a pair of arginines, a type of amino acid, can be found in the genome of several other coronaviruses, not just SARS-CoV-2. This sequence does exist in nature.
Inadequate support: Neither the presence of the CGG CGG sequence nor the presence of a furin cleavage site in the SARS-CoV-2 spike protein is evidence of genetic manipulation. It’s common for viruses to exchange genetic material in a process called recombination, and genetic markers in the SARS-CoV-2 genome indicate that such recombinations have already occurred. The furin cleavage site observed in SARS-CoV-2 also exists in other coronaviruses.
KEY TAKE AWAY
The CGG CGG genetic sequence is rare in the SARS-CoV-2 genome, but it can be readily found in nature, in many other genomes. SARS-CoV-2 possesses molecular features, such as a furin cleavage site in its spike protein, which enhance its disease-causing ability. This site is also present in other coronaviruses. The most likely hypothesis so far is that SARS-CoV-2 acquired these molecular traits through spontaneous exchange of genetic material with other viruses.
FULL CLAIM: “the CGG CGG coding […] does not appear in nature”; the CGG CGG sequence and the furin cleavage site from the SARS-CoV-2 spike protein gene suggest the bat to human transmission hypothesis “doesn’t seem accurate”.
The origin of SARS-CoV-2, the virus responsible for COVID-19, has been a topic of intense debate. The available evidence suggests that a zoonotic origin, meaning that SARS-CoV-2 jumped from animals to humans, is the most likely hypothesis[1,2]. Furthermore, zoonotic transmission is common and has been the cause of several epidemics, such as SARS-CoV-1, Ebola and HIV, making this hypothesis all the more likely, due to the high prior probability.
However, throughout the pandemic many claims have circulated that SARS-CoV-2 was in fact bioengineered. Health Feedback reviewed such claims on numerous occasions and found that they were unsupported by the data available at the time; see here, here, and here. Claims that the virus was genetically engineered have yet to be supported by scientific evidence.
Nevertheless, this claim resurfaced in April 2022 in a video from American radio host Dan Bongino, which received more than 3 million views on Facebook. Bongino previously propagated unsubstantiated claims regarding SARS-CoV-2 and has been permanently banned from YouTube for spreading COVID-19 related misinformation.
In his video, Bongino declared that the hypothesis that SARS-CoV-2 had jumped from animals to humans didn’t “seem accurate” because the SARS-CoV-2 genome contained a specific genetic sequence, CGG CGG, “that does not appear in nature”. Bongino concluded with the allegation that SARS-CoV-2 was bioengineered. However, his claim is inaccurate. This CGG CGG sequence does appear in nature and its presence isn’t evidence of genetic manipulation of the virus, as we explain below.
What is the CGG CGG genetic sequence?
First, it is important to explain some basic aspects of genetics in order to understand the claim. The SARS-CoV-2 genome contains instructions on how to assemble molecules called amino acids, which become the building blocks for the proteins essential to the functioning of the virus.
In an RNA virus like SARS-CoV-2, the genome comprises four nucleotides, distinguished by their bases: adenine (A), cytosine (C), guanine (G), and uracil (U). The protein-making machinery reads the genome in chunks of three nucleotides; a cluster of three nucleotides is called a codon (Figure 1). The combination of the nucleotides in a codon (there are 64 possible combinations) determines which amino acid will be added (Figure 2). Therefore, the nature and order in which these codons are arranged in the genome dictate the nature and order in which the amino acids are assembled and thus the type of protein being built.
Figure 1. How proteins are produced from the genetic code. Note the trinucleotide clusters (codons) marked on the RNA sequence. Source: U.S. National Human Genome Research Institute.
Figure 2. A table showing how different codons are translated to various amino acids. Source: Nature Education.
As shown in Figure 2, the same amino acid can be produced by different codons. This is also known as the concept of codon redundancy. For instance, the amino acid arginine (Arg or R), can be coded by six different codons: CGU, CGC, CGA, AGA, AGG, and CGG. The genetic sequence mentioned by Bongino is the pair of codons CGG CGG; as the codon table shows, this sequence codes for a pair of arginines (RR).
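To make the codon table concrete, here is a minimal Python sketch of how codons map to amino acids. It is our own illustration, not something taken from the experts quoted in this review: it builds the standard genetic code from a commonly used compact string, and the names `CODON_TABLE` and `translate` are hypothetical helpers introduced only for this example.

```python
from itertools import product

# Standard genetic code, written in the conventional U, C, A, G ordering.
# "*" marks a stop codon. The names below are illustrative helpers.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"  # codons starting with U
    "LLLLPPPPHHQQRRRR"  # codons starting with C
    "IIIMTTTTNNKKSSRR"  # codons starting with A
    "VVVVAAAADDEEGGGG"  # codons starting with G
)
CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def translate(rna: str) -> str:
    """Translate an RNA string codon by codon, ignoring spaces.

    Trailing nucleotides that do not form a complete codon are dropped.
    """
    seq = rna.replace(" ", "").upper()
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


# Codon redundancy: every codon that codes for arginine (R).
arginine_codons = sorted(c for c, aa in CODON_TABLE.items() if aa == "R")

print(len(CODON_TABLE))      # number of possible codons
print(arginine_codons)       # the six arginine codons
print(translate("CGG CGG"))  # the doublet discussed in the claim
```

Running the sketch lists 64 codons, six of which code for arginine, and translates CGG CGG into the amino acid pair RR.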
The CGG CGG sequence appears in nature; several other coronaviruses also carry the same sequence
Bongino claimed that the codon doublet CGG CGG doesn’t appear in nature. However, this claim is inaccurate. Kristian Andersen, a virologist in the department of Immunology and Microbiology at the Scripps Research Institute, pointed out on Twitter that the CGG arginine codon is used in many coronaviruses, albeit at a low frequency.
In an email to Health Feedback, Robert Garry, a virologist at Tulane University School of Medicine, shared with us the results of a database search, showing that several other coronaviruses also contained the CGG CGG sequence coding for a pair of arginines (Figure 3). A preprint (a study that has not yet been peer-reviewed) also reported such CGG CGG sequences in several coronaviruses.
Figure 3. Presence of the CGG CGG codon doublet, coding for a pair of arginines (denoted by the letter R), in several viral genomes. The figure represents a portion of the genome of four different coronaviruses. For each virus, the first line of letters corresponds to the nucleotide sequence, the second line of letters shows the amino acids that would arise from that nucleotide sequence. All four genomes contain a CGG CGG sequence. Source: Robert Garry.
Therefore, contrary to Bongino’s claim, the CGG codon and the codon doublet CGG CGG do exist in nature, in many coronaviruses. Claiming the opposite contradicts the facts.
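For readers curious about what a search for this doublet involves, the sketch below shows one simple way to look for two adjacent CGG codons in a given reading frame. It is a hedged illustration only: we do not know which tools were used for the database search described above, the function name `find_codon_doublets` is hypothetical, and the sequence is an invented toy example, not a real viral genome.

```python
def find_codon_doublets(rna: str, codon: str = "CGG", frame: int = 0) -> list[int]:
    """Return start positions (0-based) of two adjacent copies of `codon`.

    Only positions aligned with the given reading frame are checked, so a
    CGGCGG stretch that straddles codon boundaries is not reported.
    """
    seq = rna.replace(" ", "").upper().replace("T", "U")
    hits = []
    for i in range(frame, len(seq) - 5, 3):
        if seq[i:i + 3] == codon and seq[i + 3:i + 6] == codon:
            hits.append(i)
    return hits


# Invented toy sequence for illustration; NOT taken from any real genome.
toy_sequence = "AUG CCU CGG CGG GCA CGU UAA"

print(find_codon_doublets(toy_sequence, frame=0))  # doublet found in frame 0
print(find_codon_doublets(toy_sequence, frame=1))  # nothing in frame 1
```

Checking the reading frame matters because the same six letters only code for a pair of arginines when they line up with codon boundaries.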
Natural evolutionary processes can explain the CGG CGG sequence in SARS-CoV-2
In the debate about SARS-CoV-2 origins, many comments fed the narrative, relayed by Bongino, that the presence of the CGG CGG doublet in the SARS-CoV-2 genome was a “smoking gun” for genetic manipulation. However, such an idea is unsupported, as we explain below.
Firstly, CGG CGG is part of a larger sequence—U CCU CGG CGG GC—in the gene coding for the Spike protein of SARS-CoV-2. The spike protein allows the virus to latch onto its target cells and infect them, making it crucial for SARS-CoV-2’s ability to cause disease. This 12-nucleotide sequence adds a specific molecular structure known as a furin cleavage site (FCS) to the Spike protein, which enhances SARS-CoV-2 infectivity[5,6].
While relevant for the ability of SARS-CoV-2 to infect organisms, the presence of this 12-nucleotide, FCS-coding genetic sequence isn’t suspicious. In fact, the same 12-nucleotide sequence already exists in nature: it can be found in the bat coronavirus Bat-HKU9.
Natural genetic mechanisms exist in which a virus spontaneously inserts chunks of RNA from another virus into its own genome during its proliferation in a host[7,8]. This process is called genetic recombination. In fact, coronaviruses have a high propensity for these spontaneous recombinations and the SARS-CoV-2 genome contains genetic sequences indicative of such events. It is thus plausible that SARS-CoV-2 or one of its ancestors acquired that sequence through spontaneous genetic recombination.
Some, including the Nobel Prize winner David Baltimore, first claimed that the CGG codon is rare and implied that its rarity makes its presence at the FCS suspect. However, this is misleading, and Baltimore later backpedaled from his earlier claim.
The arginine-coding CGG codon is indeed the least common of all arginine-coding codons in the SARS-CoV-2 genome. Andersen wrote on Twitter that the CGG codon accounts for only 3% of the arginine-coding codons in SARS-CoV-2. This is why some considered that the odds of having two CGG codons side by side, right in the FCS, were so low that it could be a signal of genetic manipulation.
However, the existence of the aforementioned genetic recombination mechanisms makes it totally possible to naturally transfer a preexisting CGG CGG sequence from another virus to SARS-CoV-2 all at once, as Jonathan Eisen, an evolutionary biologist in the department of Evolution and Ecology at the University of California Davis, explained.
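The arithmetic behind the “low odds” argument, and the assumption it rests on, can be sketched as follows. This is our own simplified illustration rather than a calculation made by any of the scientists cited: it takes the 3% figure quoted above and assumes, purely for the sake of argument, that each arginine codon is chosen independently of its neighbor.

```python
# Share of arginine codons that are CGG in SARS-CoV-2, as quoted by Andersen.
cgg_share = 0.03


def naive_doublet_probability(codon_share: float) -> float:
    """Chance that two adjacent arginine codons are both the given codon,
    ASSUMING the two codons arise independently (a simplification)."""
    return codon_share ** 2


p = naive_doublet_probability(cgg_share)
print(f"Naive probability of CGG CGG at an arginine pair: {p:.4%}")
print(f"That is roughly 1 in {round(1 / p)}")
```

The estimate only applies if the two codons appeared independently, one at a time. If the doublet arrived in a single recombination event, already assembled in another virus, the independence assumption no longer holds and the small number says nothing about engineering.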
The furin cleavage site from the SARS-CoV-2 Spike protein already exists in other viruses
Another observation that raised questions about the origin of SARS-CoV-2 is that the SARS-CoV-2 FCS isn’t present in the genome of the closest known relatives of SARS-CoV-2. For instance, SARS-CoV-1, the virus responsible for the SARS outbreak in 2003, and the bat coronavirus RaTG13, one of the closest known relatives of SARS-CoV-2, both lack a FCS. This led some to posit that the FCS in SARS-CoV-2 must have been the result of genetic engineering.
However, this isn’t a sign that the FCS was deliberately added to the SARS-CoV-2 genome. First, one must keep in mind that the absence of proof isn’t equivalent to the proof of absence. Very few SARS-CoV-2 relatives are known so far and the ones we know, such as RaTG13, are only distant relatives. Therefore, it is likely that a closer SARS-CoV-2 relative, with a FCS in its genome, is present in the wild and simply hasn’t been discovered yet, as it is estimated that thousands of viruses remain yet unknown.
Moreover, many coronaviruses apart from SARS-CoV-2 have a FCS. Therefore, the presence of a FCS in a coronavirus isn’t in itself surprising or unexpected. In particular, Andersen pointed out that some coronaviruses that infect cats have the exact same FCS as the SARS-CoV-2 spike protein. These examples demonstrate that a FCS can arise naturally and it’s plausible that SARS-CoV-2 acquired a FCS through genetic recombination.
Finally, Andersen also explained that the FCS of the SARS-CoV-2 spike protein is suboptimal, meaning that this specific FCS is less efficient than the FCS found in other viruses. If someone had tried to engineer a coronavirus with a FCS, it would have made more sense to select the most efficient FCS available. In that regard, Susan Weiss, a microbiologist at the University of Pennsylvania, commented in an email to Health Feedback:
“Many other coronaviruses have furin recognition sites in their spike proteins. Some CoV spikes have more basic amino acids in their furin sites tha[n] SARS-2 or MERS suggesting their spikes will be more cleaved. If someone wanted to engineer in a furin site they would have likely put in more basic amino acids like in OC43 or HKU1 or MHV- the embeco viruses.”
Stanley Perlman, a virologist at the University of Iowa, also told Health Feedback that genetic manipulation to insert a furin cleavage site into SARS-CoV-2 is highly unlikely and scientifically unjustified:
“The whole issue of inserting of a furin cleavage site is dubious, since insertion of such sites is extremely difficult, require the virus to be in hand, require a reverse genetics system and also require that the virus be more virulent after insertion of a furin cleavage site (which is not necessarily true for CoV and not known in advance).”
In summary, Bongino’s claim is inaccurate, as the codon doublet CGG CGG can be found in several viruses and isn’t unique to SARS-CoV-2. Although the CGG codon is rare in virus genomes, its presence isn’t a sign that SARS-CoV-2 was genetically engineered. Among viruses, spontaneous insertion of a whole stretch of RNA into the genome is a common phenomenon and several known mechanisms could explain this observation.
Furthermore, the presence of a FCS in the SARS-CoV-2 spike protein isn’t a sign of bioengineering either, given that many coronaviruses also contain a FCS. Some viruses even have the exact same FCS as the one found in SARS-CoV-2. Besides, the SARS-CoV-2 FCS isn’t the most efficient FCS identified by scientists. Intentionally choosing an inefficient FCS when more efficient ones are already known wouldn’t make sense from the standpoint of genetic engineering. Considering all the above, experts in the field consider that the most likely origin of SARS-CoV-2 is the spontaneous transmission of the virus from animals to humans.
“The presence of two adjacent CGG codons for arginine in the SARS-CoV-2 furin cleavage site is similarly not indicative of genetic engineering. Although the CGG codon is rare in coronaviruses, it is observed in SARS-CoV, SARS-CoV-2, and other human coronaviruses at comparable frequencies. Further, if low-fitness codons had been artificially inserted into the virus genome they would have been quickly selected against during SARS-CoV-2 evolution, yet both CGG codons are more than 99.8% conserved among the >2,300,000 near-complete SARS-CoV-2 genomes sequenced to date, indicative of strong functional constraints.”
This is for sure not evidence that SARS-CoV-2 was engineered. In fact there’s not a single piece of evidence that SARS-CoV-2 is anything but a natural virus, previously unknown prior to the pandemic but clearly linked to the same SARS-related coronaviruses that the first SARS emerged from (we discuss this here). It’s also clear that the Huanan market is where this all began; see our preprint.
Robert Garry, Professor, Tulane University School of Medicine:
1. Are there examples of other viruses with a CGG CGG doublet, especially within a sequence coding for a furin cleavage site?
CGGCGG occurs in many coronaviruses, based on a quick database search by a colleague (Figure 3).
2. Is the presence of CGG CGG sequence in a furin cleavage site a clue of a possible genetic engineering?
No, the presence of the CGG CGG sequence in a furin cleavage site isn’t a sign of possible genetic engineering; we wrote about that in this paper.
Stanley Perlman, Professor, University of Iowa:
This claim has been floating around for a long time and has been debunked. People like Kristian Andersen at Scripps have investigated this carefully and have shown that these codons occur naturally within the genome. I would [link to] previous articles by him and others dealing with this issue[2,11].
The whole issue of inserting of a furin cleavage site is dubious, since insertion of such sites is extremely difficult, require the virus to be in hand, require a reverse genetics system and also require that the virus be more virulent after insertion of a furin cleavage site (which is not necessarily true for CoV and not known in advance).
Susan Weiss, Professor, University of Pennsylvania:
This topic is getting very old by now; there is lots of evidence that SARS-CoV-2 came from a bat.
No, [the presence of CGG CGG in a furin cleavage site] does not at all indicate engineering. Many other coronaviruses have furin recognition sites in their spike proteins. Some coronavirus spikes have more basic amino acids in their furin sites than SARS-CoV-2 or MERS-CoV suggesting their spikes will be more cleaved. If someone wanted to engineer in a furin site, they would have likely put in more basic amino acids like in the coronaviruses OC43 or HKU1 or MHV, the embecoviruses.
Kristian Andersen, Professor, Scripps Research:
These claims are false—for background, see this Twitter thread.
The SARS-CoV-2 furin cleavage site is yet again in the news – this time because of a quote by Nobel laureate David Baltimore.
The site is not a “smoking gun”, nor does it “make a powerful challenge to the idea of a natural origin”.
Quite the opposite, so a little science 🧵👇 pic.twitter.com/Txc3sQYZSe
— Kristian G. Andersen (@K_G_Andersen) May 9, 2021