Still looking at the beginning — Can a well-formulated Bayesian analysis help inform the Covid-19 pandemic origin debate?

Revision

Elliott Brunner, MD
Mar 7, 2021

In all fairness, I realized that the title of this story needed a question mark. The problem is that it posits a binary choice between two propositions (that the pandemic, which began in Wuhan, had either a zoonotic or a laboratory-derived origin) and thus cannot account for other possibilities. Just as residents of Wuhan were free to roam the world and spread SARS-CoV-2 far and wide before the complete lockdown of the city, Chinese citizens were able to travel into Wuhan from anywhere else in China up until that point. So the travel histories of the initial patients with SARS-CoV-2, whether recognized or not, become important. And, of course, once these multiple possibilities for how the initial outbreak arose in Wuhan are considered, the answer to the question the title poses is no. Still, as an example of how to do a Bayesian analysis, this article has didactic value.

The origin debate can continue without this Bayesian argument. The question of whether the origin of the pandemic was zoonotic or laboratory-derived still remains, but this Bayesian analysis, it seems to me, cannot explain why the outbreak started in Wuhan despite the surprising coincidence, nor which of these two origins was responsible. The truth will have to come from somewhere else. I have also corrected below one argument that was left incomplete in the original version.

As of March 3, 2021, according to Worldometer, there were 115,725,113 global cases of Covid-19 and 2,570,013 recorded deaths. The worldwide toll of this coronavirus pandemic in suffering and economic loss seems almost unimaginable. Although SARS-CoV-2 is now known to cause the disease Covid-19, there is still no consensus on the origin of this virus after more than a year of vigorous research approaching the issue from every conceivable angle. In essence, the question of origin may be reduced to two alternatives. Either this pandemic virus was the result of natural processes, similar to what was seen in the two previous human coronavirus epidemics, SARS in 2003–4 and MERS in 2012–14, or there was human involvement: the virus was brought into a laboratory, possibly manipulated with what have become exceedingly sophisticated genetic engineering techniques, and subsequently escaped, presumably by accident, into the surrounding population, seeding infection and perhaps starting the current pandemic.

The Chinese government, the WHO, and many researchers and scientists around the world have repeatedly made the case that this coronavirus is a naturally derived zoonotic entity: a virus that evolved by ordinary natural selection in one or more animal hosts until, at some point, it became able to make the interspecies jump into humans. A different group of virologists, geneticists, and other academics has supported contrasting ideas. Their publications exist, for the most part, outside the mainstream peer-reviewed literature. That does not necessarily mean that they are wrong. It is not within the scope of this article to review this complex topic, except to analyze the simple question of a surprising coincidence of location and to see whether it can be addressed quantitatively.

It is left to others to review the more important considerations: whether this virus had an intermediate host species; why it was not found in bats anywhere near the locale where the outbreak started; why there is no antibody evidence that it was circulating in humans before it was fully adapted to cause disease; why it has not shown any evidence of posterior diversity among the thousands of genomes sequenced to date; how to explain distinctive features of its genome, like a furin cleavage site, that may not have been possible to acquire in the wild; and, finally, whether certain patterns in its genetic code suggest that it may have been genetically engineered. In addition, the seminal publication (1) in the field, which proposed the bat sequence RaTG13, recovered in 2013 from a cave in Tongguan town in Mojiang county, as the closest relative of SARS-CoV-2 and thereby established the zoonotic bona fides of SARS-CoV-2 in the eyes of many in the virology community, has been noted to have discrepancies and possibly fabricated data. These are all serious allegations and, considering the consequences of a man-made origin, they must be fully addressed rather than simply dismissed as conspiracy theory. (See ref. 2 for an example of this type of discussion.)

Much of this research, whether in support or in refutation of the natural origin hypothesis for SARS-CoV-2, is quite technical and may not be easily understood by lay persons who are simply trying to grasp why this happened. However, there is one approach (3) whose reasoning should be comprehensible to all. The argument hinges on the surprising coincidence that the first cases of what would become known as Covid-19 occurred very close to the world’s premier virology research institute for coronaviruses, especially SARS-related coronaviruses. This laboratory complex is located a mere 3 kilometers from the People’s Liberation Army Hospital, where these initial patients were treated. If the probability of catching this virus from a presumed natural source in the city of Wuhan is small enough, then there is little alternative but a laboratory origin. The standard numerical tool for evaluating this kind of probability is Bayes’ theorem.

In summary, we will try to show that it lies within the realm of possibility that an ancestral bat coronavirus was collected in the wild, perhaps genetically manipulated in a laboratory to make it more infectious by training it to infect human cells, and then released, presumably by accident, in Wuhan, China. In fact, we will show that this probability is 69%. The Wuhan Institute of Virology, located near the center of Wuhan, a city of over 11 million inhabitants, would then have been the source of the field specimen collection efforts, the laboratory genetic manipulations, and the subsequent escape. Additional support for this laboratory origin hypothesis has been provided directly (but inadvertently) by leading Chinese scientists themselves, such as Dr. Zhengli Shi, head of coronavirus research at the Wuhan Institute of Virology, and Gao Fu (George Fu Gao), Director of the Chinese CDC; by the Chinese government; and by scientists who strongly favor a natural origin, such as Dr. Peter Daszak of EcoHealth Alliance.

Viruses are strange creatures that live in the twilight between living organisms and inanimate matter. They are the ultimate parasites: they have no metabolism, no mobility, no organs, and no higher functions. Essentially, they are nothing but a little coating enclosing a bit of genetic material, either DNA or RNA. They must somehow reach and enter a living cell, whether a bacterium or a cell of a human being, and hijack its cellular machinery to reproduce, making more copies of themselves that can go on to infect other similar cells. That is all they can do: life reduced to its absolute simplest form, reproduction. Some scientists would debate whether we should indeed call that living. If one has the genetic sequence of a virus (a long list of the 4 nucleotide letters making up its code, available in gene databases accessible worldwide), the virus may be recreated in a lab by synthesizing that genetic material and placing it into a suitable cell culture to regenerate the virus itself, something that can be done with no other class of organism.

Progress in genetic engineering has enabled scientists to perform experiments with the genetic material of organisms that are astonishing and worrisome in equal measure. The U.S. pioneered this technology in the 1970s: Stanford professor Paul Berg, who is called the father of genetic engineering and who would win a Nobel Prize in 1980, used newly discovered restriction endonucleases to insert, for the first time, a piece of foreign genetic material into the genome of a virus, thus creating the first recombinant DNA molecule. (A small personal side note: at the time I was working in the UCSB laboratory of John Carbon, who would leave on sabbatical to collaborate with Berg on this research; I would leave research to pursue a career in clinical medicine.)

The U.S. has had vigorous bioweapons and biosafety research programs dating back to the start of the Cold War, working in more conventional ways with dangerous pathogens. (These are practically the same concepts: research that facilitates, explores, or tests one also enables the other; the intention of the program is the key.) This research is done in special facilities designed to protect workers and the community from exposure and infection. These laboratories are designated biosafety level (BSL) 3 or 4, and their number has grown tremendously over the years. BSL-3 facilities are much more prevalent; the U.S. alone has around 1,500 individual BSL-3 laboratories today (12) (see photo below, top for the U.S. and bottom for China). Worldwide, the number of BSL-4 laboratories has increased from 12 in 1990 to 17 in 2000, 42 in 2010, and 52 after 2012. The U.S. itself now has around a dozen BSL-4 laboratories; China has two, with plans for 5 to 7 by 2025.

See https://www.gao.gov/products/gao-09-574#summary_recommend
See Demaneuf and Maistre (6), in “Outlines of a probabilistic evaluation of possible SARS-CoV-2 origins”

Wuhan Institute of Virology staff are seen here proudly inaugurating the first BSL-4 lab in China (see photo below). Note that working with coronaviruses requires only BSL-3 certification, even for gain-of-function (GoF) experiments:

Wuhan Institute of Virology BSL-4 opening 2018

Experiments on potential pandemic pathogens such as influenza, SARS, and SARS-CoV-2 are currently authorized in BSL-3 laboratories (see graphic below, top). The growth of this type of infrastructure has occurred primarily in urban settings, so the number of people exposed within a given radius of these facilities has skyrocketed (see graphic below, bottom):

Classification of BSL levels
Global population living in the immediate vicinity of BSL-4 laboratories since 1990

With the advent of genetic engineering, so-called gain-of-function (GoF) experiments are now performed routinely worldwide. That this research can be problematic is attested to by the 3-year moratorium on this type of research that the U.S. enacted in 2014; this Obama administration pause was lifted in 2017. The pandemic potential of one particular pathogen, SARS-CoV-2, is now proven, and with it the potential danger of a laboratory escape of such a pathogen. The justification given for GoF research is that it will further our understanding of these pathogens while preparing us for the next pandemic, something that obviously did not happen with SARS-CoV-2. But what if this research instead endangers us, or perhaps even created this pandemic virus? The latter would be an unprecedented situation in human history, one that ought to be explored if it is even remotely conceivable, given the consequences the entire world has faced, and is still facing, with the SARS-CoV-2 pandemic. In that case, we would expect those who might have been directly involved and responsible to conceal the facts in every way imaginable.

Scientific progress and technological advances have always been a double-edged sword, although mankind’s potential for self-destruction seems to be accelerating alongside his technological prowess. On August 6, 1945 the first atomic bomb was dropped on Hiroshima, eventually killing 202,118 civilians. In total, the death toll for the atomic age stands at less than 300,000. If the age of biologic weaponry has arrived, that number will have been vastly exceeded. We need to know if that is the case or not. Scientists are only human. When their work has little direct impact on society at large, science can advance quietly and peacefully. At other times, powerful interests and politics may be used to control the scientific agenda. No matter where one comes down on the issue of climate change, it is easy to see this dynamic at work in something like the Climategate emails. With the geopolitical competition of the century heating up between the United States and China, the coronavirus pandemic may only be a shot across the bow. While progress in computer-related technologies has been amazing us for decades (Moore’s law, below top), advances in genetics and bioengineering have been proceeding at an even greater rate (below on bottom) and promise to perhaps exceed the former in ways we can barely imagine now:

Moore’s Law
Exponential decrease in cost of genetic sequencing

Einstein was a pacifist his entire life. Yet it was he who signed the letter written by fellow physicist Leó Szilárd and sent to Franklin Delano Roosevelt, urging him to go forward with what would become the Manhattan Project. After the war, Einstein and fellow physicists founded the Bulletin of the Atomic Scientists, whose Doomsday Clock now stands at 100 seconds to midnight, the closest it has ever been since it was started. Let us hope that, with the pace of technology accelerating, this timepiece will start to experience the time dilation of Einstein’s special theory of relativity and slow down before it is too late:

Time according to the Bulletin of the Atomic Scientists’ Doomsday Clock

Bayes’ theorem offers a rational technique for revising an initial belief in light of new evidence.

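The equation for Bayes’ theorem (11) is

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

where

$$P(E) = P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)$$

and where (with ¬H, written ^H below, meaning that the hypothesis is false)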

· H is the statement of the hypothesis one seeks to evaluate

· P(H) is the initial probability that the hypothesis is true independent of the evidence

· E is the evidence being used to revise the belief in the hypothesis

· P(E) is the probability of observing the evidence, independent of the hypothesis

· P(E|H) is the probability of observing the evidence if the hypothesis is true

· P(E|^H) is the probability of observing the evidence if the hypothesis is false

· P(H|E) is the probability that the hypothesis is true given the evidence, which is the Bayes’ theorem result we are looking for

P(E) can be difficult to determine, but the identity P(E) = P(E|H) * P(H) + P(E|^H) * P(^H) always holds and is easier to calculate.

The 3 unknowns that need to be estimated are P(H) the initial probability that the hypothesis is true independent of the evidence, P(E|H) the probability of the evidence occurring if the hypothesis is true, and P(E|^H) the probability of the evidence occurring if the hypothesis is false. Estimating the two conditional probabilities, P(E|H) and P(E|^H), is generally easier than estimating the unconditional probability, P(E), which can be determined from P(E) = P(E|H) * P(H) + P(E|^H) * P(^H). Finally P(^H) is the probability the hypothesis is false, which is the same as 1 - P(H).
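A minimal Python sketch of the calculation just described, with P(E) expanded by the identity above so that only P(H), P(E|H), and P(E|^H) need to be supplied. The function and parameter names are illustrative, and the example inputs are hypothetical rather than taken from this analysis:

```python
def posterior(p_h: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H|E) by Bayes' theorem.

    P(E) is expanded as P(E|H) * P(H) + P(E|^H) * P(^H), with P(^H) = 1 - P(H).
    All arguments are probabilities in [0, 1].
    """
    for name, p in (("P(H)", p_h), ("P(E|H)", p_e_given_h), ("P(E|^H)", p_e_given_not_h)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {p}")
    p_not_h = 1.0 - p_h
    p_e = p_e_given_h * p_h + p_e_given_not_h * p_not_h
    if p_e == 0.0:
        raise ValueError("P(E) is zero: the evidence is impossible under both hypotheses")
    return p_e_given_h * p_h / p_e


# Hypothetical inputs: an even prior, evidence four times likelier under H.
print(posterior(0.5, 0.8, 0.2))  # 0.8
```

Only the three estimated quantities appear as arguments, which mirrors the point of the paragraph: P(E) never has to be estimated directly.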

The hypothesis, H, is that this pandemic outbreak of SARS-CoV-2 in humans started with the escape of the virus from a virology laboratory in China. Laboratory escape excludes the consideration that a pathogen might have been deliberately released, because that is unknowable without relevant specific facts. The hypothesis does not refer to a particular city, nor to any specific laboratory. The hypothesis only supposes the scenario of a human outbreak of a new type of coronavirus in China in early 2020. P(H) is the starting probability that the hypothesis is true independent of the evidence. In estimating this initial probability, one ignores information about the relative locations of the Wuhan Institute of Virology and the outbreak since that insight will be contained in the statement of evidence.

The evidence, E, is that the first recorded outbreak of SARS-CoV-2 in humans occurred in the city of Wuhan, which is home to the world-renowned virology laboratory, the Wuhan Institute of Virology, which actively researches closely related viruses.

Estimation of the initial probability of the hypothesis, P(H)

A laboratory-sourced outbreak is certainly within the realm of possibility. There are two sequential steps to this compound event. First, a pathogen housed in a laboratory, whether in a test tube, a cell culture, or the body of an animal, escapes the confines of the building. This escape may occur in a variety of ways. There might be leakage from mishandled waste. Samples of the pathogen thought to be inactivated might be shipped to other facilities. Many other recorded mishaps underscore the importance of human error, which has proven impossible to eliminate no matter how extensive and meticulous the laboratory processes and procedures are. The acquisition and transport of the organism to the facility might also be an occasion for mishap, but such events are typically not well documented. However, the most common path to escape is a laboratory-acquired infection (LAI), in which a laboratory worker contracts an infection from an organism housed in that laboratory. When an exposed laboratory worker is initially unaware of the exposure, these infections are classified as undetected or unreported laboratory-acquired infections (uuLAIs) (4). The infected worker then becomes contagious outside the laboratory, returning home unaware and able to transmit the infection to others. These infections represent a release into the community, so they are key data for analyzing the risk of a community outbreak. Other incidents involving errors, equipment failures, animal bites or needle sticks, and breaches of protocol can be diagnosed, evaluated, treated, and mitigated, and so fall into a category of LAIs that should not carry the same risk of a community outbreak.
The second step in the chain of events of an unsuspected or unrealized laboratory outbreak is then the transmission of infection from the first infected individual to others and the establishment of a localized outbreak with the potential for broader community spread and eventually the possible origin of an epidemic, or even a pandemic. The approach that will be used here is to examine historical accidental laboratory viral releases that have resulted in localized infections.

The probability of release into the community. In an analysis circulated at the 2017 meeting of the Biological Weapons Convention, a conservative estimate puts the probability at about 20 percent for a release into the community of a mammalian-airborne-transmissible, highly pathogenic avian influenza virus from at least one of 10 labs over a 10-year period of developing and researching this type of pathogen. This percentage was calculated from Federal Select Agent Program (FSAP) data for the years 2004 through 2010.
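The 20 percent figure pools 10 labs over 10 years. As a rough illustration only, and assuming (our assumption, not stated in the cited analysis) that every lab-year carries the same, independent release probability p, the pooled figure can be inverted from 1 - (1 - p)^100 = 0.20 to a per-lab-year probability:

```python
def per_lab_year_probability(p_at_least_one: float, labs: int, years: int) -> float:
    """Solve P(at least one release) = 1 - (1 - p) ** (labs * years) for p.

    Assumption (illustrative): identical, independent release risk per lab-year.
    """
    lab_years = labs * years
    return 1.0 - (1.0 - p_at_least_one) ** (1.0 / lab_years)


p = per_lab_year_probability(0.20, labs=10, years=10)
print(f"implied release probability per lab-year: {p:.3%}")  # about 0.223%
```

Under that independence assumption the pooled 20 percent corresponds to roughly 0.22% per lab-year; correlated risks across labs or years would change the figure.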

(4) Human error in high-biocontainment labs: a likely pandemic threat by Lynn Klotz, February 25, 2019

According to Jarunee Siengsanan-Lamont and Stuart D. Blacksell (5), who reviewed laboratory-acquired infections in the Asia-Pacific region, 27 LAIs were investigated between 1982 and 2016, a slight majority of them in research laboratories. These 27 reported LAIs represent a frequency of 27 events over 34 years, or about 0.8 events per year, although the authors acknowledged that under-reporting of LAIs is a potential issue in the region. These LAIs involved a variety of pathogenic organisms and various types of facilities, from clinical laboratories to research institutions. Counting only research laboratories, the rate becomes about 0.4 events per year. Although a heterogeneous list of LAIs cannot inform the risk that a particular type of pathogen is released from a specific setting and leads to a community outbreak, it does show an incidence of LAIs for the entire Asia-Pacific region roughly two orders of magnitude greater than the estimate below for a coronavirus LAI from a specific laboratory complex leading to a localized outbreak.
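A quick check of the rates in this paragraph against the 0.3% estimate for a laboratory-sourced coronavirus outbreak derived in the next paragraph. Reading that 0.3% as a per-year rate is our assumption for the sake of comparison:

```python
import math


def events_per_year(events: int, first_year: int, last_year: int) -> float:
    """Average number of events per year over the interval first_year..last_year."""
    return events / (last_year - first_year)


all_lai_rate = events_per_year(27, 1982, 2016)  # 27 LAIs over 34 years, about 0.79/yr
research_lai_rate = 0.4                         # research-laboratory subset, as stated in the text
# Assumption: the 0.3% coronavirus-outbreak estimate, read as a per-year rate.
coronavirus_outbreak_rate = 0.003

for label, rate in (("all LAIs", all_lai_rate), ("research-lab LAIs", research_lai_rate)):
    ratio = rate / coronavirus_outbreak_rate
    print(f"{label}: {rate:.2f}/yr, about {ratio:.0f} times the coronavirus estimate "
          f"(10^{math.log10(ratio):.1f})")
```

Both ratios fall between 100 and 1,000, which is what "roughly two orders of magnitude" amounts to here.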

Demaneuf and Maistre (6), in “Outlines of a probabilistic evaluation of possible SARS-CoV-2 origins,” considering “the three BSL-3 lab-complexes that were actively working on coronaviruses over the last few years,” estimate the probability of a laboratory-related accident at approximately 0.6% per year. An isolated LAI, or even more than one LAI, might not lead to a community outbreak. Demaneuf and Maistre use the initial SARS-CoV-2 R0 value of 2.2 to update the estimate by Merler et al. (7) (Merler, Ajelli, Fumanelli, Vespignani, “Containing the accidental laboratory escape of potential pandemic influenza viruses,” Nov 2013) of the probability that an LAI results in an outbreak, which then becomes around 50%. Combining these two estimates as sequential steps leading to a localized outbreak, the probability of a laboratory-sourced coronavirus outbreak is 0.6% × 50% = 0.3%.
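The two-step estimate of P(H) is a product of sequential probabilities. A minimal sketch using the figures quoted above (the variable names are illustrative):

```python
p_lab_accident = 0.006        # Demaneuf and Maistre (6): ~0.6% per year, three BSL-3 complexes
p_outbreak_given_lai = 0.50   # Merler et al. (7), updated with R0 = 2.2: ~50%

# Both steps must happen for a laboratory-sourced outbreak, so the probabilities multiply.
p_h = p_lab_accident * p_outbreak_given_lai
print(f"P(H) = {p_h:.1%}")  # P(H) = 0.3%
```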

Estimation of the conditional probability of the evidence if the hypothesis is false, P(E|^H)

If a laboratory were not responsible for the uncontrolled release of SARS-CoV-2, what then is the chance that the first recorded outbreak of the virus would occur in Wuhan, home of the Wuhan Institute of Virology?

The first observation that needs to be stated is that this probability might indeed be quite negligible, given that Wuhan is a city of 3,280 square miles with 11 million inhabitants in a built urban environment (see photo below) that is far removed from anything resembling the natural habitats of bats, the natural reservoir of SARS-related coronaviruses. If so, the laboratory origin of Covid-19 becomes a near certainty, simply because the probability of acquiring a zoonotic coronavirus infection in downtown Wuhan could be vanishingly small.

Wuhan during the Coronavirus lockdown

Indeed, when Drs. Shi and Daszak (8) needed negative controls for a zoonotic coronavirus seroconversion study, they used Wuhan residents. “As a control, we collected 240 serum samples from random blood donors in Wuhan > 1,000 km away from Jinning, where inhabitants have a much lower likelihood of contact with bats due to its urban setting. As expected, 0/240 samples from the patients from Wuhan had positive serological evidence of prior coronavirus infection. The 2.7% (6 of 218) seropositivity for the high-risk group of residents living in close proximity to bat colonies suggests that spillover is a relatively rare event; however this depends on how long antibodies persist in people, since other individuals may have been exposed and antibodies waned.” In this paper from 2018, Drs. Shi and Daszak concluded that bat-to-human transfer is relatively rare, even for high-risk people living near colonies, and much less likely in Wuhan, a conclusion that does not support the hypothesis of bat-to-human transmission in urban environments.

The negative predictive value for the residents of Wuhan to not show bat coronavirus seroconversion is 97.37% with a 95% confidence interval of 97.31%-97.43%. Converting this P value of p = 0.0236 into an odds ratio, this reduces the likelihood of a Wuhan resident acquiring a zoonotic coronavirus by 97.37/2.36 or 41-fold compared to the inhabitants of Jinning County, Yunnan province. Taking the seropositivity of these Yunnan residents as markers of past coronavirus infection, a 2.7% prevalence rate reduced by a factor of 41 for the population of Wuhan would give Wuhan an upper bound of prevalence of 0.065% at the same 95% confidence limit. If one considers that the antibody levels shown by the seroconversion will only last so long, this systematic error on the estimate might make the prevalence in Wuhan slightly larger.
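The arithmetic of this paragraph can be re-traced as follows. This sketch takes the quoted negative predictive value (97.37%) and p value (0.0236) as given and only reproduces the division and scaling described in the text; it does not re-derive those two figures from the study data:

```python
npv_percent = 97.37              # negative predictive value quoted in the text (%)
p_value_percent = 2.36           # p = 0.0236 from the text, expressed in percent
jinning_seroprevalence = 0.027   # 2.7% (6 of 218) seropositive near bat colonies

# The text divides the NPV by the p value to obtain a reduction factor for Wuhan.
reduction_factor = npv_percent / p_value_percent
# Scaling the Jinning seroprevalence down by that factor gives the Wuhan figure.
wuhan_upper_bound = jinning_seroprevalence / reduction_factor

print(f"reduction factor: {reduction_factor:.1f}")    # about 41
print(f"Wuhan upper bound: {wuhan_upper_bound:.3%}")  # about 0.065%
```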

See https://www.medcalc.org/calc/diagnostic_test.php

When Dr. Zhengli Shi was called to return quickly from a meeting that she was attending in Shanghai at the start of the outbreak in Wuhan, she said, “Could this have come from our lab? I wondered if [the municipal health authority] got it wrong. I had never expected this kind of thing to happen in Wuhan, in central China.” (9) Dr. Shi seemed to appreciate the surprising coincidence of this juxtaposition of locations.

Based on the above analysis, we will take 0.065% as the chance that the first recorded outbreak of the virus would occur in Wuhan if a laboratory were not responsible for the uncontrolled release of SARS-CoV-2, realizing that this value could just as well be zero, as Dr. Shi appears to suspect above.

Shi followed later with, “The novel 2019 coronavirus is nature punishing the human race for keeping uncivilized living habits. I, Shi Zhengli, swear on my life that it has nothing to do with our laboratory,” writing on Chinese social media. “I advise those who believe and spread rumors from harmful media sources… to shut their stinking mouths.”

Estimation of the conditional probability of the evidence if the hypothesis is true, P(E|H)

If there were an uncontrolled release of a virus from a laboratory, what are the chances that the resulting outbreak occurs in the same city as the laboratory itself?

If the disease caused by the virus had an incubation period of months or years, instead of days, or the infectious agent were primarily transmitted by contact with bodily fluids, such as during relatively infrequent episodes of sexual contact, one could imagine scenarios where a laboratory worker catches the virus in a laboratory in one city and then travels to another city, instigating an outbreak far removed from the lab. For diseases of this kind, you might assign a low to moderate value of P(E|H) to reflect the possibility that the infection might escape detection in the vicinity of the laboratory where the laboratory acquired infection first occurred.

In other cases, such as this one, where the infectious agent is known to be highly contagious and the incubation period is relatively short, the chance that the first outbreak occurs in the immediate vicinity of the laboratory where the accidental infection occurred is much greater.

For patient zero, in this case an employee of a BSL-3 research laboratory, to seed an initial outbreak away from the home city where she works, she would need to travel outside the city before the start of the contagious period of the infection. For SARS-CoV-2, the median incubation period from exposure to infection onset, whether asymptomatic or ill to whatever degree, is 4 to 5 days, with a range of 2 to 14 days. Basically, she would need to go on vacation just as the unnoticed infection was about to start. In China, rest days (weekends) are one 24-hour period per week, and statutory holidays are 1 to 3 days long. Chinese leave policy is not generous compared with that of most Western companies: annual leave is based on the number of years worked and reaches its maximum of 15 days off after 20 years of employment. Counting vacation time at the longest duration of 2 weeks, this window of transmission represents about 4% of the year (14/365 ≈ 3.8%), during which our infected employee could leave the city for long enough to seed her inapparent infection somewhere other than her hometown. That leaves 96% of the time to start a potential outbreak right in the city where she lives. For other occasional leaves from work, such as weddings, we can add a percentage point (about 3.65 days per year, or roughly an extra 2 weeks of leave every 4 years), which brings the share of time spent in the home city to 95%.
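The leave-window estimate reduces to a few lines of arithmetic. A sketch, assuming a 365-day year and the text's rounding of 14/365 up to 4%:

```python
DAYS_PER_YEAR = 365          # assumption: non-leap year
max_vacation_days = 14       # longest single vacation considered in the text (2 weeks)
other_leave_share = 0.01     # extra percentage point for weddings and similar leave

vacation_share = max_vacation_days / DAYS_PER_YEAR          # about 0.038
away_share = round(vacation_share, 2) + other_leave_share   # 0.04 + 0.01, as rounded in the text
local_share = 1.0 - away_share                              # share of the year spent in the home city

print(f"vacation window: {vacation_share:.1%}, away: {away_share:.0%}, local: {local_share:.0%}")
```

This gives about 3.8% for the vacation window and 95% for the share of the year spent locally, the figure carried into P(E|H).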

P(E|H) asks only this: given an uncontrolled release of a virus from a laboratory (an LAI), what is the chance that a local outbreak results? Once she is infected, what is the chance that she will seed an outbreak locally? We can use the same adjustment that we used previously for P(H). As before, Demaneuf and Maistre (6) used the initial SARS-CoV-2 R0 value of 2.2 to update the estimate by Merler et al. (7) (Merler, Ajelli, Fumanelli, Vespignani, “Containing the accidental laboratory escape of potential pandemic influenza viruses,” Nov 2013) of the probability that an LAI results in an outbreak; they find it slightly greater than 50% for the “reference scenario,” in which various interventions (contact tracing, household quarantines, school and workplace closures) are set to reasonable initial values. Combining these two sequential steps, the probability of a laboratory-sourced coronavirus outbreak in Wuhan becomes 95% × 50% = 47.5%, so P(E|H) will be set to 48%.
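P(E|H) is again a product of two sequential probabilities. A minimal sketch with the two figures used in this paragraph (variable names are illustrative):

```python
p_stays_local = 0.95          # share of the year the infected worker remains in Wuhan
p_outbreak_given_lai = 0.50   # Merler et al. reference scenario, updated with R0 = 2.2

p_e_given_h = p_stays_local * p_outbreak_given_lai
print(f"P(E|H) = {p_e_given_h:.1%}")  # 47.5%, rounded to 48% in the text
```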

From “Containing the accidental laboratory escape of potential pandemic influenza viruses” Nov 2013

In conclusion

What is important to note is that for 2 of the 3 probability estimates that enter the final Bayes’ analysis calculation, P(E|^H) and P(E|H), the arguments and logic revolve around considerations unrelated to the biology of coronaviruses, or make use of now universally acknowledged facts about the virus, such as its contagious period. The estimate for P(E|^H), the chance that the first recorded outbreak of the virus would occur in Wuhan if a laboratory were not responsible for the uncontrolled release of SARS-CoV-2, used a seroconversion sampling study comparing Jinning and Wuhan to establish an odds ratio reduction for the population of Wuhan. While an antibody test can be a reliable indicator of past infection, for SARS-CoV-2 it is not yet apparent how long these antibodies might persist. Although the study was small, the finding of no antibodies in the Wuhan test population made the negative predictive value highly significant, even with only 2.7% positives in the reference group from Yunnan.

P(E|H), the chance that an outbreak occurs in the same city as the laboratory itself if there were an uncontrolled release of a virus, needed first to consider the period of contagiousness of the virus and the typical times during which a laboratory worker with an unapparent or unrecognized exposure might leave home to travel outside of the city. Once an infected worker leaves the confines of the laboratory, a complex series of epidemiological steps (7) must occur to seed an outbreak in the local area. This will depend most importantly upon R0 and the local public health response to the initial cases. Here the reference scenario of responses to the outbreak is assumed. Other possible routes of laboratory escape such as accidents involving hazardous waste or other leaks from a secure biosafety facility that do not involve a LAI have not been considered here because that evidence is sparse, and the frequency of these events is not easily quantifiable.

The one remaining Bayesian variable is P(H): the initial probability that the hypothesis is true (that the pandemic outbreak of SARS-CoV-2 in humans started with the escape of the virus from a virology laboratory in China), independent of the evidence that the first recorded outbreak of SARS-CoV-2 in humans occurred in the city of Wuhan, home to the world-renowned Wuhan Institute of Virology, which actively researches closely related viruses. Producing this estimate required more specific information about the behavior of the virus. Here 3 papers were used to develop a plausible quantified answer. First, it was important to put a number on the rate of occurrence of any LAI in the geographic region. Next, we needed an estimate narrowed down to accidental coronavirus releases from BSL-3 facilities. Finally, we needed a way to estimate how often a coronavirus leak would go on to seed a local outbreak of infection. For this we used the SARS-CoV-2 value of R0 from the beginning of the pandemic, combined with data from a study (7) of accidental laboratory escape of potential pandemic influenza viruses. Whether other studies in the literature might be more applicable and provide a better estimate for P(H) remains for further work to uncover.

The calculation of Bayes’ theorem

P(H|E) = P(E|H) * P(H) / P(E) where P(E) = P(E|H) * P(H) + P(E|^H) * P(^H)

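With the parameters described above, each expressed in percent (P(E|H) = 48, P(H) = 0.3, P(E|^H) = 0.065, P(^H) = 99.7; the common percentage scale cancels between numerator and denominator), the result becomes

P(H|E) = 48 * 0.3 / ( 48 * 0.3 + 0.065 * 99.7 ) = 14.4 / ( 14.4 + 6.5 ) ≈ 69%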
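The same arithmetic can be checked in a few lines of Python. This is a minimal sketch that converts the percentages in the calculation above to fractions. The function name posterior is illustrative. The sketch also reports the Bayes factor P(E|H)/P(E|^H), which measures how strongly the Wuhan observation shifts the prior.

```python
def posterior(p_e_given_h: float, p_h: float, p_e_given_not_h: float) -> float:
    """Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E), with P(E) from total probability."""
    p_not_h = 1.0 - p_h
    p_e = p_e_given_h * p_h + p_e_given_not_h * p_not_h
    return p_e_given_h * p_h / p_e


# Values from the calculation above, converted from percent to fractions.
p_e_given_h = 0.48         # P(E|H)
p_h = 0.003                # P(H)
p_e_given_not_h = 0.00065  # P(E|^H)

p = posterior(p_e_given_h, p_h, p_e_given_not_h)
bayes_factor = p_e_given_h / p_e_given_not_h

print(f"P(H|E) = {p:.1%}")                   # P(H|E) = 69.0%
print(f"Bayes factor = {bayes_factor:.0f}")  # Bayes factor = 738
```

Working in fractions rather than percentages gives the same answer. The percentage scale appears in both the numerator and the denominator, so it cancels.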

Discussion

What is striking in this application of Bayes’ theorem to the possibility that the current SARS-CoV-2 pandemic may well have had a laboratory origin is that it relies mostly on simple observations and assumptions. These are combined with a few research findings on the biology of this virus or, in one case, of another virus considered to have high pandemic potential. Given these rather simple premises, and the seemingly established opposite conclusion, it is perhaps surprising that the result favors the laboratory origin hypothesis, with a probability of 69%.

Three other papers have applied Bayes’ theorem to this same problem:

A Bayesian Analysis Of One Aspect Of The SARS-CoV-2 Origin Story — Where The First Recorded Outbreak Occurred

Outlines of a probabilistic evaluation of possible SARS-CoV-2 origins

Why China and the WHO Will Never Find a Zoonotic Origin for the COVID-19 Pandemic Virus

The Jon Seymour paper (11) seeks to provide a wide range of possibilities, so that essentially any reasonable estimate of these probabilities falls within its values. It offers options for the ardent believer, the ardent skeptic, and the middle-of-the-road observer, whether they favor a zoonotic or a laboratory origin for the coronavirus. In doing so he is sure to make everyone happy, without, however, shedding much light on the answer. But that perhaps was not his goal. As he states, “[I have] presented a Bayesian analysis, with skeptical priors, and a plausible range of likelihood estimates that should encompass the positions of most skeptics, believers, and neutral observers. By tabulating these values and the ratio between P(H|E) and P(H), it is possible to see how important the fact the first outbreak occurred in Wuhan is to a rational revision to any prior belief about the likelihood of an uncontrolled release from a laboratory.” Here is a graphic representation of his calculations:

Jon Seymour calculated a cube of estimates for P(H|E) as shown here
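Seymour's cube can be reproduced in outline. Evaluate Bayes' theorem over a grid of inputs, then tabulate both P(H|E) and the ratio P(H|E)/P(H) that he mentions. In this sketch the grid values are hypothetical placeholders chosen only for illustration, not the ranges Seymour used. Only the combination P(H) = 0.003, P(E|H) = 0.48, P(E|^H) = 0.00065 corresponds to the calculation in this article.

```python
from itertools import product


def posterior(p_e_given_h: float, p_h: float, p_e_given_not_h: float) -> float:
    """Bayes' theorem with P(E) expanded by the law of total probability."""
    p_e = p_e_given_h * p_h + p_e_given_not_h * (1.0 - p_h)
    return p_e_given_h * p_h / p_e


# Hypothetical grid values for illustration only; NOT Seymour's actual inputs.
priors = [0.001, 0.003, 0.01]               # P(H)
p_e_given_h_values = [0.2, 0.48, 0.8]       # P(E|H)
p_e_given_not_h_values = [0.00065, 0.005, 0.01]  # P(E|^H)

# Each cell of the cube holds (P(H|E), P(H|E) / P(H)).
cube = {}
for p_h, peh, penh in product(priors, p_e_given_h_values, p_e_given_not_h_values):
    post = posterior(peh, p_h, penh)
    cube[(p_h, peh, penh)] = (post, post / p_h)

for (p_h, peh, penh), (post, ratio) in sorted(cube.items()):
    print(f"P(H)={p_h:<6} P(E|H)={peh:<5} P(E|^H)={penh:<8} "
          f"P(H|E)={post:6.1%} ratio={ratio:7.1f}")
```

In every cell of this illustrative grid, P(E|H) exceeds P(E|^H), so the ratio exceeds 1. The Wuhan observation therefore raises the probability of a laboratory origin whatever prior is chosen; how far it raises it depends on the inputs.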

The Demaneuf and de Maistre paper (6) is an extensively sourced and detailed analysis based on their appraisal of specialized Chinese literature. They show that their “underlying estimate for the probability of lab-acquired infection is consistent with risk assessments from Chinese authorities and specialists.” They also show that “the relative probability of a lab-related accident against a non-lab related zoonotic event is not negligible across a wide range of defensible input probabilities.” They estimate a relative probability of 55% for a laboratory-related event, compared to a maximum of 45% for a zoonotic origin. Their result of 55% is reasonably close to the value of 69% found here and provides additional support for the validity of this probabilistic approach. They also discuss a series of common misconceptions that are often repeated about this type of analysis.

The final Bayesian probability paper (10) is by Jonathan Latham and Allison Wilson, authors of the Mojiang miners passage proposal. They make an important point, apparently for the first time: the research of the Wuhan Institute of Virology focuses almost exclusively on SARS-related coronaviruses. Among the 28 species of alpha and beta coronaviruses, only 6 human coronaviruses were known prior to SARS-CoV-2. As its name indicates, this pandemic virus is most closely related to SARS. A new zoonotic virus emerging by chance through natural evolution could come from any one of these 28 coronavirus lineages. Yet this one turned out to be a SARS-like cousin. That is precisely the main research interest of a laboratory that is not only the world’s leading coronavirus facility but also located within a couple of kilometers of where the first patients infected with this virus appeared. This is indeed a surprising coincidence, and the authors quantify it. They multiply the ratio of the population of Wuhan to that of the world (1 : 630) by the odds of the virus coming from the SARS-related lineage among the 28 (1 : 28), arriving at a surprise factor of 17,640 to 1. Lastly, they point out the lack of a zoonotic origin theory and briefly mention that there are 4 distinct laboratory origin theories.
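The surprise factor is simply the product of two odds. Below is a minimal sketch using exact fractions. Multiplying the two factors assumes, as the calculation implicitly does, that the city of emergence and the lineage of the virus are independent.

```python
from fractions import Fraction

# Odds quoted by Latham and Wilson (10).
wuhan_share_of_world = Fraction(1, 630)  # population of Wuhan relative to the world
sars_like_share = Fraction(1, 28)        # one lineage out of 28 coronavirus species

# Treating the two as independent, the joint chance is their product.
joint = wuhan_share_of_world * sars_like_share
surprise_factor = 1 / joint

print(f"Surprise factor: {surprise_factor} to 1")  # Surprise factor: 17640 to 1
```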

References

1. Zhou, P., Yang, XL., Wang, XG. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273 (2020). doi.org/10.1038/s41586-020-2012-7

2. Quay, Steven Carl, MD PhD. (2021, January 29). A Bayesian analysis concludes beyond a reasonable doubt that SARS-CoV-2 is not a natural zoonosis but instead is laboratory derived (Version 2). Zenodo. doi.org/10.5281/zenodo.4477081

3. Bayes’ theorem, from Wikipedia

4. Lynn Klotz, Human error in high-biocontainment labs: a likely pandemic threat, Bulletin of the Atomic Scientists, February 25, 2019

5. Siengsanan-Lamont J, Blacksell SD. A Review of Laboratory-Acquired Infections in the Asia-Pacific: Understanding Risk and the Need for Improved Biosafety for Veterinary and Zoonotic Diseases. Trop Med Infect Dis. 2018;3(2):36. Published 2018 Mar 26. doi:10.3390/tropicalmed3020036

6. Demaneuf, Gilles, & De Maistre, Rodolphe. (2020). Outlines of a probabilistic evaluation of possible SARS-CoV-2 origins (Version 1.0.1). doi.org/10.5281/zenodo.4057129

7. Merler, S., Ajelli, M., Fumanelli, L. et al. Containing the accidental laboratory escape of potential pandemic influenza viruses. BMC Med 11, 252 (2013). doi.org/10.1186/1741-7015-11-252

8. Wang N, Li SY, Yang XL, et al. Serological Evidence of Bat SARS-Related Coronavirus Infection in Humans, China. Virol Sin. 2018;33(1):104–107. doi:10.1007/s12250-018-0012-7

9. Jane Qiu, How China’s ‘Bat Woman’ Hunted Down Viruses from SARS to the New Coronavirus, Scientific American, June 1, 2020

10. Jonathan Latham, PhD and Allison Wilson, PhD, Why China and the WHO Will Never Find a Zoonotic Origin For the COVID-19 Pandemic Virus, Independent Science News, February 16, 2021

11. Jon Seymour, A Bayesian Analysis Of One Aspect Of The SARS-CoV-2 Origin Story — Where The First Recorded Outbreak Occurred, Medium, January 16, 2020

12. See www.gao.gov/products/gao-09-574#summary_recommend

Elliott Brunner, MD

Retired physician, Coronavirus curious, biochemistry researcher aeons ago, webmaster
