Who Is Afraid of the Superspreaders?

Over the last few months, many publications (in the news, in scientific journals, etc.) have claimed that drastic social measures are needed to control the spread of the infamous COVID-19. However, a different analysis of the situation may provide a more optimistic outlook. Indeed, in a recent paper on superspreaders, we suggest that a simple mathematical perspective on the trends shows that the virus can die out on its own without infecting a high percentage of the population. The best way to handle the current crisis is therefore, in our opinion, to focus on the superspreaders, and we show mathematically why.

While it is tempting to look for correlations between the biological aspects of the virus and its spread, we attempt to show that this may be unnecessary, for mathematical reasons.

It is estimated by some research that a small fraction (say, less than 10%) of the infected population causes almost all of the remaining infections (counting both those they infect directly and their very close contacts, such as family members). These are the above-mentioned superspreaders.

Just as social media influencers are mostly responsible for the spread of trends, or as 80% of the world's money is in the hands of a happy few 20%, 5 to 10% of infected individuals would be responsible for 80% of the infections. These individuals play a vital role in the spreading process.

We suggest as a first hypothesis that the reason why these individuals are overly contagious should be attributed to social causes rather than to biological ones: they tend to have a higher number of social contacts (due to their social or professional lifestyle), and are thus more likely to be infected with the virus; in turn, they have higher chances to infect others, both directly and indirectly.

Therefore, one might ask whether one really needs to find a biological explanation for some trends of the spread, or, in other words, whether these facts require an explanation at all: if the spread decreases with time for network/interaction reasons, there is no need to look for other explanations. It is generally acknowledged that human interaction is heterogeneous and that the number of contacts follows a Pareto distribution, or power law (when the network is scale-free). To put it simply: a few people have a high number of contacts while most have a small number. Such explanations should be considered before seeking differences in the way people sneeze.
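As a toy illustration of such heterogeneity (the shape parameter and sample size below are arbitrary choices for the sketch, not empirical estimates):

```python
import random

random.seed(0)

# Contact counts drawn from a Pareto (power-law) distribution, the
# hallmark of scale-free networks; shape alpha = 1.5 is illustrative.
alpha = 1.5
contacts = sorted((random.paretovariate(alpha) for _ in range(10_000)),
                  reverse=True)

top_decile = sum(contacts[:1_000])          # contacts held by the top 10%
share = top_decile / sum(contacts)
print(f"the top 10% of individuals hold {share:.0%} of all contacts")
```

Even in this small sample, a small minority of highly connected individuals accounts for a disproportionate share of all contacts.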

Until now, the expected number of secondary infections has been measured through the (in)famous parameter R0, which gives the average number of people infected by a single infected individual. It is generally assumed that the disease will die out on its own if R0<1, or, if R0>1, will infect a large fraction of the population until many become immune (assuming that one infection is enough to immunize the subject, as seems to be the case).
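The threshold behaviour of R0 can be illustrated with a toy branching-process simulation (the offspring distribution and all the numbers below are illustrative assumptions, not part of any epidemiological model we endorse):

```python
import random

random.seed(1)

def outbreak_size(r0, cap=10_000):
    # Branching process: each case infects Binomial(10, r0/10) new
    # cases (mean r0).  Stop counting once the outbreak passes `cap`.
    active, total = 1, 1
    while active and total < cap:
        offspring = sum(
            sum(random.random() < r0 / 10 for _ in range(10))
            for _ in range(active)
        )
        total += offspring
        active = offspring
    return total

small = [outbreak_size(0.8) for _ in range(50)]   # subcritical: dies out
big = [outbreak_size(2.0) for _ in range(50)]     # supercritical: takes off

print("R0=0.8: largest of 50 outbreaks:", max(small))
print("R0=2.0: outbreaks exceeding 10,000 cases:", sum(s >= 10_000 for s in big))
```

With R0 below 1 every outbreak fizzles out on its own; above 1, a sizeable fraction of outbreaks explode.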

When the disease starts its spread, superspreaders play a vital role and contribute heavily to the reproduction parameter. One way the disease may die out, well before a large fraction is infected, is that those superspreaders are taken out of commission well before regular spreaders: being highly likely to be infected with the virus, they become immune early. Once their contribution to the reproduction parameter is eliminated, it is quite plausible for R0 to drop below 1, which means the disease will eventually disappear.

Consequently, if one accepts that individuals have considerably different probabilities of being infected (and of infecting others), they should be weighted differently when calculating the global infection distribution. There is indeed a strong correlation between the probability of being infected and the average number of secondary infections, as we claim and have demonstrated here and here. If one looks at the distribution through the lens proposed in these papers, one will accept that it is natural for R0 (or possibly another parameter, S0, the effective spread) to change with time, and in particular to decrease dramatically well before general immunity is reached. To put it differently, the spread could slow down with a much lower fraction of infected individuals than is usually anticipated.

The political efforts to contain the spread have included stringent measures, such as general quarantines, which have had a negative impact on the economic and social situation. In reality, the reproduction parameter starts high, when superspreaders have their strong effect, and decreases as the spread progresses. This strongly suggests the possibility of the parameter dropping below 1 well before a large fraction of the population is infected.

Taking measures that reduce the superspreaders’ impact would dramatically improve the situation, without the substantial disruption to individual rights and the economy caused by global lockdowns. For the sake of comparison, if a forest were on fire at a few unknown points, instead of flooding the whole forest, would it not be more sensible to locate those points first (in our case, the superspreaders) and deal with them locally?

There are various ways to reduce the superspreaders’ effect: for instance, once a small group of potential superspreaders has been identified (these could be hospital, supermarket, school, and public transportation employees), they should undergo frequent COVID tests as well as antibody tests. Those proving to be immune should be free to interact with the public. In addition, interaction within small groups, such as school populations, should be limited.

This could have two types of impact: first, it could (hopefully) reduce the effective R0 below 1, in which case the pandemic would stop without hitting a large portion of the population; second, it could decrease the spread rate of the disease, which would in turn allow us more time to weigh potential reactions and measures.

Before closing our first Corona post, let us ponder some questions:

– What is the exact goal of the social measures taken, and what is the best strategy to achieve that goal? These two questions are crucial but little discussed, and when they are debated, the answers vary widely. Let us assume, for the sake of this discussion, the highly likely option that a large-scale immune treatment is years away.
– What should one presume regarding the network of contacts?
– Is R0 the real parameter to take into account in typical networks?
The real challenge is now to examine the valid options at our disposal by looking at the facts without letting our judgement be clouded by unfounded assumptions.

Coincidence? Yes!

One of the themes this blog will focus on is a widespread tendency—including among scientists—to draw correlations or even causation between facts that are seemingly connected but are in fact unrelated. Even in the case of correlated events, correlation is often mistaken for causation. For the sake of clarity, let us note that correlation is any statistical relationship, not necessarily causal. Causation or causality is the relationship between a cause and its effect.

In many cases, phenomena that require no explanation are still given one: people intuitively (but wrongly) assume an explanation is required when, in fact, such phenomena are typical scenarios. We will refer to these as phenomena deserving no explanation.

Relations of causation and correlation have long been a topic of interest for humankind, as the famous passage in Amos (3:3) exemplifies:

Can two walk together, except they be agreed?

asks the prophet in a rhetorical question. When two events happen in close chronological or geographical proximity, our brain tends to infer that they must be related, hence the common expression, “Coincidence? I doubt it!”, to which our post title refers. In reality, since there are so many possible coincidences that would seem odd, the probability that at least one of them occurs is high.

A perfect example is the infamous Bible code: serious mathematicians were convinced that messages lay hidden in the ancient text, revealed only by taking appropriate sequences of jumps through it. One should start with the question: how many such sequences of jumps are there, and is it then so odd that one of them yields a comprehensible message?

Such correlations are foundational to some recent trends, which in some cases have turned into an ideological orthodoxy of sorts. For instance, the anti-vax movement largely relies on assertions such as “people get a vaccination and then get some rare disease”, or “they got the flu vaccine and then they died”, implying that the former must have caused the latter, as if anyone might have contemplated the reverse order…

Other common examples relate to cognitive misunderstandings. For instance, sports commentators often assert that the taller the basketball players, the poorer their eye-hand coordination; hence the claim that “tall people have weaker eye-hand coordination”. Other pundits contend that “good-looking people are not as smart”.

The reality is that there is clearly a positive correlation between looks and intelligence, and plausibly between height and eye-hand coordination; nevertheless, when considering basketball players at the top level, a negative correlation is sometimes wrongly inferred, because distinct roles require different traits. Likewise, when watching people in the media, who are featured there either for their good looks or for their intelligence, one may erroneously conclude that the correlation is negative.
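This selection effect can be demonstrated with a small simulation (all numbers and the selection rule are made up for illustration): draw looks and intelligence independently, then “feature” only people whose combined score clears a bar.

```python
import random

random.seed(3)

def corr(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

# Looks and intelligence drawn independently: zero correlation by design.
n = 100_000
looks = [random.gauss(0, 1) for _ in range(n)]
smarts = [random.gauss(0, 1) for _ in range(n)]

# Media selection: featured only if looks + smarts clears a (made-up) bar.
featured = [(l, s) for l, s in zip(looks, smarts) if l + s > 2.0]
f_looks, f_smarts = zip(*featured)

print(f"population correlation: {corr(looks, smarts):+.2f}")
print(f"among the featured:     {corr(f_looks, f_smarts):+.2f}")
```

Within the selected group the correlation comes out clearly negative, even though none exists in the population: conditioning on the sum creates the illusion.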

This post, and our blog in general, is much in the spirit of works by Kahneman and Tversky, members of the Federmann Center for the Study of Rationality at HUJI, or by the celebrated authors of Freakonomics, who have all pointed to biases that cloud our reasoning because our brain does not process simple statistics well.

For instance, the so-called principle of regression to the mean, on which we will publish more soon, is related to the wrongly assumed negative correlation between looks and intelligence. Another example: some people claim they can eat as much as they like without gaining any weight and without counterbalancing their calorie intake with any calorie burning. People usually react by saying that “it must be good genes” and therefore good health. However, by default, it should suggest that this person is in fact in poor health.

When looking for causes of poor health, the first step is to find some correlation between parameters. Once a strong correlation is found between A and B, there are three options: A causes B, B causes A, or a third parameter C causes both. Alas, the parameters tested for are usually the easiest to measure, and therefore the third option is the most likely. In our example, over-eating may be the hidden parameter: it usually causes both excess weight and poor health. Someone who over-eats is likely to suffer ill effects; with no corresponding calorie burning, however, this individual will possibly not be saved by the action of the fat cells, which can remedy some of these ill effects.
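A toy simulation of the hidden-parameter option (variable names and effect sizes are invented for illustration):

```python
import random

random.seed(4)

def corr(xs, ys):
    # Plain Pearson correlation coefficient.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cov / (vx * vy) ** 0.5

# Hidden parameter C (over-eating) drives both A (weight gain) and
# B (poor health); A and B never act on each other directly.
n = 50_000
overeating = [random.gauss(0, 1) for _ in range(n)]
weight = [c + random.gauss(0, 0.5) for c in overeating]
ill_health = [c + random.gauss(0, 0.5) for c in overeating]

r = corr(weight, ill_health)
print(f"correlation between weight and ill health: {r:.2f}")
```

Weight and ill health come out strongly correlated although neither causes the other, inviting exactly the mistaken causal reading discussed above.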

Another instance of potentially clouded judgement relates to current events, and in particular to the spread of COVID-19. The underlying assumption that made most people accept national lockdowns is that such measures will improve the general health situation. But should these two phenomena be correlated? Alarmed by the numbers of infected individuals and deaths hammered in the media, most will be persuaded that drastic social measures are necessary, thereby losing touch with the main question: what is the specific purpose of such measures, and do they effectively solve the case in point? If the purpose is to stop the spread, one may ask whether, in the absence of a vaccine, trying to prevent people from becoming immune to the virus is a sensible thing to do. If the purpose is to curb the increase in mortality due to COVID, the numbers should be analyzed carefully: whom should we count as a COVID victim? Only those individuals whose death was caused directly by COVID, or also those who died from another cause while also testing positive for COVID? Likewise, hospital overload is a central issue, as many have recognized; and yet most of the numerous individuals recently infected have had little or no symptoms at all, requiring no hospital care and not contributing to hospital overload.

To sum up, one important thread running through this blog will be to call into question correlations and causal claims made in public debate that rest on unfounded assumptions and can lead to incorrect conclusions and ineffective actions.

A Mathematician and a Broker with a Black Box Walk into a Bar

We just received the great news that Dor Minzer is the 2019 recipient of the prestigious ACM Doctoral Dissertation Award for his thesis “On Monotonicity Testing and the 2-to-2 Games conjecture”. This is the most important prize globally for PhD theses in computer science. The title may not be self-explanatory, as theoretical science often remains clouded in mystery for the non-specialist; yet the advances made in this work can be disclosed in a less cryptic way.

In this post, we will walk you through the results, taking concrete examples in order to clarify what is at stake in this work. In particular, we will focus on its first part, which is the culmination of a long research process that started two decades ago. It solves a long-standing problem related to the property-testing of the monotonicity of a Boolean function. This may not ring a bell right away so let’s use a concrete case to shed light on the real-life application of property testing in general, and monotonicity and the mysterious black box in particular.

Before we delve into this case, let’s answer the burning question nagging any curious inquisitor of theoretical issues: how is this research helpful? Well, let’s just say for now that Boolean functions, property-testing and monotonicity all come into play in the analysis of large quantities of data needed, for instance, in the study of genetic patterns potentially causing the occurrence of some diseases, or when dealing with virus spreads – whether human or computer-related.

Let’s take a concrete example: imagine you are the CEO of a brokerage company. One morning, a black box lands on your table. It turns out it is a voting machine. Every one of your brokers gets a unique key, and when time comes to make a decision between two investment options, each one in turn inserts their unique key and turns it left or right to choose between the two. Once all brokers have turned in their votes, that black box performs some unknown computation and outputs the resulting choice between the two options. In other words, its process is a voting mechanism: for every one of the 2^n combinations (assuming n brokers), there is a binary outcome. Let’s re-emphasize here that its decision is based on an unknown mechanism and, in particular, it does not necessarily compute the majority function.

Now, let’s assume you would like to test this machine to verify one thing: that the mechanism is monotone, meaning that if some voters switched their votes from ‘no’ to ‘yes’, the machine would never change the outcome in the opposite direction. The machine may well assign different levels of expertise to distinct brokers, weighing some opinions more than others; still, why would you employ a broker if you thought they usually choose the wrong investment option?
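To make the notion concrete, here is a brute-force monotonicity check (exponential in n, so only feasible for a handful of brokers); the weighted-majority and “contrarian” machines are invented examples, not anything from the papers discussed here:

```python
from itertools import product

def is_monotone(f, n):
    # Brute force (exponential in n): f may never flip from 1 to 0
    # when a single vote flips from 0 to 1.  Checking single-vote
    # flips suffices, since any upward change is a chain of them.
    for x in product((0, 1), repeat=n):
        for i in range(n):
            if x[i] == 0:
                y = x[:i] + (1,) + x[i + 1:]
                if f(x) > f(y):
                    return False
    return True

# A weighted-majority machine is monotone...
weights = [3, 1, 1, 2, 1]
majority = lambda v: int(sum(w * x for w, x in zip(weights, v)) > sum(weights) / 2)

# ...while a machine that inverts one broker's vote is not.
contrarian = lambda v: majority(v[:4] + (1 - v[4],))

print(is_monotone(majority, 5))    # True
print(is_monotone(contrarian, 5))  # False
```

Note that weighting brokers differently is perfectly compatible with monotonicity; only inverting a vote breaks it.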

The number of combinations for the voters’ choices, however, is prohibitive, since it is exponential in the number of voters. Thus to go over all the voting combinations would take far too long. This is where property testing comes into play: it calls for the design of extremely fast algorithms, which help us analyze huge amounts of data by accessing only a small fraction of this data. Since it is a random, probabilistic process, one must content oneself with an algorithm that does not completely assure us that the property applies—in our case, that the voting box is monotone. What we have instead is an algorithm that would assure us that the machine is with high probability close to monotone: in other words, one can alter the machine slightly—flipping the voting results on only a small fraction of the combinations—to turn it into a monotone voting scheme.

We say a function is far from monotone if one must flip more than 1% of the results to make it monotone. The question is now: how many random queries to the machine does one need in order to come up with proof that it is far from monotone? After giving it some thought, one can see that the only way to prove that this little black box is not monotone is to exhibit two voting combinations: an input x (one of the 2^n options) and an input y derived from x by flipping some votes, all in the same direction. If the mechanism changes the outcome in the opposite direction, this is clear proof of its non-monotonicity.

We chose the case of the broker to illustrate our point, although we might as well have taken other examples: genetic patterns or viruses, with which we started this post; in sum, any other instance in which you are given an array of signals with the task to study various settings and arrive at some conclusion. In the case of a virus, the algorithm should provide an indication as to whether an individual is infected or not; in the case of genetics, whether an illness will or will not occur according to a certain genetic pattern. Here too you will need to make sure your machine is monotone, i.e., that changing the pattern will not cause the algorithm to change its outcome in the opposite direction. In those cases, property testing is the solution, and the specific property to test for is monotonicity.

These problems all relate to the so-called Boolean functions, which are at the core of research in computer science, and they have kept researchers baffled for quite a while. More than two decades ago, in their paper “[Testing Monotonicity]”, Goldreich, Goldwasser, Lehman, and Ron came up with a property tester, i.e., a very efficient algorithm, that randomly probes the machine on roughly 1000·n voting combinations (for n voters) and verifies with high probability whether the machine is (less than 1% far from) monotone.

This algorithm works as follows: randomly choose one setting of the n votes and probe the machine on it; then flip just one vote and probe the machine again on the new setting. One can show that, if the machine is far from monotone, the probability that this test obtains evidence of it (that is, that the machine changes its outcome in the wrong direction) is inversely linear (1/n) in the number of voters n. If one runs this test a number of times linear in n, and the black box is far from monotone, one of the runs will fail with good probability, demonstrating non-monotonicity; on the other hand, if the machine is monotone, it will pass the test with perfect completeness, that is, every time without exception.
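This single-flip (“edge”) tester can be sketched in a few lines of Python; the majority and parity machines, the number of voters, and the trial budget are illustrative choices, not taken from the original papers:

```python
import random

random.seed(5)

def edge_tester(f, n, trials):
    # Pick a random voting combination, flip one random vote, and look
    # for an order violation: a flip from 'no' to 'yes' that moves the
    # outcome from 'yes' to 'no'.
    for _ in range(trials):
        x = tuple(random.randint(0, 1) for _ in range(n))
        i = random.randrange(n)
        y = x[:i] + (1 - x[i],) + x[i + 1:]
        lo, hi = (x, y) if x[i] == 0 else (y, x)
        if f(lo) > f(hi):
            return "not monotone"      # hard evidence: a violated pair
    return "probably close to monotone"

n = 12
majority = lambda votes: int(sum(votes) > n / 2)  # monotone machine
parity = lambda votes: sum(votes) % 2             # far from monotone

print(edge_tester(majority, n, trials=100 * n))
print(edge_tester(parity, n, trials=100 * n))
```

The monotone majority machine passes every run (perfect completeness), while the parity machine, a textbook example of a function far from monotone, is caught almost immediately.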

Soon afterwards, in another paper entitled “[Monotonicity testing over general poset domains]”, Samorodnitsky, Fischer, Lehman, Newman, Raskhodnikova and Rubinfeld showed that one cannot design such an algorithm if one limits the number of probes to \sqrt n (square root of the number of voters n).

The problem was therefore left open: can one improve the upper bound to match the lower bound, or the other way around? Some progress was made by Chakrabarty and Seshadhri (“[A o(n) monotonicity tester for Boolean functions over the hypercube]”), who devised an algorithm that makes n^{7/8} queries to detect non-monotonicity.

In our paper “[On Monotonicity Testing and Boolean Isoperimetric-type Theorems]”, our team (Dor Minzer, Subhash Khot and Muli Safra) took up the mission of improving the complexity of the algorithm to match the lower bound, probing the voting machine on only \sqrt n combinations (queries). Although the tester is relatively simple (if a little tricky), finding the proof of its correctness turned out to be a little adventure through various areas of mathematics, including the analysis of Boolean functions and graph theory, and amounted to years of work.

The proof itself is far beyond the scope of this post, but in future posts we will pick and present a few exciting bits that lay on the via dolorosa leading to the proof in question.

