It’s easy to lie with statistics.
Quick example: China’s population of over 1.35 billion people is estimated to have a margin of error of 20 million people. The Chinese hukou registration system, used in most regions for counting population, rests on the wrong assumption that being registered in a region automatically means residing there. This was true while mobility was strictly controlled, but when migration started, China’s statistics system ignored it and kept the traditional counting system instead of switching to more accurate methods. As a result, per capita growth rates were overstated in coastal provinces and understated in the interior. What happened when people realized that such differences in population counting could be used to distort growth rates? Each region started using whichever counting method made its growth rate look juicier. Hence the overcount of 20 million people.
Here is another example of how statistics turns up where you don’t expect it, and of how harmful statistical illiteracy can be. Los Angeles, 1995. The NY Stock Exchange trading volume fell by 40% and 100 million people turned on their TVs to hear the verdict in the criminal trial of O.J. Simpson, one of the great American football players, for the murder of his ex-wife. The verdict, which passed into the annals of history, was: not guilty.
The mathematical reasoning used (wrongly) by the “dream team” of American lawyers who defended O.J. is the following: since they could not deny that he was a violent man (there was photographic evidence and endless testimony of O.J. beating his wife), they trotted out probability. According to statistical reports, 4 million women were beaten by their husbands in the US in 1994. Of all these women, how many were killed by their husbands? “Only” 1,600. “Only” means that, although 1,600 is an enormous number, it is low compared to 4 million: one in every 2,500 women. Can you find O.J. guilty based on this probability? Of course not, it’s extremely low. This argument is brilliant and it convinced everyone. Pity that it is wrong: when you calculate the probability of an event, you have to use all the information you have. The probability calculated above ignores one piece of information: O.J. Simpson’s ex-wife was murdered by someone. How many of the 4 million women who were beaten by their husbands were also murdered by someone? 1,800. This is the right reference set. Of these 1,800, how many were killed by the same husband who beat them? Again 1,600. But out of 1,800, not out of 4 million. That gives a probability of almost 90% that O.J. was the murderer. An enormous mistake was made and nobody noticed.
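To see how much the choice of reference set matters, here is a minimal Python sketch that redoes the arithmetic with the figures quoted above. The variable names are mine, and the counts are simply taken from the text rather than from an independent source.

```python
# Figures as quoted in the text (US, 1994).
beaten = 4_000_000            # women beaten by their husbands
beaten_and_murdered = 1_800   # of those, murdered by anyone
murdered_by_husband = 1_600   # of those, murdered by the husband who beat them

# The defence's number: P(killed by husband | beaten)
p_defence = murdered_by_husband / beaten

# The relevant number: P(killed by husband | beaten AND murdered)
p_relevant = murdered_by_husband / beaten_and_murdered

print(f"defence:  {p_defence:.4%} (1 in {beaten // murdered_by_husband})")
print(f"relevant: {p_relevant:.1%}")
```

The numerator is the same in both lines. Only the denominator changes, and that alone moves the answer from one in 2,500 to roughly 89%.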
Why don’t we understand probability? We confuse probability with possibility. Take the state lottery phenomenon. We are constantly bombarded with winners’ stories. But the fact that someone has won doesn’t make winning probable. It simply means that there have been many trials (i.e. that many people play every day). I want to leave you with an emblematic estimate: if I were the first Homo habilis, the first hominid on Earth, and if I had played the lottery for 2,500,000 years, every Tuesday, Thursday and Saturday, what would be my overall probability of winning at least once? More or less ½. Fascinating, because if I have a probability of winning of ½, I have exactly the same probability of not winning and of having to play all over again for another 2,500,000 years, every Tuesday, Thursday and Saturday.
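The estimate can be checked with a few lines of Python. The text does not say which lottery it has in mind, so the sketch below assumes a hypothetical "pick 6 numbers out of 90" jackpot with one ticket per draw. That format, and the function and variable names, are my assumptions, not the author's. With other odds the result changes, so treat this as a rough check rather than the original calculation.

```python
import math

YEARS = 2_500_000
DRAWS_PER_YEAR = 3 * 52   # Tuesday, Thursday, Saturday; leap days ignored
n_draws = YEARS * DRAWS_PER_YEAR

# ASSUMPTION (not from the text): a 6-of-90 jackpot, one ticket per draw.
p_win = 1 / math.comb(90, 6)


def p_at_least_one_win(p: float, n: int) -> float:
    """Return 1 - (1 - p)**n, computed stably for tiny p and huge n."""
    return -math.expm1(n * math.log1p(-p))


# Chance of winning at least once over the whole period.
p_ever = p_at_least_one_win(p_win, n_draws)

# Per-draw odds that would make the overall chance exactly 1/2.
p_needed_for_half = -math.expm1(math.log(0.5) / n_draws)

print(f"draws played:         {n_draws:,}")
print(f"P(at least one win):  {p_ever:.3f}")
print(f"odds needed for 1/2:  1 in {1 / p_needed_for_half:,.0f}")
```

Under the assumed odds the chance of at least one win comes out a little below one half, which matches "more or less ½". The last line shows the per-draw odds for which it would be exactly one half.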