What is wrong with this picture? (Hint: it’s significant!)
This graphic from the United States National Weather Service depicts forecasted rain amounts from Hurricane Harvey, which made landfall early Saturday near Rockport, Texas. The problem with this data visualization is one that can afflict statistics in any form, not just graphics. The rainfall for each city is given to 0.01 inches, about a quarter of a millimeter, which considerably overstates the precision of the forecast. The problem underlying the problem is a lesson for us all: just because our statistical models can spit out a long string of digits doesn’t mean we should publish all of them.
We read in this graphic that Houston is expected to get 22.07 inches (560.6 mm) of rain. This forecast is absurd! Actual rainfall will differ by more than the alleged quarter-millimeter precision from one end of a single house lot to the other, let alone across a whole city.
Why all the fuss? Why complain about extra, meaningless decimal places? Three reasons:
1. Because they are meaningless. We should not be in the business of reporting anything we do not believe (uncertainties included!) or anything that cannot be trusted. We can be sure the good people who produced, reviewed, and published this graphic do not think the rainfall will be 560.6 mm across all of a large city such as Houston, Texas, the fourth-largest city by population in the US and roughly comparable to Paris. Here is the key statement: if we do not believe a number, and especially its stated accuracy, we should not publish it.
2. Because it misleads the audience. We should expect our audience, who do not know the details of our processes, to believe what we tell them about our own research. This forces us to review our work prior to publication with a critical eye toward what the reader will take away. This rule should apply to all channels: tables, graphics, written sentences, and any others. It is especially important for statements of accuracy, which sadly are often the most neglected when it comes to informing the public. In the example given, if the expected rainfall is, say, roughly 21 – 23 inches, then we should say so. We must resist the temptation to print more digits than the science justifies.
3. Because it’s bad science. Publishing this absurd, manifestly untrustworthy model result breeds justified distrust in our work, and unjustified distrust in the work of others. Overstating the accuracy of scientific results may be the single most damaging thing we can do to undermine the work of all scientists.
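To make the remedy concrete, here is a minimal Python sketch of one common convention: round a result to the decimal place of its uncertainty and report the range. The function names are illustrative, and the uncertainty of 1 inch is only an assumption taken from the "roughly 21 – 23 inches" example above, not a figure from the National Weather Service.

```python
import math


def round_to_uncertainty(value, uncertainty):
    """Round value and uncertainty to the place of the uncertainty's leading digit."""
    if uncertainty <= 0:
        raise ValueError("uncertainty must be positive")
    # Decimal place of the leading digit of the uncertainty:
    # 1.0 -> 0 decimals, 0.3 -> 1 decimal, 25.4 -> -1 (round to tens).
    decimals = -math.floor(math.log10(uncertainty))
    return round(value, decimals), round(uncertainty, decimals), decimals


def report_range(value, uncertainty, unit="inches"):
    """Format a result as a range, printing no more digits than are justified."""
    v, u, decimals = round_to_uncertainty(value, uncertainty)
    places = max(0, decimals)  # never print a negative number of decimals
    return f"{v - u:.{places}f} to {v + u:.{places}f} {unit}"


# Model output of 22.07 inches with an ASSUMED uncertainty of 1 inch.
print(report_range(22.07, 1.0))  # 21 to 23 inches
```

The model's 22.07 is still used internally; only the published figure is trimmed to what the assumed uncertainty supports.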
Just as every case heard in a court of law is a public test of the judicial process, every result we publish is a public test of the scientific process. We hold the reputation of all of science in our hands every time we share our results, for better or for worse. All the power of science to do good, to save lives, and to strengthen society depends on our dedication to stating our results accurately, and not overstating them.
David J Corliss, PhD
27 August, 2017