I highly recommend Douglas Hubbard's excellent work, How to Measure Anything: Finding the Value of Intangibles in Business, for those interested in exploring ways to demonstrate the value of applying social media in health care. It's not exactly an easy read, and yes, there is some math involved. But here's a concept from the book that I found fascinating, and was able to demonstrate during our Social Media Residency course in Salt Lake City earlier this year.
The concept is called The Rule of 5, and it illustrates how even an extremely small sample can give you a significantly improved confidence interval for a value. Based on what they have read about polling in the news, most people think a sample of 100, 200 or even 300 is needed to be "statistically significant." Hubbard counters that when you don't have much knowledge about the true value, even a sample of five elements, randomly drawn, can narrow your range of uncertainty considerably.
We put it to the test in Salt Lake City. Here's how.
We sent a survey to all of the 90+ participants in the course at the beginning of the morning, asking them to submit three values anonymously:
- Their own weight, in pounds.
- The lower and upper bounds of a range they thought would be 90 percent certain to include the average weight for everyone in the course.
Then we asked the class to shout out random numbers from 1 to 90. Each number called out corresponded to a response number in the spreadsheet that held all of the submissions, and we pulled the self-reported weight from that row to build our samples.
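The draw itself can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `weights` values are made-up placeholders rather than the course data, `draw_sample` is a hypothetical helper name, and `random.sample` stands in for the class shouting out numbers.

```python
import random

# Hypothetical self-reported weights keyed by response number (1 to 90).
# Placeholder values for illustration only, not the course data.
weights = {i: 120 + (i * 37) % 110 for i in range(1, 91)}

def draw_sample(responses, size, seed=None):
    """Pick `size` distinct response numbers at random and return their weights."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(responses), size)
    return [responses[number] for number in picked]

sample_of_5 = draw_sample(weights, 5, seed=1)
print(sorted(sample_of_5))
```

A software random number generator is arguably a better source of randomness than numbers called out by people, who tend to favor some numbers over others.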
"The Rule of 5"
The Rule of 5 as developed by Hubbard holds that a randomly drawn sample of 5 elements will have a 93.75% chance of containing the true median of a population. In other words, there is a 93.75% chance that the population median is between the smallest and largest values in the sample. The reasoning is that each randomly drawn value has a 50% chance of falling above the median, so the chance that all five land on the same side is only 2 × (1/2)^5 = 6.25%. With a sample size of 11, you get a roughly 93.5% confidence interval that is likely much narrower, by using the third-smallest and third-largest values as your bounds. It's a concept Hubbard calls "Mathless CIs" (confidence intervals).
And if the population is normally distributed, the mean and the median coincide, so these intervals also give you an estimate of the mean, or average value, for the population.
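For readers who want to check the 93.75% and 93.5% figures, here is a minimal sketch of the arithmetic. The function name `mathless_ci_coverage` is my own, not Hubbard's, and the calculation assumes a truly random sample with no tied values.

```python
from math import comb

def mathless_ci_coverage(n, k):
    """Chance that the population median lies between the k-th smallest
    and k-th largest values of a random sample of size n."""
    # The median falls below the k-th smallest value only if fewer than k
    # sample points land below the median; each point does so with
    # probability 1/2. The upper tail is symmetric, hence the factor of 2.
    tail = sum(comb(n, i) for i in range(k)) / 2 ** n
    return 1 - 2 * tail

print(f"{mathless_ci_coverage(5, 1):.4f}")   # 0.9375
print(f"{mathless_ci_coverage(11, 3):.4f}")  # 0.9346
```

The same function lets you explore other sample sizes and bounds before deciding how many observations are worth collecting.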
Putting "The Rule of 5" to the Test
Here's what we found in Salt Lake City:
- The true value for the population mean was 167 pounds.
- Our sample of 5 gave us a confidence interval of 155-250, a width of 95 pounds.
- Our sample of 11 gave us a confidence interval of 135-200, a width of 65 pounds. Just 6 more observations, but a much narrower range.
- Both ranges did, in fact, include the mean.
- Even these small samples produced better ranges than most of the human estimators. The sample of 5 produced a better range (narrower and more accurate) than 49 of the 87 estimates. With a sample of 11, the range was better than 63 of the 87 estimates.
Here is a slide deck illustrating the exercise:
What does it all mean?
Of course, in the real world you almost never know what the real value of the mean or median of a population is. The point of the exercise is to show that a surprisingly small sample can give you a better estimate of the value than human estimates alone.
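One place where you do know the true value is a simulation, which makes it a handy way to convince yourself the rule holds. The sketch below is an assumption-laden illustration, not part of our exercise: the population is synthetic (drawn from a skewed lognormal distribution with parameters I picked arbitrarily), and `mathless_ci` and `coverage` are hypothetical helper names.

```python
import random

def mathless_ci(sample, k=1):
    """Return (k-th smallest, k-th largest) of the sample as interval bounds."""
    ordered = sorted(sample)
    return ordered[k - 1], ordered[-k]

def coverage(population, n, k, trials=20_000, seed=0):
    """Fraction of random samples whose interval contains the population median."""
    rng = random.Random(seed)
    ordered = sorted(population)
    mid = len(ordered) // 2
    # Median of the full population (average of the two middle values if even).
    median = ordered[mid] if len(ordered) % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    hits = 0
    for _ in range(trials):
        low, high = mathless_ci(rng.sample(population, n), k)
        hits += low <= median <= high
    return hits / trials

# Synthetic, skewed "population" of 10,000 values; not real data.
rng = random.Random(42)
population = [rng.lognormvariate(5.1, 0.2) for _ in range(10_000)]

print(coverage(population, n=5, k=1))   # should land near 0.9375
print(coverage(population, n=11, k=3))  # should land near 0.9346
```

Because the rule is about the median, it should hold here even though the synthetic population is skewed rather than normally distributed.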
As Hubbard says, measurement only matters in the context of a decision, to help you reduce your chance of being wrong. Sometimes a small sample like this, and a Mathless CI, can give you the information you need for a better decision.