(from Mark Twain; Lies, damned lies and statistics)
The film “Minority Report”, in which Big Data analytics is used to identify future murderers, tells the story of someone who has reason to doubt its infallibility. Although the purposes for which we use Big Data today don’t (yet!?) carry the same consequences, we should still be wary of putting all our trust in what it tells us.
The trouble is, Big Data analytics can be polluted by a number of factors:
- How the data is gathered, processed and summarised
- Human behaviour, which is shaped by differing viewpoints and demonstrably non-rational thinking
- A company culture of acting fast and discouraging questions which might complicate the situation.
As much as we would like to think Big Data is the ultimate source of truth, it is dangerous to rest on that assumption, because it may not be correct.
When making decisions based on Big Data, it is essential to have knowledge of Big Data Analytics and the factors which limit its reliability, combined with profound business experience, in order to evaluate results in the appropriate context.
Errors waiting to happen
There are myriad reasons that can lead us to draw incorrect conclusions from data:
- false or missing source data
- errors in the software that collects, filters and stores the data
- misinterpretations in data cleaning or correction steps (leaving out “outliers” that in themselves may be the key signal)
- errors in the data enrichment, where other attributes are connected to records
- errors in algorithms that help to interpret the data
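To make the data-cleaning point concrete, here is a minimal Python sketch of how a routine outlier filter can silently throw away the one observation that matters. The order counts, the `drop_outliers` helper and the two-standard-deviation threshold are all hypothetical, chosen only for illustration.

```python
from statistics import mean, stdev

def drop_outliers(values, z_max=2.0):
    """Remove points more than z_max sample standard deviations from the mean.

    Hypothetical cleaning rule, used here only to illustrate the risk.
    """
    m, s = mean(values), stdev(values)
    return [v for v in values if abs(v - m) <= z_max * s]

# Made-up daily order counts; the spike on the last day is the real event.
daily_orders = [100, 102, 98, 101, 99, 103, 97, 100, 250]
cleaned = drop_outliers(daily_orders)

print("max before cleaning:", max(daily_orders))
print("max after cleaning: ", max(cleaned))
```

After cleaning, the series looks perfectly stable, and nothing in the cleaned data hints that a spike ever occurred. Whether that spike was a data error or the key signal is a business question, not a statistical one.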
False conclusions can also arise from a less visible but equally dangerous factor: the basic human tendencies that make us deviate from rational machine logic.
Confirmation bias is one of these tendencies. If you are convinced of something, your mind will subconsciously focus on information which confirms your thinking, while automatically filtering out whatever contradicts it. There is a real possibility that Big Data analysts might create algorithms that steer towards confirming their own opinions.
Similarly, the assumption that a result is “certain” is often invalid. Many (Big) data treatments apply some form of statistics, which in turn implies there’s a level of uncertainty involved. The human mind demonstrably struggles to handle such uncertainty, and hence tends to ignore it.
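As a rough illustration of that uncertainty, the Python sketch below puts an approximate 95% interval around two conversion rates. The campaign figures are invented, the `proportion_interval` helper is hypothetical, and the simple normal-approximation (Wald) interval is an assumption made for brevity; other interval methods exist and may suit small samples better.

```python
import math

def proportion_interval(successes, n, z=1.96):
    """Approximate 95% interval for a proportion (normal approximation).

    Illustrative only: assumes a reasonably large n and a rate
    that is not too close to 0 or 1.
    """
    p = successes / n
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Made-up results: campaign A converts 12 of 100 visitors, campaign B 10 of 100.
low_a, high_a = proportion_interval(12, 100)
low_b, high_b = proportion_interval(10, 100)

print(f"A: 12% (roughly {low_a:.1%} to {high_a:.1%})")
print(f"B: 10% (roughly {low_b:.1%} to {high_b:.1%})")
```

Reported as bare numbers, "12% beats 10%" looks like a clear finding. With the intervals attached, the two ranges overlap heavily, which is a strong hint that the difference could be noise at this sample size.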
Computer says “No”
Let’s say Big Data analysis has revealed an important finding for your business. You want to act fast. But before you jump in with both feet, do you have anyone who is capable, willing and empowered to challenge and verify the presented results for bias, repeatability and reproducibility?
Data analytics steps are often not well documented. And the designers of these steps may not be around. Talking to the data scientist may be one challenge. Traceability of results based on Machine Learning algorithms is an even more difficult prospect.
Data vs humans
Despite the challenges that exist in using Big Data analytics to support decision making, I am a big fan of good data use, because without the support of data, the likelihood of human error causing decision-making flaws may be even bigger! Human error is a well-known major contributor to industrial and transportation risk, with too many accidents as proof. And without data, what do we have to enable us to challenge “non-data-based” human decision-making for bias, repeatability and reproducibility?
Detecting incorrect conclusions from Big Data analytics is a complex matter which requires balancing a critical view with a practical approach, something that takes thorough knowledge of and expertise in both data analytics and business operations. These are all key skills here at R&G. So if you’d like a sanity check on your data-driven decision making, be sure to give us a shout!
Vincent Gerdes is Senior Business Process Consultant at R&G Global Consultants in The Netherlands