Are You Analyzing the Right Data?
A couple of years ago, I ran across a blog post that observed, “Big Data does not necessarily mean Good Data. If data is incomplete, out of context or otherwise contaminated, it can lead to decisions that could undermine the competitiveness of an enterprise…” This seemed obvious to me at the time. Yet here we are two years later, and much of the discussion about Big Data and BI analytics still focuses on analytical tools, not on the right data, in context.
This is not a trivial distinction when you consider that there are several types of analytics, at least five. You’ve got descriptive analytics (what happened) and diagnostic analytics (why it happened). Then you have predictive analytics (what will happen), prescriptive analytics (how can we make it happen), and pre-emptive analytics, which involves modeling ‘what if’ scenarios. Each of these requires a certain type of information and context. Without that, you can get into some serious trouble. In other words, you can go from being wrong to being extremely wrong.
If you want some examples of how far awry analytics can go when the wrong information is analyzed, or analyzed out of context, just pick up a copy of Nate Silver’s book The Signal and the Noise. For example, among many other factors, the rating agencies that gave AAA ratings to mortgage-backed securities, which eventually brought down the financial system, weren’t looking at the right information. Whoops.
Beyond the Familiar
If left to their own devices, most data and business analysts work with the data sets they know about. Among organizations that want to compete on analytics, there’s a lot of pressure to reduce time to insight. Getting in the IT queue to prospect for new data sources means delays. Typically, enlisting IT means an iterative process with the user that eventually produces the right mix of information.
Of course, within most organizations there are subject matter experts (SMEs) who know certain ‘data neighborhoods’ very well. Relying on them is slightly more efficient but still not ideal. A recent Forrester study commissioned by Attivio, which surveyed 50 US-based IT and business decision-makers at large firms of 1,000 or more employees, found that 58 percent relied on subject matter experts while 48 percent reviewed source system documentation. Only 34 percent used automated data profiling tools, and these were early-generation tools that required trained technicians and relied on human-created metadata.
The Data Supply Chain Bottleneck
What all this adds up to is a bottleneck in the data supply chain (DSC), slowing the process that collects, stores, analyzes, and transforms data into insights. It’s worth remembering that, according to Forrester, data source discovery can consume well over half of an entire analytics project.
That’s a substantial ‘speed bump,’ which can extend data gathering from days to weeks to months. Forrester’s Boris Evelson notes that flattening that bump requires an automated process supported by “data discovery accelerators to help profile and discover definitions and meanings in data sources, while ingesting all types of data from anywhere in or beyond your organization.”
So instead of months, data gathering can be done in minutes. The virtue of speed is obvious; it enables analysts to spend more time analyzing and less time gathering. But just as critical is finding and analyzing the right data. After all, as much as 90 percent of information stored by organizations today remains unknown and untouched.
In fact, I would argue that as connected devices and the Internet of Things (IoT) send us ever-larger volumes of data, the importance of analyzing the right information exceeds even the value of speed. After all, analyzing the wrong or incomplete data faster isn’t really worthwhile.
Self-Service Data Source Discovery
Data-driven organizations have learned, often the hard way, that only fast access to all the relevant data can turn an ambitious Big Data vision into reality. So these organizations invest in automated, self-service data source discovery tools, which:
• Reduce the time spent on gathering data
• Leverage the untapped potential in large volumes of hidden information
• Enable business users and data analysts to find and refine data sets on their own
• Accelerate time to insight
Big Data velocity and variety put a premium on analyzing the right data. But leaving data source discovery to a combination of IT and subject matter experts puts organizations at increasing risk of making costly mistakes and of sacrificing greater revenue, profitability, and operational efficiency.