Big data analytics is being offered as the key to addressing a wide array of management and operational needs across business and IT. But the label “big data analytics” is used in a variety of ways, confusing people about its usefulness and about how best to implement it to drive business value. The uncertainty this causes poses a challenge for organizations that want to take advantage of big data in order to gain competitive advantage, comply with regulations, manage risk and improve profitability. Should organizations invest further in visual or deep data discovery on big data, delve more deeply into statistics and predictive analytics, or find new ways to integrate big data into current operational systems?
Recently, I discussed a high-level framework for thinking about big data analytics that aligns with former Census Director Robert Groves’ ideas of designed data on the one hand and organic data on the other. This second article completes that picture by looking at four specific areas that constitute the practical aspects of big data analytics – topics that must be brought into any holistic discussion of big data analytics strategy. Today, these often represent point-oriented approaches, but architectures are now coming to market that promise more unified solutions.
The intersection of big data analytics and traditional approaches to analytics: Analytics performed by data miners and database professionals often differ significantly from analytics delivered by line-of-business staffers who work in more flat-file-oriented environments. Today, advancements in in-memory systems, in-database analytics and workload-specific appliances provide scalable architectures that bring processing to the data source and allow organizations to push analytics out to a broader audience, but how to bridge the divide between the two kinds of analytics is still a key question. Given the relative immaturity of new technologies and the dominance of relational databases for information delivery, it is critical to examine how all analytical assets will interact with core database systems. As we move to operationalizing analytics on an industrial scale, the current advanced analytical approaches break down because they require pulling data into a separate analytic environment and do not leverage advances in parallel computing. Furthermore, organizations need to determine how they can apply existing skill sets and analytical access paradigms, such as business intelligence tools, SQL, spreadsheets and visual analysis, to big data analytics. Our recent big data benchmark research shows that the skills gap is the biggest issue facing analytics initiatives, with staffing and training cited as obstacles in over three quarters of organizations.
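To make the contrast between pulling data into a separate analytic environment and bringing processing to the data source concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module as a stand-in for a core database system; the `sales` table, its columns and the function names are hypothetical and chosen only for illustration, not taken from any product discussed here.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database (hypothetical 'sales' table)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [("east", 100.0), ("east", 300.0), ("west", 50.0), ("west", 150.0)]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()
    return conn


def avg_by_region_extract(conn):
    """Extract approach: pull every row out, then aggregate in application code."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        running_sum, n = totals.get(region, (0.0, 0))
        totals[region] = (running_sum + amount, n + 1)
    return {region: s / n for region, (s, n) in totals.items()}


def avg_by_region_in_db(conn):
    """In-database approach: the engine aggregates; only one row per region leaves it."""
    query = "SELECT region, AVG(amount) FROM sales GROUP BY region"
    return dict(conn.execute(query))


if __name__ == "__main__":
    conn = build_demo_db()
    print(avg_by_region_extract(conn))
    print(avg_by_region_in_db(conn))
```

Both functions return the same averages, but the first moves every row across the database boundary while the second moves one row per region. At four rows the difference is invisible; at industrial scale it is the data movement that the paragraph above describes as the breaking point.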
Visual analytics and data discovery: Visualizing data is a hot topic, especially in big data analytics. Much of big data analysis is about finding patterns in data and visualizing them so that people can tell a story and give context to large and diverse sets of data. Exploratory analytics allows us to develop and investigate hypotheses, reduce data, do root-cause analysis and suggest modeling approaches for our predictive analytics. Until now the focus of these tools has been on descriptive statistics related to SQL or flat file environments, but now visual analytics vendors are bringing predictive capabilities into the market to drive usability, especially at the business user level. This is a difficult challenge because the inherent simplicity of these descriptive visual tools clashes with the inherent complexity that defines predictive analytics. In addition, companies are looking to apply visualization to the output of predictive models as well. Visual discovery players are opening up their APIs in order to directly export predictive model output.
New tools and techniques in visualization, along with the proliferation of in-memory systems, give companies the means to sort through and make sense of big data. Exactly how these tools work, which types of visualizations matter for big data analytics and how they integrate into the current big data analytics architecture are still key questions, as is the issue of how search-based data discovery approaches fit into the architectural landscape.
Predictive analytics: Visual exploration of data cannot surface all patterns, especially the most complex ones. To make sense of enormous data sets, data mining and statistical techniques can find patterns, relationships and anomalies in the data and use them to predict future outcomes for individual cases. Companies need to investigate the use of advanced analytic approaches and algorithmic methods that can transform and analyze organic data for uses such as predicting security threats, uncovering fraud or targeting product offers to particular customers.
Commodity models (a.k.a. good-enough models) are allowing business users to drive the modeling process. How these models can be built and consumed at the front line of the organization with only basic oversight by a statistician or data scientist is a key area of focus as organizations endeavor to bring analytics into the fabric of the organization. The increased load on back-end systems is another key consideration if the modeling is a dynamic, software-driven approach. How these models are managed and tracked is yet another consideration. Our research on predictive analytics shows that companies that update their models more frequently have much higher satisfaction ratings than those that update less often. The research further shows that in over half of organizations, competitive advantage and revenue growth are the primary reasons that predictive analytics are deployed, and therefore such analytics are hard to ignore when organizations think about big data analytics.
Right-time and real-time analytics: It’s important to investigate the intersection of big data analytics with right-time and real-time systems and learn how participants are using big data analytics in production on an industrial scale. This usage guides the decisions that we make today around how to begin the task of big data analytics. Another choice organizations must make is whether to capture and store all of their data and analyze it on the back end, attempt to process it on the fly, or do both. In this context, event processing and decision management technologies represent a big part of big data analytics since they can help examine data streams for value and deliver information to the front lines of the organization immediately. How traditionally batch-oriented big data technologies such as Hadoop fit into the broader picture of right-time consumption still needs to be answered as well. Ultimately, as happens with many aspects of big data analytics, the discussion will need to center on the use case and how to address the time to value (TTV) equation.
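The choice between storing everything for back-end analysis, processing on the fly, or doing both can be illustrated with a small sketch in Python. The `StreamMonitor` class, its parameters and the simple "more than three standard deviations from the running mean" rule are assumptions made for this example; real event processing and decision management products use far richer rules.

```python
import math


class StreamMonitor:
    """Hypothetical monitor that checks each event on arrival and also stores it."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # alert when this many std devs from the mean
        self.warmup = warmup        # events to observe before alerting at all
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations
        self.store = []             # retained events for later batch analysis

    def process(self, value):
        """Return True if the value looks anomalous against the history so far."""
        alert = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            alert = std > 0 and abs(value - self.mean) > self.threshold * std
        # Welford's online update: no need to rescan stored history.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        self.store.append(value)
        return alert


if __name__ == "__main__":
    monitor = StreamMonitor()
    for reading in [10, 11, 9, 10, 11, 10, 50, 10]:
        if monitor.process(reading):
            print("alert on", reading)
```

The on-the-fly path delivers an answer at the moment the event arrives, which is what right-time consumption needs, while the stored events remain available for batch-oriented analysis afterward. Which path deserves the investment depends on the use case and its time to value.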
Organizations embarking on a big data strategy must not fail to consider the four areas above. Furthermore, their discussions cannot cover just the technological approaches, but must include people, processes and the entire information landscape. Often, this endeavor requires a fundamental rethinking of organizational processes and questioning of the status quo. Only then can companies see the forest for the trees.