Paxata, a new data and analytics software provider, says it wants to address one of the most pressing challenges facing today’s analysts: simplifying data preparation. This push toward simplification is well aligned with the market’s desire for improved usability, which our benchmark research into Next-Generation Business Intelligence shows is a primary buying consideration in nearly two-thirds (64%) of companies. That desire is driving significant adoption of business-friendly front-end visual and data discovery tools and is part of my research agenda for 2014.
On the back end, however, there is still considerable complexity. Systems outside the traditional relational database, such as Hadoop and big data appliances, address the need to store and to some degree query massive amounts of structured and unstructured data. But blending these data sources with each other and with third-party cloud-based data, efficiently and effectively, is still a challenge.
To address this challenge, the front-end analytics tools that analysts are adopting and the multitude of back-end database systems must be integrated to deliver high-quality analytic data sets. Today, this is no easy task. My recently released benchmark research into Information Optimization finds that when companies create and deploy information, the largest portions of time are spent on preparing data for analysis (49%) and reviewing data for quality and consistency issues (47%). In fact, our research shows that analysts consistently spend anywhere from 40 percent to 60 percent of their time in the data preparation phase that precedes actual analysis of the data.
Paxata’s Adaptive Data Preparation platform aims to solve the challenge of data preparation by improving the data aggregation, enrichment, quality and governance processes. It does this using a spreadsheet paradigm, a choice that should resonate well with business analysts; our research into spreadsheet use in today’s enterprises finds that the majority (56%) are resistant to a move away from spreadsheets.
In Paxata’s design, once the data is loaded the software displays the combined dataset in a spreadsheet format, and the user then manipulates the rows and columns to accomplish the various data preparation tasks. For instance, to profile the data, the analyst can first use a search box and an autocomplete query to find the data of interest and then use color-coded cells and visualization techniques to highlight patterns in the data. For data that may include duplicate records, such as addresses, the company includes services that help to sort through these records and suggest which records to combine. This last task may be of particular interest for marketers attempting to combine multiple third-party data sources that list several addresses and names for the same individual.
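To make the duplicate-record task concrete, here is a minimal Python sketch of how a tool might suggest which records to combine. It is not Paxata’s method, which the company does not describe in this detail; the function names `normalize` and `suggest_merges`, the sample records, and the 0.85 similarity threshold are all assumptions chosen for illustration, and the comparison uses plain string similarity from Python’s standard library.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(record):
    """Lowercase the record and replace punctuation with spaces (hypothetical helper)."""
    text = " ".join(str(value) for value in record.values()).lower()
    cleaned = "".join(ch if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())


def suggest_merges(records, threshold=0.85):
    """Return (i, j, score) for each pair of records that look like duplicates.

    The threshold is an illustrative assumption, not a recommended value.
    """
    keys = [normalize(r) for r in records]
    suggestions = []
    for i, j in combinations(range(len(records)), 2):
        score = SequenceMatcher(None, keys[i], keys[j]).ratio()
        if score >= threshold:
            suggestions.append((i, j, round(score, 2)))
    return suggestions


# Made-up sample data: the first two rows describe the same person.
records = [
    {"name": "Jane Q. Smith", "address": "12 Main St., Springfield"},
    {"name": "Jane Q Smith", "address": "12 Main Street, Springfield"},
    {"name": "Robert Jones", "address": "900 Oak Avenue, Portland"},
]

pairs = suggest_merges(records)
for i, j, score in pairs:
    print(f"Consider combining rows {i} and {j} (similarity {score})")
```

The sketch only suggests pairs and leaves the decision to the analyst, which mirrors the suggestion-based workflow described above. Comparing every pair is fine for a small sample but would need a smarter candidate-selection step on large data sets.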
Another key aspect of Paxata’s software is a history function that allows users to return to any step in the data preparation process and make changes on the fly. This ability to explore the lineage of the data enables another interesting function: “Paxata Share.” This capability lets multiple users collaboratively evaluate the differences between data sets by looking at the different assumptions that went into the processing of the data. This function is particularly interesting as it has the potential to solve the challenge of “battling boardroom facts,” the situation in which people come to a meeting with different versions of the truth based on the same data sources but different data preparation assumptions.
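One way to picture a history function of this kind is as an ordered log of preparation steps that can be replayed against the untouched source data. The Python sketch below illustrates that idea only; the `PrepHistory` class, its methods, and the sample rows are hypothetical and are not drawn from Paxata’s product.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, object]
StepFn = Callable[[List[Row]], List[Row]]


@dataclass
class PrepHistory:
    """Hypothetical step log: the source is never modified, steps are replayed."""
    source: List[Row]
    steps: List[Tuple[str, StepFn]] = field(default_factory=list)

    def add(self, label: str, fn: StepFn) -> None:
        self.steps.append((label, fn))

    def replace(self, index: int, label: str, fn: StepFn) -> None:
        """Change one earlier assumption without touching the other steps."""
        self.steps[index] = (label, fn)

    def replay(self, upto: Optional[int] = None) -> List[Row]:
        """Rebuild the data by applying the first `upto` steps (all if None)."""
        rows = [dict(r) for r in self.source]
        for _, fn in self.steps[:upto]:
            rows = fn(rows)
        return rows


# Made-up sample data for illustration.
source = [
    {"region": "East", "revenue": 120},
    {"region": None, "revenue": 300},
    {"region": "West", "revenue": 80},
]

history = PrepHistory(source)
history.add("drop rows without a region",
            lambda rows: [r for r in rows if r["region"]])
history.add("keep revenue >= 100",
            lambda rows: [r for r in rows if r["revenue"] >= 100])

after_step_one = history.replay(upto=1)  # state after the first step only
final = history.replay()                 # every step applied

# A second analyst prefers a different cutoff; only that step changes.
history.replace(1, "keep revenue >= 50",
                lambda rows: [r for r in rows if r["revenue"] >= 50])
revised = history.replay()
```

Because each result is derived from the same source plus a labeled list of steps, two analysts who arrive with different numbers can compare step labels to see which assumption differs, which is the “battling boardroom facts” situation in miniature.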
Under the covers, Paxata’s offering boasts a cloud-based multi-tenant architecture hosted on Rackspace and leveraging the OpenStack platform. The company says its product can comfortably handle big data, processing millions of rows (or about a terabyte) of data in real time. If data sets are larger than this, a batch process can replace the real-time analysis.
In my view, the main value of Paxata’s technology lies in the data analyst time it can potentially save. Much of the functionality it offers involves data discovery driven by the kinds of machine learning algorithms that my colleague Mark Smith discussed in “Four Types of Discovery Technology.” For instance, the Paxata software will recommend data and metric definitions based on the business context in which the analyst is working (a customer versus a supply chain context, for example), and these recommendations will sharpen as more data runs through the system.
Paxata is off to a great start, though the data connectors its product currently offers are limited; this should improve as the company builds out connectors for more data sources. The company will also need to cut through a very noisy marketplace of companies that provide similar services, on-premises or in the cloud, all of which are adapting their messages to address the data preparation challenge. On its website, Paxata lists Cloudera, Qlik Technologies and Tableau as technology partners. The company also lists dozens of information enrichment partners, including government organizations and data companies such as Acxiom, DataSift and Esri. The list of information partners is extensive, which reflects a thoughtful focus on the value of third-party data sources.
By using efficient cloud computing technology, Paxata is able to come out of the gate with aggressive pricing: the company site lists a price of about $300 per month, a small amount relative to the time saved on a daily, weekly and monthly basis. Such pricing should help adoption, especially among the business analysts the company targets. Organizations that are struggling with the time they put into the data preparation phase of analytics, and those that are looking to leverage outside data sources in new and innovative ways, should look into Paxata.