ParAccel Takes a Smart Path to Faster Analytics from Big Data


ParAccel is a well-funded big data startup, with $64 million invested in the firm so far. Only a few companies can top this level of startup funding, and most of them are service-based rather than product-based companies. Amazon has a 20 percent stake in the company and is making a big bet on the company’s technology to run Redshift, its data warehouse in the cloud initiative. MicroStrategy also uses ParAccel for its cloud offering, but holds no equity in the company.

ParAccel provides a software-based analytical platform that competes in the database appliance market and, as many in the space are increasingly trying to do, it is building analytic processes on top of the platform. At the base level, ParAccel is a massively parallel processing (MPP) database with columnar compression support, which allows for very fast query and analysis times. It is offered either as software or in an appliance configuration, which, as we’ll discuss in a moment, is a different approach than many others in the space are taking. It connects with Teradata, Hadoop, Oracle and Microsoft SQL Server databases as well as financial market data such as semi-structured trading data and NYSE data through what the company calls On Demand Integration (ODI). This allows joint analysis through SQL of relational and non-relational data sources. In-database analytics offer more than 600 functions (though some places on the company’s website and datasheets still say just over 500).
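
To illustrate why columnar compression can speed up analysis, here is a minimal Python sketch of run-length encoding applied to a single column. It is a generic illustration and not ParAccel’s actual storage format; the function names and the sample values are hypothetical.

```python
from itertools import groupby


def rle_encode(column):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(column)]


def count_equal(encoded, target):
    """Count rows equal to target by reading runs, not individual rows."""
    return sum(length for value, length in encoded if value == target)


# Hypothetical column of exchange codes, stored contiguously as in a column store.
column = ["NYSE"] * 4 + ["NASDAQ"] * 3 + ["NYSE"] * 2
encoded = rle_encode(column)
nyse_rows = count_equal(encoded, "NYSE")
```

Because values from one column sit next to each other, long runs compress well, and a filter or count can work on the handful of runs rather than on every row.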

The company’s latest release, ParAccel 4.0, introduced product enhancements around performance as well as reliability and scalability. Performance enhancements include advanced query optimization that is said to improve aggregation performance 20X through “sort-aware” aggregation, which tracks data properties up and down the processing pipeline. ParAccel’s own High Speed Interconnect protocol has been further optimized, reducing data distribution overhead and speeding query processing. Version 4.0 also introduces new algorithms that exploit I/O patterns to pre-fetch data and store it in memory, which again speeds query processing and reduces I/O overhead. The need for scalability is addressed in enhancements that enable the system to scale to 5,000 concurrent connections supporting up to 38,000 users on a single system. Its Hash Join algorithms allow for complex analytics by allowing the number of joins to fit the complexity of the analytic. Finally, interactive workload management introduces a class of persistent queries that allows short-running and long-running queries to run side by side without impacting performance. This is particularly important as the integration of on-demand data sources through the company’s ODI approach could otherwise interfere with more interactive user requirements.
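
The general idea behind sort-aware aggregation can be sketched in a few lines of Python. This is an assumption-laden illustration of the concept, not ParAccel’s optimizer: if the engine knows that rows already arrive sorted by the grouping key, it can aggregate in a single streaming pass instead of building a hash table of every group. The function names and sample rows are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def hash_aggregate(rows):
    """Sum values per key with no assumption about input order.

    Keeps one running total per distinct key in memory.
    """
    totals = {}
    for key, value in rows:
        totals[key] = totals.get(key, 0) + value
    return totals


def sorted_aggregate(rows):
    """Sum values per key, assuming rows are already sorted by key.

    Each group is contiguous, so only one running total is held at a time.
    """
    for key, group in groupby(rows, key=itemgetter(0)):
        yield key, sum(value for _, value in group)


# Hypothetical input that is known to be sorted by key.
rows = [("a", 1), ("a", 2), ("b", 5), ("c", 3), ("c", 4)]
sorted_totals = dict(sorted_aggregate(rows))
hashed_totals = hash_aggregate(rows)
```

Both functions return the same totals; the difference is that the sorted variant relies on a known data property to avoid the hash table, which is the kind of information a planner would need to track through the pipeline.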

The company separates its semi-annual database release cycle from the more iterative analytics release cycle. The new analytic functions released just last month include a number of interesting developments for the company. Text analytics for various feeds supports a variety of use cases, such as social media, customer comment analysis, and insurance and warranty claims. In addition, functions such as sessionization and JSON parsing open a new dimension of analytics for ParAccel, as web data can now be analyzed. The new analytic capabilities allow the company to address a broad class of use cases such as “golden path analysis”, fraud detection, attribution modeling, segmentation and profiling. Interestingly, some of these use cases are of the same character as those seen in the Hadoop world.
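
For readers unfamiliar with sessionization and JSON parsing, the following Python sketch shows what the two steps typically involve when applied to web data. It is a generic illustration rather than ParAccel’s in-database functions; the 30-minute idle threshold, the field names and the sample records are all assumptions made for the example.

```python
import json
from datetime import datetime, timedelta


def sessionize(events, gap=timedelta(minutes=30)):
    """Assign a per-user session number to each (user, timestamp) event.

    A new session starts when a user has been idle for longer than gap.
    """
    last_seen = {}
    session_no = {}
    out = []
    for user, ts in sorted(events):
        if user not in last_seen or ts - last_seen[user] > gap:
            session_no[user] = session_no.get(user, 0) + 1
        last_seen[user] = ts
        out.append((user, ts, session_no[user]))
    return out


# Hypothetical clickstream records, one JSON document per line.
raw = [
    '{"user": "u1", "ts": "2020-01-01T10:00:00"}',
    '{"user": "u1", "ts": "2020-01-01T10:10:00"}',
    '{"user": "u1", "ts": "2020-01-01T11:30:00"}',
    '{"user": "u2", "ts": "2020-01-01T10:05:00"}',
]

# JSON parsing step: turn each document into a (user, timestamp) pair.
events = []
for line in raw:
    record = json.loads(line)
    events.append((record["user"], datetime.fromisoformat(record["ts"])))

sessions = sessionize(events)
session_ids = [session for _, _, session in sessions]
```

Once each click carries a session number, analyses such as path or attribution modeling can group events by user and session.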

So where does ParAccel fit in the broader appliance landscape? According to our benchmark research on big data, more than 35 percent of businesses plan to use appliance technology, but the market is still fragmented. The appliance landscape can be broken down into categories that include hardware and software that run together, software that can be deployed across commodity hardware, and non-relational parallel processing paradigms such as Hadoop. This landscape gets especially interesting when we look at Amazon’s Redshift and the idea of elastic scalability on a relational data warehouse. The lack of elastic scalability in the data warehouse has been a big limitation for business; it has traditionally taken significant money, time and energy to implement.

With its “Right to Deploy” pricing strategy, ParAccel promises the same elasticity as with its on-premises deployments. The new pricing policy removes the traditional per-node pricing obstacles by offering prices based on “unlimited data” and takes into consideration the types of analytics that a company wants to deploy. This strategy may play well against companies that sell their appliances only bundled with hardware. Such vendors will have a difficult time matching ParAccel’s pricing because of their hardware-driven business model. While the offer is likely to get ParAccel invited into more consideration sets, it remains to be seen whether the company wins more deals based on it.

Partnerships with Amazon and MicroStrategy to provide cloud infrastructure produce a halo effect for ParAccel, but the cloud approaches compete against ParAccel’s internal sales efforts. One of the key differentiators for ParAccel as the company competes against the cloud version of itself will be the analytics that are stacked on top of the platform. Since neither the Redshift nor the MicroStrategy cloud offering currently licenses the upper parts of this value stack, customers and prospects will likely hear quite a bit about the library of 600-plus functions and the ability to address advanced analytics for clients. The extensible approach and the fact that the company has built analytics as a first-class object in its database allow the architecture to address speed, scalability and analytic complexity. The one potential drawback, depending on how you look at it, is that the statistical libraries are based on user-defined functions (UDFs) written in a procedural language. While the library integration is seamless to end users and scales well, if a company needs to customize the algorithms, data scientists must go into the underlying procedural programming language to make the changes. The upside is that the broad library of analytics can be used through the SQL paradigm.

While ParAccel aligns closely with the Hadoop ecosystem in order to source data, the company also seems to be welcoming opportunities to compete with Hadoop. Some of the use cases mentioned above, such as so-called “golden path analysis”, have been presented as key Hadoop analytic use cases. Furthermore, many Hadoop vendors are bringing the SQL access paradigm and traditional BI tools together with Hadoop to mitigate the skills gap in organizations. But if an MPP database like ParAccel that is built natively for relational data is also able to do big data analytics, and is able to deliver a more mature product with similar horizontal scalability and cost structure, the argument for standard SQL analytics on Hadoop becomes less compelling. If ParAccel is right, and SQL is the lingua franca for analytics, then the company may be in a good position to fill the so-called skills gap. Our benchmark research on business technology innovations shows that the biggest challenge for organizations deploying big data today revolves around staffing and training, with more than 77 percent of companies claiming that they are challenged in both categories.

ParAccel offers a unique approach in a crowded market. The new pricing policy is a brilliant stroke, as it not only will get the company invited into more bid opportunities, but it moves client conversations away from the technology-oriented three Vs and more to analytics and the business-oriented three Ws. If the company puts pricing pressure on the integrated appliance vendors, it will be interesting to see if any of those vendors begin to separate out their own software and allow it to run on commodity hardware. That would be a hard decision for them, since their underlying business models often rely on an integrated hardware/software strategy. With companies such as MicroStrategy and Amazon choosing it for their underlying analytical platforms, the company is one to watch. Depending on the use case and the organization, ParAccel’s in-database analytics should be readily considered and contrasted with other approaches.

Regards,

Ventana Research
