Kognitio Pablo Brings MDX to Data and Analytics


Kognitio announced the addition of MultiDimensional eXpressions (MDX) capabilities for its WX2 product line. John Thompson, CEO of U.S. operations, and Sean Jackson, VP of marketing, shared some of the details with me recently. I find the marriage of MDX and large-scale data both technically challenging and potentially valuable to the market.

Kognitio develops and markets a massively parallel processing (MPP) database system targeted at the business intelligence and data warehousing market for use as an analytical database. WX2 is available as on-premises software, an appliance or software as a service (SaaS). The product has a relatively long history, having been developed in the early 1990s as part of Whitecross Systems, which merged with Kognitio in 2005. Headquartered in the U.K., the company is making a push for a bigger presence in the U.S. using funds raised for the purpose in December.

MDX provides more powerful ways than SQL to express relationships between data elements that are organized into cubes for online analytical processing (OLAP) analysis. Although a number of vendors have implemented OLAP capabilities on top of relational databases (referred to as ROLAP), it’s difficult to create meaningful sets of derived data using this approach. MDX makes it easier to express formulaic relationships between different data elements, thus enabling organizations to create relatively sophisticated historical or prospective measures to assess or project the performance of their business. To illustrate the difference, with SQL you could project sales as a percentage of last year’s sales, but with MDX you could project sales based on last year’s sales along with this year’s advertising programs and hiring plans and include a projected ramp-up period for each sales person. Using MDX it also is much easier to create what-if and planning types of analyses. Our benchmark research shows that only 22% of organizations currently can conduct what-if analysis for planning and forecasting, but 84% said it is important or very important to add these capabilities for decision-making and performance management.
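
To make the contrast concrete, here is a minimal Python sketch of the two kinds of projection described above. It is not MDX and not anything Kognitio ships; the names `SalesRep`, `simple_projection`, `ramped_contribution` and `what_if_projection`, the linear ramp-up, and the flat advertising lift are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SalesRep:
    """Hypothetical planning input for one salesperson."""
    full_quota: float     # annual sales expected once fully productive
    ramp_months: int      # months needed to reach full productivity
    months_employed: int  # months on the job during the plan year


def simple_projection(last_year_sales: float, growth_pct: float) -> float:
    """The simple case: this year's sales as a percentage of last year's."""
    return last_year_sales * (1 + growth_pct / 100)


def ramped_contribution(rep: SalesRep) -> float:
    """Sum monthly sales for one rep, assuming a linear ramp-up."""
    monthly_quota = rep.full_quota / 12
    total = 0.0
    for month in range(1, rep.months_employed + 1):
        # Productivity climbs linearly, then holds at 100%.
        productivity = min(1.0, month / max(rep.ramp_months, 1))
        total += monthly_quota * productivity
    return total


def what_if_projection(
    last_year_sales: float,
    ad_spend: float,
    ad_lift_per_dollar: float,
    new_hires: List[SalesRep],
) -> float:
    """The richer case: baseline plus advertising lift plus ramped hires."""
    ad_effect = ad_spend * ad_lift_per_dollar  # assumed flat lift per dollar
    hiring_effect = sum(ramped_contribution(rep) for rep in new_hires)
    return last_year_sales + ad_effect + hiring_effect
```

Changing `ad_spend` or the list of `new_hires` and rerunning `what_if_projection` is the kind of what-if question the research figures above refer to; in an OLAP setting the same formula would be evaluated across every cell of a cube rather than for a single number.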

Let me note that providing MDX or other what-if analyses over large amounts of data presents some significant technical challenges; these are mostly based around performance issues, although the amount of memory required for the analyses can be another issue. Because MDX makes it easier to express complex formulas, any calculation could reference any other data point in the entire data set. The challenge lies in working through the calculation dependencies and getting the necessary data into memory quickly to perform the calculations and deliver the results. As I pointed out in the MPP blog post referenced above, it becomes even more challenging when the data can reside on another node in the MPP system.
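
The dependency problem can be sketched in a few lines of Python. This is only an illustration of the general idea of ordering calculations so that each one runs after its inputs, using the standard library's `graphlib.TopologicalSorter`; the `evaluate` function and the measure names are hypothetical, and the sketch ignores the harder MPP question of fetching inputs that live on another node.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Tuple

# A formula is (names of its inputs, function that combines them).
Formula = Tuple[List[str], Callable[..., float]]


def evaluate(base_values: Dict[str, float],
             formulas: Dict[str, Formula]) -> Dict[str, float]:
    """Evaluate derived measures in dependency order.

    Raises graphlib.CycleError if the formulas reference each other
    in a loop.
    """
    # Map each derived measure to the set of measures it depends on.
    graph = {name: set(deps) for name, (deps, _) in formulas.items()}
    values = dict(base_values)
    for name in TopologicalSorter(graph).static_order():
        if name in values:
            continue  # base data, already loaded
        deps, fn = formulas[name]
        values[name] = fn(*(values[d] for d in deps))
    return values


# Hypothetical measures: margin depends on base data,
# margin_growth depends on a derived measure.
example_formulas: Dict[str, Formula] = {
    "margin_growth": (["margin", "prior_margin"],
                      lambda m, p: (m - p) / p),
    "margin": (["sales", "cost"], lambda s, c: s - c),
}
example_base = {"sales": 200.0, "cost": 120.0, "prior_margin": 60.0}
example_result = evaluate(example_base, example_formulas)
```

Even though `margin_growth` is listed first, it is computed only after `margin`, because the sorter orders measures by what they depend on. On a single machine this ordering is cheap; the cost the paragraph above describes comes from getting each input into memory on the node that needs it.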

Kognitio will introduce one of the first products to combine MPP and MDX capabilities. SAP also provides MDX capabilities for its high-performance analytic appliance (SAP HANA), which my colleague commented on. As more and more data gets stored and analyzed using MPP systems, the MDX capabilities can help organizations produce more accurate analyses of where their business is headed and thus make better business decisions. The alternative today is to operate on subsets of data, which can potentially reduce the accuracy of the results or can lead to more complicated systems that attempt to combine results from multiple separate analyses. The MDX capabilities of Kognitio’s WX2, referred to as Pablo, are not a separate product but are built into it. The company touts these features as helping with the process of producing and managing analytic cubes. While these capabilities will help with that process by maintaining the cubes directly within WX2, I see as much potential value in being able to do more meaningful analyses over larger amounts of data.

These new MDX capabilities are scheduled for availability in June. Initially, at least, they come with some caveats. If you are evaluating Kognitio, be aware that the initial target client tool is Excel pivot tables. You’ll have to wait to use another BI front-end tool. I also advise you to explore what subset of MDX is supported to make sure the expressions that are critical to your business can be included in your models. Also, Kognitio provides no inherent mechanism to process updates for purposes of what-if analysis, but the company claims you can update the underlying relational data using other mechanisms and the cubes will be recalculated automatically. Finally, the biggest caveat will be performance. In addition to assessing the overall performance of the new system vs. existing OLAP systems, you will want to see how much memory is required for the cubes and what happens when the system exceeds the available amount of physical memory. The litmus test will be to define a cube that is larger than the memory on one of the nodes and see whether it continues to perform adequately.
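
When setting up that litmus test, a rough back-of-the-envelope calculation helps pick a cube size. The Python sketch below assumes, purely for illustration, a fixed average row width and data spread evenly across nodes; the function `sizing_report` and its parameters are hypothetical and say nothing about how WX2 actually lays out or compresses data in memory.

```python
from typing import Dict, Union


def sizing_report(rows: int,
                  bytes_per_row: int,
                  nodes: int,
                  node_memory_bytes: int) -> Dict[str, Union[int, float, bool]]:
    """Estimate whether a cube exceeds one node's memory and whether it
    still fits in the cluster as a whole.

    Assumes a fixed row width and perfectly even distribution, which
    real systems will only approximate.
    """
    total_bytes = rows * bytes_per_row
    per_node_bytes = total_bytes / nodes
    return {
        "total_bytes": total_bytes,
        "per_node_bytes": per_node_bytes,
        # The litmus test needs this to be True ...
        "exceeds_single_node": total_bytes > node_memory_bytes,
        # ... while this shows whether the cluster can hold it at all.
        "fits_in_cluster": total_bytes <= nodes * node_memory_bytes,
        # Share of each node's memory the cube would occupy.
        "node_memory_used_fraction": per_node_bytes / node_memory_bytes,
    }
```

For example, calling `sizing_report(rows=580_000_000, bytes_per_row=100, nodes=4, node_memory_bytes=32 * 1024**3)` describes a cube that is too big for any single node but still fits in the cluster's combined memory, which is the situation worth testing. The row width and node count there are made-up inputs, so substitute figures from your own evaluation.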

Regardless of whether Kognitio gets it right in the first release, it’s encouraging to see vendors advancing the types of analytics that can be performed on large data. I expect others to follow suit, and that will be good for business users who need to perform planning and what-if analysis on ever larger amounts of data.

Let me know your thoughts or come and collaborate with me on Facebook, LinkedIn and Twitter.

Regards,

Ventana Research

4 thoughts on “Kognitio Pablo Brings MDX to Data and Analytics”

  1. Kognitio and the other large database vendors have learned what many analysts learned years ago. Regardless of the engine, OLAP and MDX language capabilities are still the best mechanisms to perform complex query analysis. As David points out, the current perception is that the volume of data will inhibit the performance of the typical OLAP cube, but may not impact a ROLAP model built with the new MDX paradigm. While Kognitio prefers not to have ROLAP used as a descriptor for its implementation, it is in fact a variant of ROLAP that is more scalable and performant. Teradata and SAP HANA have similar variants of the ROLAP model – each implemented to take advantage of their key performance techniques. See more comments on this @ http://blogs.simba.com/

  2. David, thanks for your insightful evaluation of our Pablo product. You make some great points, particularly about how expanding the range of possible analyses is at least as important as not having to build cubes. I also wanted to address some of the caveats that you mentioned in your post:

    The first task when creating an MDX provider is to ensure that it works well with Microsoft Excel, since it is the predominant interface. However, Excel won’t be the only interface Pablo supports. We are working with customers to determine which other tools have strong demand and will certainly add support for them.

    Kognitio WX2 and Pablo are architected such that the virtual cubes can be altered and re-built ‘on the fly’ with significant changes – such as dimensional restructuring – within seconds or minutes rather than hours. We demonstrated that capability live at the recent Gartner BI Summit in London. We ran an interactive analysis of a 580m row fact table natively via pivot tables in Excel – with 5 dimensions – and proceeded to change the dimensional structure, republish the cube and recommence analysis within Excel all under 2 minutes.

    Pablo requires ‘in-memory’ processing. As such, the system must scale to ensure the data set can be accommodated in memory. We recommend that sufficient memory is left available to allow processing. This is variable according to the application but will increase with concurrency.

    If users attempt to create a view image (i.e. virtual cube) that is greater than the available memory, an out-of-memory error is raised and the process terminated. When the image is created in memory, query processing is possible without error even if less than ideal memory is available. WX2 will manage the memory by streaming the queries, using sophisticated algorithms to optimize use of the remaining memory. What will not happen is physical disk writes. Without a doubt, in situations where there is significant streaming the cube’s performance will decrease. With completely in-memory objects, such as Pablo cubes, this is a linear degradation. However, this would be true of any database where the sizing of the environment was inappropriate for the application.

    I hope this sheds light on some of the caveats you highlighted. Again, we appreciate the blog post and your interest in the Pablo product.
