PentahoWorld, the first user conference for this 10-year-old supplier of data integration and business analytics software, attracted more than 400 customers in roles ranging from IT and database professionals to business analysts and end users. The diversity of the crowd reflects Pentaho’s broad portfolio of products, which covers the integration side of big data analytics with the Pentaho Data Integration (PDI) tools and the front-end tools and visualization with Pentaho Business Analytics. In essence, the portfolio spans the path from data to analytics through what the company introduced as Big Data Orchestration, which brings governed data delivery and a streamlined data refinery together on one platform.
Pentaho has made progress in business over the past year, picking up Fortune 1000 clients and moving from providing analytics to midsize companies to serving larger companies such as Halliburton, Lufthansa and NASDAQ. One reason for this success is Pentaho’s ability to integrate large-scale data from multiple sources, including enterprise data warehouses, Hadoop and other NoSQL approaches. Our research into big data integration shows that Hadoop is a key technology that 44 percent of organizations are likely to use, but it is just one option in the enterprise data environment. A second key for Pentaho has been the embeddable nature of its approach, which enables companies, especially those selling cloud-based software as a service (SaaS), to use analytics to gain competitive advantage by placing Pentaho’s tools within their own applications. For more detail on Pentaho’s analytics and business intelligence tools, please see my previous analytic perspective.
A key advance for the company over the past year has been the development and refinement of what it calls big data blueprints. These are general use cases in areas such as ETL offloading and customer analytics. Each blueprint includes design patterns for ETL and analytics that work with high-performance analytic databases, including NoSQL variants such as MongoDB and Cassandra.
The blueprint concept is important for several reasons. First, it helps Pentaho focus on specific market needs. Second, it shows customers and partners processes that enable them to get immediate return on the technology investment. The same research referenced above shows that organizations manage their information and technology better than their people and processes; to realize full value from spending on new technology, they need to pay more attention to how the technology fits with these cultural aspects.
At the user conference, the company announced release 5.2 of its core business analytics products and featured its Governed Data Delivery concept and Streamlined Data Refinery. The Streamlined Data Refinery provides a process for business analysts to access the already integrated data provided through PDI and create data models on the fly. The advantage is that this is not a technical task: the business analyst does not have to understand the underlying metadata or the data structures. The user chooses the dimensions of the analysis from menus, combining them in any way on an ad hoc basis. The Streamlined Data Refinery then automatically generates a data cube that is available for fast querying in an analytic database. Currently, Pentaho supports only the HP Vertica database, but its roadmap promises to add high-performance databases from other suppliers. The entire process can take only a few minutes and is much more flexible and dynamic than asking IT to rebuild a data model every time a new question is asked.
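To make the idea of generating a cube from user-selected dimensions concrete, here is a minimal Python sketch. It is not Pentaho’s implementation and uses no Pentaho API; the function name `build_cube`, the sample rows and the field names are all hypothetical, chosen only to illustrate pre-aggregating a measure for every combination of the chosen dimensions so that later questions become simple lookups.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(rows, dimensions, measure):
    """Pre-aggregate `measure` for every subset of the chosen dimensions.

    Hypothetical helper for illustration only; a real refinery would push
    this work down to an analytic database rather than loop in memory.
    """
    cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                # Group key is the tuple of values for this dimension subset.
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


# Made-up sample data standing in for already blended records.
rows = [
    {"region": "EMEA", "product": "A", "revenue": 100.0},
    {"region": "EMEA", "product": "B", "revenue": 50.0},
    {"region": "APAC", "product": "A", "revenue": 70.0},
]

# The analyst "picks" two dimensions; the cube covers all their combinations.
cube = build_cube(rows, ["region", "product"], "revenue")
```

With the cube built, a question such as revenue by region is a dictionary lookup, `cube[("region",)][("EMEA",)]`, rather than a new pass over the data. That is the general trade the refinery approach appears to make: spend a few minutes on up-front aggregation to get fast ad hoc queries afterward.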
While Pentaho Data Integration enables users to bring together all available data and integrate it to find new insights, Streamlined Data Refinery gives business users direct access to the blended data. In this way they can explore data dynamically without involving IT. The other important aspect is that it makes the lineage of the data readily available. Internal and external auditors often need to understand where data came from and how it was integrated, and data lineage answers those questions. Such a feature should benefit all types of businesses, but especially those in regulated industries. This approach addresses the two top needs of business end users, which, according to our benchmark research into information optimization, are to drill into data (37 percent) and to search for specific information (36 percent).
Another advance is Pentaho 5.2’s support for Kerberos security on Cloudera, Hortonworks and MapR. Cloudera, currently the largest Hadoop distribution vendor, and Hortonworks, which is planning to raise capital via a public offering, hold the lion’s share of the commercial Hadoop market. Kerberos puts a layer of authentication between the Pentaho Data Integration tool and the Hadoop data. This helps address security concerns, which have increased dramatically over the past year after major breaches at retailers, banks and government institutions.
These announcements show results of Pentaho’s enterprise-centric customer strategy as well as the company’s investment in senior leadership. Christopher Dziekan, the new chief product officer, presented a three-year roadmap that focuses on data access, governance and data integration. It is good to see the company put its stake in the ground with a well-formed vision of the big data market. Given the speed at which the market is changing and the necessity for Pentaho to consider the needs of its open source community, it will be interesting to see how the company adjusts the roadmap going forward.
For enterprises grappling with big data integration and trying to give business users access to new information sources, Pentaho’s Streamlined Data Refinery deserves a look. For both enterprises and ISVs that want to apply integration and analytics in the context of another application, Pentaho’s REST-based APIs allow embedding of end-to-end analytic capabilities. Together with the big data blueprints discussed above, these capabilities enable Pentaho to deliver a targeted yet flexible approach to big data.