IBM Makes Big Data Deal for Vivisimo and Supports Cloudera Hadoop


Through a series of acquisitions and organic development over the last five years, IBM has established itself as a leader in enterprise big data for business analytics. I recently wrote about IBM Smarter Analytics, which brings together the company’s portfolio of software, systems and services from analytics to big data. But supporting big data requires the ability to access many sources of information; our benchmark research on big data found that more than half of organizations require information from external sources, and that requires some software flexibility.

To help meet that challenge, IBM has announced that it will acquire Vivisimo, a company that has been operating for more than 10 years, has more than 100 employees and publicly lists customer references including Airbus, LexisNexis, P&G, Schering-Plough and USGS. The majority of its customers use its enterprise search platform, the Vivisimo Velocity Platform, to index and access structured and unstructured big data.

The Vivisimo acquisition is important for IBM because it has not had a substantive search technology for its information management portfolio that can operate across all types of information. Competitively this is critical; the demand for better search has led Oracle to expand its efforts with Oracle Secure Enterprise Search and to acquire Endeca for more depth on structured data. Our benchmark research in business analytics found that searching for specific answers is the top-requested capability in 83 percent of organizations. This finding should highlight the importance of Vivisimo for IBM’s efforts in business analytics, though the software will be integrated by IBM’s information management software division, which focuses more on IT than business.

IBM is positioning Vivisimo as part of its efforts to provide the ability to explore big data stored in internal, external and Internet-based sources. That position makes sense based on IBM’s customers’ needs, but Vivisimo described its product as information optimization for data access and discovery. IBM and Vivisimo discussed its efforts in search-based applications, which is an industry analyst term for software designed to help organizations access and assemble information. But search is just one of many capabilities that combine in the broader context of information applications, which is something IBM can now add to its portfolio. Our benchmark research on information applications finds that searching for specific answers is the top-ranked requirement for 46 percent of organizations.

Vivisimo provides analytics on text and metadata that can be accessed through its search capabilities. Its focus on scalable but secure search is part of why it became one of the leaders in enterprise search at the high end of the enterprise software market. The software’s security features are critical, as organizations do not want to make it faster for unauthorized users to access information; more than three-fourths (79%) of organizations are concerned about data privacy or security breaches, according to our big data benchmark research. Also key is the platform’s flexibility in integrating sources across the enterprise, which is consistent with our research finding that the range of information available must span customer data, transactional data, logs, call detail records and plenty of other sources. In addition, Vivisimo supports mobile technologies such as smartphones to make it simpler to access information from any platform.

Along with the Vivisimo acquisition, IBM announced support for Cloudera Hadoop as part of InfoSphere BigInsights. Cloudera continues to grow with its enhanced commercial version of Hadoop. IBM’s expanded support of Hadoop is critical, as our Hadoop and Information Management benchmark research found Hadoop being used in 22 percent of organizations engaged in big data, with another 32 percent evaluating or planning to adopt it. IBM says it is the only provider to support multiple Hadoop distributions today, but actually other companies do, too; Datameer does for analytics, as I just assessed, and Pentaho does for data integration. Also in data integration, Informatica supports Cloudera and other Hadoop technologies.

Acquiring Vivisimo is a smart move for IBM to expand its array of tools for big data and its overall information management portfolio. Considering the lack of enterprise depth from the likes of Google Enterprise Search, IBM’s opportunity is large. Expanding the platform to support other Hadoop distributions, such as Cloudera, is critical, as organizations are thinking about what distribution to place into production, and don’t just automatically use what is freely available from Apache. Having an open framework for its big-data platform is a smart move for IBM, as it lets customers embrace and extend a range of software. By comparison, others, such as Oracle, are focused primarily on supporting their own stacks of technologies. It is a stark difference. If you are considering adding search or Hadoop as part of your big-data efforts, take a look at what IBM is doing.

Regards,

Mark Smith – CEO & Chief Research Officer

Datameer Advances Big-Data Analytics on Hadoop


The increasing pressure to store, retrieve and process data on an unprecedented scale in the enterprise has created a market for processes and tools to support it. Big data, as it’s widely known, is one of the six business technology innovations of the decade outlined in our research agenda, and it has created a renaissance in data management. Our benchmark research on big data finds its top benefits to be the ability to retain and analyze more data (74%) and to increase the speed of analysis (70%). This is where a vendor named Datameer comes in.

We have been tracking the company since its inception and first product announcement in 2010. Since then, Datameer has been steadily improving its tools for analyzing big data stored in the open source technology Hadoop. Typically, as organizations adopt and deploy Hadoop, they need tools that help analysts make sense of the mass of underlying data. The issue of analyzing data efficiently in Hadoop has been a growing concern among organizations that establish clusters of Hadoop instances. They need to apply analytics and visualize data for decision support; analyzing data is important to 88 percent of organizations that are using Hadoop, according to our research, and they have plenty of it to work on. Organizations that use Hadoop are twice as likely (48%) as those that do not (23%) to produce more than 100 gigabytes a day.

Datameer has an interesting wizard-based approach to bringing structured and unstructured data together and the ability to schedule processing of it. Its automation capability addresses one of the key benefits of big data, namely reducing or eliminating manual processes, which is critical to more than half (59%) of organizations. Datameer provides a familiar spreadsheet-type approach and an analytics library of more than 200 functions. It also has a drag-and-drop visual approach to analytics, with reports that can be assembled into presentations or delivered in other ways. In addition, it has many of the capabilities you would see in a business intelligence tool, including query, analysis, reporting and the ability to assemble dashboards, and it helps users discover trends in the data through its visualization. The tool is simple to use, and usability is an evaluation criterion of top importance in 78 percent of organizations using Hadoop. Reliability (63%) is next, which is understandable as Hadoop users analyze data more frequently than others – hourly in 27 percent of organizations and at least daily in 77 percent. A big-data approach enables organizations to analyze large volumes of data at a fine level of detail, which is something 88 percent of organizations said they need.

In Datameer 1.4, the most recent version, the company has added critical functionality. The software can partition data into time-based sets to help with time-series analysis and quickly determine trends from period to period or from a specific date forward. It also supports multiway joins that can be linked within one interface, eliminating the need to create extra sheets within the application. Datameer 1.4 expands the precision for big integer and decimal types, and provides an SDK to offer further flexibility in using input adapters that can get to data from PostgreSQL and Greenplum databases. It has also expanded its REST API, which is used for integration with other applications and scripting. To ensure that data remains secure across network transport layers, Datameer has added Secure LDAP over SSL for authentication and embraces HDFS permissions down to the end-user level.
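To make the idea of time-based partitioning more concrete, here is a minimal sketch in plain Python. It is not Datameer's implementation or API; the sample records and the function names (`partition_by_month`, `period_over_period`, `from_date_forward`) are hypothetical and exist only to illustrate grouping data into monthly sets, comparing one period with the next, and filtering from a specific date forward.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records: (event date, measured value).
records = [
    (date(2012, 1, 5), 120),
    (date(2012, 1, 20), 80),
    (date(2012, 2, 3), 150),
    (date(2012, 2, 17), 100),
    (date(2012, 3, 9), 300),
]

def partition_by_month(rows):
    """Group rows into time-based partitions keyed by (year, month)."""
    partitions = defaultdict(list)
    for day, value in rows:
        partitions[(day.year, day.month)].append(value)
    return dict(partitions)

def period_over_period(partitions):
    """Return (period, total, change vs. previous period) in time order."""
    trend = []
    previous_total = None
    for period in sorted(partitions):
        total = sum(partitions[period])
        change = None if previous_total is None else total - previous_total
        trend.append((period, total, change))
        previous_total = total
    return trend

def from_date_forward(rows, start):
    """Keep only rows dated on or after `start`."""
    return [(day, value) for day, value in rows if day >= start]

trend = period_over_period(partition_by_month(records))
for period, total, change in trend:
    print(period, total, change)
```

In a Hadoop setting the same grouping would run over far larger data and in parallel, but the logic of splitting by period and comparing adjacent periods is the same.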

At last fall’s Hadoop World, Datameer was flaunting its support across the Hadoop ecosystem. The product is available on major cloud computing platforms including Amazon Web Services, EMC, Cloudera, Hortonworks, IBM BigInsights and MapR. Datameer recently announced availability on Microsoft Windows Azure, and if you want to hear why that partnership is relevant, you can listen to this video. Last summer I wrote about the faceoff between Hadoop and Oracle; Oracle is now embracing and integrating with Hadoop instead of ignoring it, which will lead to more dialogue on Datameer. Recently Datameer announced a partnership with DataSift, which provides social media analytics against large volumes of data from Twitter.

In these ways Datameer is helping advance the state of analytics on Hadoop. By simplifying the process it is addressing the largest challenges our research found in organizations using big data, which are staffing (80%) and training (74%). Datameer resembles the BI vendors of the 1990s that found significant growth by attaching to data marts and letting businesses analyze their own data. If Datameer can continue to invest and expand its partner ecosystem with Hadoop vendors and other software providers and consulting firms, it will be able to grow rapidly. If you are working with Hadoop and have not tried Datameer, I recommend that you evaluate it.

Regards,

Mark Smith – CEO & Chief Research Officer