IBM made more than two dozen announcements in conjunction with its recent Information on Demand (IOD) event. In this post I’ll address the impact of IOD from an information management perspective; a separate post will follow shortly covering the analytics perspective. To organize the mass of information IBM brought forth at IOD 2011, I group the announcements into three general categories: enhancements and extensions to InfoSphere, big data (which is technically part of InfoSphere), and databases.
IBM announced version 8.7 of InfoSphere, its core data integration product, and version 10 of InfoSphere Master Data Management. InfoSphere 8.7 provides validation of data in transit, better integration with the Netezza data warehouse appliance and access to data in Hadoop as both a source and a target. MDM 10 consolidates three separate master data products from IBM: InfoSphere MDM Server, InfoSphere MDM Server for Product Information Management and Initiate Master Data Service. The combined capabilities will be bundled into four different editions, so an organization can choose to purchase only the portions it needs. The InfoSphere product line also includes Optim for information life-cycle management and governance. In this release Optim automatically refreshes test data and manages test data for SAP systems, including detecting SAP system changes and masking data.
Big data was a prominent theme of the event. After the opening keynote IBM gave away a book, Understanding Big Data, that attendees eagerly scooped up. The book is equal parts introduction to the topic and roadmap of how IBM’s products work with big data. Earlier this year IBM began to establish a presence in this market segment with its Big Data Symposium. In the interim the company has developed its product strategy more fully and now focuses its messaging and sales effort on two InfoSphere products: BigInsights 1.3, its Hadoop offering, and Streams 2.0 for real-time analysis of streaming data. Adaptive MapReduce, included in BigInsights 1.3, speeds up processing of small jobs by identifying which nodes in the cluster may have spare resources and transparently allocating additional work to those nodes. IBM promotes this feature as new in this release, although it appears to have been included in version 1.2 but turned off, which is not an unusual way for software companies to get feedback on new features. IBM also integrates compression into BigInsights to help minimize storage costs for Hadoop data.
IBM is unusual in addressing the analysis of real-time streams of big data as well as Hadoop-based batch analysis of big data. As I discussed in previous posts, Hadoop does not provide real-time analysis, and in our recent Hadoop and Information Management benchmark research two-thirds of participants identified the lack of real-time capabilities as an obstacle. It was the most significant technical obstacle and ranked third overall, behind inadequate staffing and training. IBM deals with this in Streams, which is an important part of the IBM big-data story; even though the current release number is 2.0, it emerged from work in the intelligence community going back 10 years. Streams can be used to analyze both structured and unstructured data. It processes data in a distributed fashion across a number of nodes, and in this release the previous limit of 125 nodes has been removed to enable more scalability. Other new features include a text analytics tool kit and a Hadoop connector for BigInsights. I would prefer to see Hadoop extended to deal with real-time data rather than having that data managed separately in Streams, but I recognize that is a big effort involving many technical challenges.
In its database products IBM announced IMS 12 for large online transaction processing applications. The new version has been clocked at processing 66,000 transactions per second, which is certainly fast, but it’s hard to evaluate the claim since it was not based on an industry-standard benchmark. For example, IBM and others have published faster results based on the TPC-C benchmark. IBM also announced the DB2 Analytics Accelerator and the integration of DB2 for z/OS, the high end of IBM’s relational database and data warehousing product line, with Netezza. The combination creates a more unified product family, bringing Netezza further into the fold, but the IBM product line is still more disjointed than the Teradata product line, even with the Aster Data products Teradata recently acquired. In conjunction with IOD, IBM extended the Netezza product line with a high-capacity server to manage and query the ever-increasing volumes of archive data within organizations.
IBM’s messaging at IOD and elsewhere has been evolving toward a logical data warehouse concept consisting of multiple specialized and optimized data processing technologies. To support this concept, the company demonstrated technology from its labs that would coordinate the movement of data among the various systems, including non-IBM databases, eliminating much of the ETL work that would normally accomplish this task. The approach sounds somewhat similar to Teradata’s recent announcement of Unity.
The IBM information management product line is both broad and deep. The company continues to invest in integrating the various elements it has created or acquired over the years. IBM InfoSphere has kept advancing, as I indicated in my analysis from last year, and it furthers the information management revolution that I outlined earlier this year.
There’s still work to be done to create a more unified product family, including addressing the challenges my research found with data in the cloud, but IBM offers a comprehensive set of capabilities that can meet the information management challenges of most organizations.