Recently Karmasphere introduced version 1.5 of its Analyst product, which helps organizations analyze “big data” stored in Hadoop, the open source large-scale data processing technology. An independent software vendor focused exclusively on the Hadoop market, Karmasphere made available a community edition of its developer product in September 2009 and launched the company in March 2010. Since then it has been active and visible at Hadoop-related events including Hadoop World, the IBM Big Data Symposium and others.
Fundamentally, Karmasphere focuses on making Hadoop easier to use and more accessible for both developers and analysts, and the need is real. Our recent benchmark research on Hadoop and Information Management shows a significant shortage of skills: Hadoop users cited staffing and training as the two most significant obstacles in analyzing large-scale data sets, affecting 80% and 74% of organizations, respectively.
Karmasphere Analyst 1.5 provides an interactive, graphical environment for analyzing data in Hadoop. To begin the process, it helps users understand the data structures available in Hadoop by presenting a table-based view of existing data and the ability to create new tables. In addition, Karmasphere Analyst combines information from multiple Hadoop data stores to present a unified view. Users assemble queries in a SQL-based development environment that includes syntax checking and prompts to help in the process. More than 100 user-defined functions (UDFs) are included for many common tasks and analyses. Once assembled, these queries can be stored, reused and combined into a “query chain,” a workflow of the multiple steps that data preparation and analysis often require. Karmasphere Analyst provides visual query plans and explanations that make it easier to understand and modify the queries. Users also can visualize the results of queries in graphical or tabular displays.
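To make the query chain idea concrete, here is a minimal sketch of the general pattern: each step is a named SQL query whose result is stored as a table that the next step can read. This is not Karmasphere's implementation or API. It uses Python's built-in sqlite3 module as a stand-in for Hive, and the function name run_query_chain, the page_views table and the sample rows are all hypothetical, chosen only for illustration.

```python
import sqlite3


def run_query_chain(conn, steps):
    """Run (table_name, select_sql) steps in order.

    Each step's result is materialized as a table so later steps can
    query it. Returns the rows of the final step's table.
    Table names are trusted here; this is an illustration, not a tool.
    """
    last_table = None
    for table_name, select_sql in steps:
        conn.execute(f"DROP TABLE IF EXISTS {table_name}")
        conn.execute(f"CREATE TABLE {table_name} AS {select_sql}")
        last_table = table_name
    if last_table is None:
        return []
    return conn.execute(f"SELECT * FROM {last_table}").fetchall()


def build_demo_connection():
    """Create an in-memory database with a small, made-up page_views table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE page_views (user_id TEXT, url TEXT, ms INTEGER)")
    conn.executemany(
        "INSERT INTO page_views VALUES (?, ?, ?)",
        [
            ("u1", "/home", 120),
            ("u1", "/cart", 300),
            ("u2", "/home", 80),
            ("u2", "/home", None),  # incomplete row, removed by the first step
            ("u3", "/cart", 500),
        ],
    )
    return conn


# Step 1 prepares the data; step 2 analyzes the prepared data.
DEMO_CHAIN = [
    ("clean_views",
     "SELECT user_id, url, ms FROM page_views WHERE ms IS NOT NULL"),
    ("views_per_url",
     "SELECT url, COUNT(*) AS views, AVG(ms) AS avg_ms "
     "FROM clean_views GROUP BY url"),
]

if __name__ == "__main__":
    connection = build_demo_connection()
    for row in run_query_chain(connection, DEMO_CHAIN):
        print(row)
```

Because every step is saved under its own name, an individual step can be reused in another chain or inspected by itself when a later step produces unexpected results.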
Later in the process, Karmasphere helps users prepare jobs and move them into production. It includes embedded Hive and Hadoop capabilities for desktop prototyping, so users can test and debug on their desktops and then package and export the jobs for deployment to a cluster. Karmasphere also provides capabilities for monitoring jobs and optimizing job performance. It works with a variety of Hadoop sources including Amazon Elastic MapReduce, Apache, Cloudera, EMC Greenplum, IBM and MapR <http://www.mapr.com>. Sources for Hadoop are proliferating; the recently formed Hortonworks, with its focus on Apache Hadoop, is one example. The ability to work with multiple versions could therefore be valuable to organizations in the evaluation process and to those that have chosen to work with multiple versions, which is the case for nearly half the participants in our benchmark research cited above.
Karmasphere has carved out a niche in the big-data market where there are unmet needs. However, it will face competition from bigger vendors as they incorporate features into their business intelligence and information management platforms that make it easier to work with Hadoop. One way Karmasphere could maintain a unique position would be to broaden its capabilities for advanced analytics. Our research shows that 69% of organizations working with Hadoop use it for advanced analytics including data mining and predictive analytics. Another way Karmasphere could improve its position with respect to larger vendors would be to extend its integration beyond Excel and Tableau, the tools it integrates with today.
In the meantime, if you work with Hadoop and are looking for ways to be more productive or empower a broader range of analysts, you can try some of Karmasphere’s features for yourself here.