Spark Summit Shows Momentum in Adoption of Apache Spark

Last week I attended Spark Summit East 2016 at the New York Hilton Midtown. It revealed several ways in which Spark technology might impact the big data market.

Apache Spark is an open source data processing engine designed for large-scale computing. Spark is often used in conjunction with the open source Apache Hadoop, but it can also be used with other data sources, such as Cassandra, MongoDB and Amazon S3. The creators of Spark founded Databricks, which drives the roadmap for Spark and leads community evangelism, including organizing the Spark Summit events. According to Arsalan Tavakoli-Shiraji, VP of customer engagement and business development for Databricks, the company contributes approximately 75 percent of the code to the Spark project.

If you are wondering why Spark matters, consider that in June 2015, IBM announced that it would put more than 3,500 researchers and developers on Spark-related projects and would donate its IBM SystemML machine-learning technology to the Spark open source ecosystem. This scale of investment is likely to attract others in the industry, and interest in Spark is growing elsewhere, too. In his keynote presentation, Matei Zaharia, co-founder and CTO of Databricks, said that attendance at the summits (across multiple locations) grew from 1,100 in 2014 to 3,900 in 2015. I was told by event staff that the New York event alone, including preconference training sessions, exhibitors and sponsors, had approximately 1,500 attendees, doubling attendance from the previous year.

This is where it starts to feel like, in the words of Yogi Berra, déjà vu all over again. In 2010 and 2011 I attended and wrote about events held in the very same venue. At the time, interest was growing in something called “Hadoop,” which, according to our recent benchmark research on big data analytics, has now grown to the point where 37 percent of organizations use it to process data for their big data analytics.

Recently there has been a lot of buzz around Spark, including speculation that Spark could replace Hadoop. Just search for “Spark replace Hadoop” and see how many pages come up. However, this is off target: Spark won’t replace Hadoop, in particular because it is not designed to store data. Rather, it is designed to be used in conjunction with data storage tools such as the Hadoop Distributed File System (HDFS). Spark does, however, address a shortcoming in one part of Hadoop: MapReduce. MapReduce was designed to process very large amounts of data in parallel to speed up processing, but it was designed for batch processing. As big data has grown in popularity, more and more users are accessing it and demanding interactive access to data regardless of its volume. More than half (54%) of participants in our research rated right-time and real-time analytics as important, and nearly half (48%) rated visual and data discovery important. Spark’s fundamental performance advantage over MapReduce lies in being designed to process data in memory to the extent possible, rather than writing intermediate results to disk between processing steps as MapReduce does.
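The in-memory advantage is easiest to see with an iterative job, such as training a machine-learning model, that scans the same data many times. The toy Python sketch below is not Spark code: a local text file stands in for HDFS, and all function names are hypothetical. It simply counts how many times the input must be scanned when the data is re-read on every pass (roughly the situation of chaining disk-based MapReduce jobs) versus when it is loaded once and kept in memory, which is what Spark's `cache()` and `persist()` methods are meant to enable.

```python
import os
import tempfile


def write_dataset(path, values):
    """Write one number per line, standing in for a file on distributed storage."""
    with open(path, "w") as f:
        for v in values:
            f.write(f"{v}\n")


def read_dataset(path):
    """Parse the file back into a list of floats (one full scan of the input)."""
    with open(path) as f:
        return [float(line) for line in f]


def fit_mean(path, iterations, cache):
    """Estimate the mean by gradient descent on squared error.

    With cache=False every pass rescans the file, as a chain of disk-based
    batch jobs would; with cache=True the data is read once and reused from
    memory. Returns (estimate, number_of_file_scans).
    """
    reads = 0
    cached = None
    estimate = 0.0
    for _ in range(iterations):
        if cached is not None:
            data = cached            # reuse the in-memory copy
        else:
            data = read_dataset(path)
            reads += 1
            if cache:
                cached = data        # keep it for later passes
        gradient = sum(estimate - x for x in data) / len(data)
        estimate -= 0.5 * gradient   # fixed step size of 0.5
    return estimate, reads


with tempfile.TemporaryDirectory() as tmp:
    data_path = os.path.join(tmp, "data.txt")
    write_dataset(data_path, range(1, 11))  # values 1..10, true mean 5.5
    disk_estimate, disk_reads = fit_mean(data_path, 20, cache=False)
    memory_estimate, memory_reads = fit_mean(data_path, 20, cache=True)

print(disk_reads, memory_reads)  # 20 1
```

Both runs produce the same estimate; only the number of scans differs. On a real cluster each avoided scan is a pass over distributed storage, which is why the gap matters most for iterative and interactive workloads.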

Spark can be used to provide real-time analytic capabilities on big data. Spark has several components that provide those capabilities. Spark Streaming provides real-time capabilities for processing streaming data, that is, data being generated constantly. Spark SQL provides interactive query capabilities on structured data. And Spark ML provides machine-learning capabilities that can be used to build analytical models including predictive analytics. Our research shows that predictive analytics and statistics are the most important area of big data analytics, cited by more than three-fourths (78%) of participants.
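Spark Streaming, in its original DStream API, handles a continuous stream by chopping it into small batches at a fixed time interval and running ordinary batch operations on each one. The plain-Python sketch below illustrates only that micro-batch idea. It does not use Spark's API, and the event data and function names are hypothetical.

```python
from collections import Counter


def micro_batches(events, interval):
    """Split (timestamp, item) events into consecutive fixed-length batches.

    Intervals with no events still yield an empty batch, so the output has one
    batch per interval from the first event to the last.
    """
    if not events:
        return []
    buckets = {}
    for ts, item in events:
        buckets.setdefault(int(ts // interval), []).append(item)
    first, last = min(buckets), max(buckets)
    return [buckets.get(k, []) for k in range(first, last + 1)]


def running_counts(events, interval):
    """Apply the same batch operation (counting) to each micro-batch in turn,
    carrying the running totals forward as state."""
    totals = Counter()
    snapshots = []
    for batch in micro_batches(events, interval):
        totals.update(batch)
        snapshots.append(dict(totals))
    return snapshots


clicks = [(0.1, "login"), (0.4, "search"), (1.2, "search"), (1.9, "buy"), (3.5, "search")]
for snapshot in running_counts(clicks, interval=1.0):
    print(snapshot)
```

Shortening the interval lowers latency but means more, smaller batches to schedule, which is the basic trade-off in micro-batch designs.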

At the conference, Databricks announced Spark 2.0, planned for release in April or May. It will have three primary sets of enhancements. What is presently called Tungsten phase 2 improves the performance of Spark with native memory management and better run-time code generation. Structured Streaming combines real-time processing with batch and interactive queries. And merging the DataFrame and Dataset APIs will enable a single programming paradigm to be used for streaming and structured data.
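The central idea of Structured Streaming is that a streaming computation is written like a batch query over a table that keeps growing, and the engine updates the result incrementally as rows arrive. The toy sketch below uses plain Python rather than Spark's API and checks that property for a simple per-key sum. After each arriving chunk, the incrementally maintained result should match a full batch recomputation over all rows seen so far. The names and data are hypothetical.

```python
from collections import defaultdict


def batch_query(rows):
    """Batch version: total amount per key over a complete, static table."""
    totals = defaultdict(float)
    for key, amount in rows:
        totals[key] += amount
    return dict(totals)


class IncrementalQuery:
    """Streaming version of the same aggregation: state is kept between calls
    and updated only with the rows appended since the last call."""

    def __init__(self):
        self.state = defaultdict(float)

    def add_rows(self, new_rows):
        for key, amount in new_rows:
            self.state[key] += amount
        return dict(self.state)


orders = [("a", 1.0), ("b", 2.0), ("a", 3.0), ("c", 4.0), ("b", 5.0)]
query = IncrementalQuery()
seen = []
consistent = []
for start in range(0, len(orders), 2):  # rows arrive two at a time
    chunk = orders[start:start + 2]
    seen.extend(chunk)
    # The streaming result should equal the batch result on all rows so far.
    consistent.append(query.add_rows(chunk) == batch_query(seen))
print(all(consistent), query.state["b"])  # True 7.0
```

Writing the logic once and letting the engine decide whether to run it over static or arriving data is what lets one programming model serve batch, interactive and streaming use.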

There is a rich ecosystem developing around Spark. A variety of vendors attended the show including several of the Hadoop distribution vendors (Cloudera, Hortonworks, IBM and MapR) demonstrating and talking about how they support Spark. Analytic vendors including Alpine Data, Platfora, SAP, Stratio and Zoomdata demonstrated products that use Spark to provide interactive analytics over large amounts of data. Full-day training sessions that were part of the agenda were oversubscribed, with packed rooms and registration spots sold out on the event’s website.

The event was clearly focused on a technical audience, and a look through conference keynotes reveals many code snippets. However, this technical focus does not suggest a lack of business purpose in Spark applications. Use cases were also represented in the conference agenda, and future events may attract more business users. The popularity of the event and the acceptance of Spark by the Hadoop community and vendors suggest that Spark is here to stay, at least for the foreseeable future. If you are working with or plan to work with big data, it would be worthwhile to understand how Spark can be incorporated into your architecture, either directly as part of your own programming efforts or via third-party tools used in conjunction with big data.


David Menninger

SVP & Research Director

TelStrat Achieves Success with Workforce Optimization in the Cloud

TelStrat is a company with a long history. Founded in 1993, it initially resold products of Nortel, Cisco and other telecom equipment vendors. The first product it developed and brought to market was a call recording system deployed on the customer’s premises. It expanded its portfolio over the years, and today its product suite Engage offers all the key pieces of workforce optimization: call recording, desktop capture, quality management, workforce management and speech, text and desktop analytics. TelStrat built this portfolio through a combination of in-house development and partnering with other vendors. It has achieved considerable business success, with more than 3,300 installations in 55 countries, most of which are delivered through a global ecosystem of some 330 channel partners. Engage is available in three models: Unity is an on-premises, single-server version that supports up to 250 users; Enterprise is an on-premises, multiple-server version that supports unlimited numbers of users at multiple sites; and Cloud is a hosted product that supports unlimited numbers of users and is available through a perpetual license or subscription. The company attributes its recent success to the Cloud version, which it supports through multiple data centers in North America, Europe and Asia Pacific. The Cloud version, together with its longstanding team of call center experts and partners, positions TelStrat to help organizations of all sizes improve contact center agent performance.

Whatever the configuration, Engage includes five products. Engage Record can capture all calls, including those outside the contact center. It can record calls from a variety of PBX systems, is highly scalable, has watermarking for security purposes and includes 256-bit AES encryption. It can be configured to record all calls or to capture calls based on defined rules or on demand. Record includes an advanced capability called Conversation Save that supports multiple options to store very large volumes of call recordings and includes disaster recovery. Advanced search capabilities enable users to find and listen to specific call recordings, and it includes flexible reporting that can be customized to individual requirements. There is also an option for authorized users to listen to live calls and make notes about what they hear. Overall, Record provides comprehensive recording capabilities that should match most organizations’ requirements.
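Record's three capture modes (all calls, rule-based or on demand) amount to a simple decision made for each call. The sketch below shows one way such a policy could be expressed. It is an illustration only: the rule types (queue and agent), the agent and queue names, and all class and method names are assumptions, not TelStrat's configuration model.

```python
from dataclasses import dataclass, field


@dataclass
class Call:
    queue: str
    agent: str


@dataclass
class RecordingPolicy:
    """Hypothetical policy: record everything, or only calls matching rules."""
    record_all: bool = False
    queues: set = field(default_factory=set)  # rule: record calls to these queues
    agents: set = field(default_factory=set)  # rule: record calls handled by these agents

    def should_record(self, call, on_demand=False):
        # An on-demand request (e.g. a supervisor choosing to record) always wins.
        if self.record_all or on_demand:
            return True
        return call.queue in self.queues or call.agent in self.agents


policy = RecordingPolicy(queues={"billing"}, agents={"new-hire-07"})
print(policy.should_record(Call("billing", "agent-12")))      # True
print(policy.should_record(Call("sales", "agent-12")))        # False
print(policy.should_record(Call("sales", "agent-12"), True))  # True
```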

Engage Capture complements Engage Record by creating video records of how agents use their desktop systems to handle interactions. The videos are linked to the relevant section of the call recording so quality managers can see what agents were doing as they listen to the call recording. This added perspective of agent performance helps quality managers assess, score and recommend follow-up actions such as coaching or e-learning.

A third module, Engage Quality, provides a full suite of agent quality management capabilities. Users can create evaluation forms for specific call types, select and complete agent evaluations and trigger actions based on the outcomes. It can also create coaching and e-learning materials, which can include embedded call recordings so agents can hear what they said and understand where they need to improve; schedule coaching and e-learning sessions; and monitor the outcomes of those sessions. Its reporting capabilities can point out sessions agents should take, note the sessions they actually took and assess the impact of those sessions on their subsequent performance. In total these capabilities support the end-to-end quality management process, from assessing performance to driving change.

In the fourth area, workforce management, TelStrat chose not to develop its own products; depending on the model selected, Engage Manage is built on either the Pipkins or the Teleopti product. TelStrat is a development partner of both vendors: it embeds their products in the Engage suite, develops tight integration capabilities and customizes them to suit its customers. It also provides end-to-end customer support, so from a customer’s perspective workforce management integrates seamlessly with the other TelStrat products.

TelStrat has taken a similar approach for its fifth product, Engage Analyze, which combines in-house products and third-party systems, notably from CallMiner for speech analytics. The speech analytics component can analyze call recordings using phonetic indexing to index calls, find words and phrases, and display maps of common words used within calls. An advanced version uses a dictionary to understand languages, the context within which words are used and the emotional state of the speaker. Used in conjunction with capabilities that convert speech to text, it enables users to analyze every word in every call, text message, chat session or social media interaction. The desktop analytics component can be used to insert controls into call recordings; it can, for example, pause and resume recording based on desktop events, censor sensitive data, pinpoint recordings from desktop data, and tag call-related activities with the recorded session. TelStrat also supports interfaces with common CRM systems so call recordings can be tagged or supplemented with other customer data. An advanced version of the desktop analytics enables users to analyze desktop activity and visualize processes through timelines and heat maps.
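Maps of common words start from a simple count of which terms appear most often across calls. The sketch below does that over text transcripts, such as the output of a speech-to-text step. It does not perform phonetic indexing, which works on the sounds in the audio rather than on transcribed words, and the stop-word list, sample data and function name are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real product would use a much larger,
# language-specific list.
STOP_WORDS = {"the", "a", "to", "i", "my", "me", "is", "and", "it", "please"}


def common_words(transcripts, top_n=5):
    """Count words across call transcripts, ignoring case and stop words."""
    counts = Counter()
    for text in transcripts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)


calls = [
    "I want to cancel my account",
    "Please cancel the order and refund me",
    "The refund is late, cancel it",
]
print(common_words(calls, top_n=2))  # [('cancel', 3), ('refund', 2)]
```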

In our benchmark research into next-generation workforce optimization, more than three-quarters (78%) of organizations said it is very important to improve agent performance so that customer interactions are handled as efficiently and effectively as possible. To this end the majority have deployed one or more of the core workforce optimization products: the three most common are call recording (78%), quality monitoring (70%) and workforce management (59%); these options are especially common in large companies. I believe that companies of all sizes can also benefit by deploying such systems, and choosing cloud-based products such as those available from TelStrat makes this technically easier and more affordable than doing so in-house. Companies that have deployed such systems have on average realized five benefits, chief among them improved agent coaching (66%) and improved customer satisfaction (52%). However, our research also shows that companies are likely to achieve even greater benefits by using an integrated suite of workforce optimization products; nearly half (48%) of participants said it is very important that these products are fully integrated. This not only makes them easier to manage but also allows organizations to connect processes that have been disconnected; for example, they can use speech analytics to uncover coaching needs and raise alerts and workflow items to schedule the required coaching. The cloud-based services from TelStrat help organizations achieve this and similar objectives, so I recommend that organizations seeking to improve their end-to-end agent performance management processes evaluate how TelStrat’s suite can help.


Richard J. Snow

VP & Research Director