Investigating the Potential of Data Preparation

Data preparation is critical to the effectiveness of both operational and analytic business processes. Operational processes today are fed by streams of constantly generated data. Our data and analytics in the cloud benchmark research shows that more than half (55%) of organizations spend the most time in their analytic processes preparing data for analysis, a situation that reduces their productivity. Data now comes from more sources than ever, at a faster pace and in a dizzying array of formats; it often contains inconsistencies in both structure and content.

In response to these changing information conditions, data preparation technology is evolving. Big data, data science, streaming data and self-service all are impacting the way organizations collect and prepare data. Data sources used in analytic processes now include cloud-based data and external data. Many data sources now include large amounts of unstructured data, in contrast to just a few years ago when most organizations focused primarily on structured data. Our big data analytics benchmark research shows that nearly half (49%) include unstructured content such as documents or Web pages in their analyses.

The ways in which data is stored in organizations are changing as well. Historically, data was extracted, transformed and loaded, and only then made available to end users through data warehouses or data marts. Now data warehouses are being supplemented with, or in some cases replaced by, data lakes, which I have written about. As a result, the data preparation process may involve not just loading raw information into a data lake, but also retrieving and refining information from it.

The advent of big data technologies such as Hadoop and NoSQL databases intensifies the need to apply data science techniques to make sense of these volumes of information. At this scale, querying and reporting alone are both inefficient and ineffective analytical techniques. Using data science also means addressing additional data preparation requirements such as normalizing, sampling, binning and dealing with missing or outlying values. For example, in our next-generation predictive analytics benchmark research 83% of organizations reported using sampling in preparing their analyses. Data scientists also frequently use sandboxes, which are copies of the data that can be manipulated without impacting operational processes or production data sources. Managing sandboxes adds yet another challenge to the data preparation process.
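The preparation steps named above (handling missing and outlying values, normalizing, binning and sampling) can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not a description of any particular product; the function names, the median fill, the clipping bounds and the sample data are all assumptions chosen for the example.

```python
import random
import statistics


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


def clip_outliers(values, lower, upper):
    """Pull values outside [lower, upper] back to the nearest bound."""
    return [min(max(v, lower), upper) for v in values]


def normalize(values):
    """Min-max scale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_values(scaled, n_bins):
    """Assign each value in [0, 1] to an equal-width bin index 0..n_bins-1."""
    return [min(int(v * n_bins), n_bins - 1) for v in scaled]


def sample_rows(rows, k, seed=0):
    """Draw k rows without replacement; the seed makes the draw repeatable."""
    return random.Random(seed).sample(rows, k)


# Hypothetical raw readings with two missing values and one outlier (400.0).
raw = [12.0, None, 15.0, 11.0, 400.0, 14.0, None, 13.0]

filled = fill_missing(raw)
clipped = clip_outliers(filled, lower=0.0, upper=100.0)  # bounds are assumed
scaled = normalize(clipped)
bins = bin_values(scaled, n_bins=4)
subset = sample_rows(list(zip(clipped, bins)), k=3)
```

Fixing the random seed is one simple way to make a sampled sandbox copy reproducible, which helps when the same preparation has to be repeated or audited later.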

Data governance is always a challenge; in this new world it has if anything grown even more difficult as the volume and variety of data grow. At the moment most big data technologies trail their relational database counterparts in providing data governance capabilities. The developers of data preparation processes must adapt them to these new environments, supplementing them with processes that support governance and compliance for personally identifiable information (PII), payment card industry (PCI) data, protected health information (PHI) and other standards for the handling of sensitive, restricted data.

In the emerging self-service approach to data preparation, three separate user personas are typically involved. Operational teams need to derive useful information from data as soon as it is generated to complete business transactions and keep operations flowing smoothly. Analysts need access to relevant information to guide better decision-making. And the IT organization is often called upon to support either or both of these roles when the complexities of data access and preparation exceed the skills of those in the lines of business. While IT departments probably welcome the opportunity to enable end users to perform more self-service tasks, they cannot do so in ways that ignore enterprise requirements. Nonetheless, the trend toward deploying tools that support self-service data preparation is growing. The demand for self-service and the need to meet enterprise requirements can conflict for organizations that want to derive maximum business value from their data as quickly as possible while still maintaining appropriate data governance, security and consistency.

To help understand how organizations are tackling these changes, Ventana Research is conducting benchmark research on data preparation. This research will identify existing and planned approaches and related technologies, best practices for implementing them and market trends in data preparation. It will assess the current challenges associated with innovations in data preparation, including self-service capabilities and architectures that support big data environments. The research will assess the extent to which tools and processes for data preparation support superior performance and determine how organizations balance the demand for self-service capabilities with enterprise requirements for data governance and repeatability. It will uncover ways in which data preparation and supporting technologies are being used to enhance operational and analytic processes.

This research also will provide new insights into the changes now occurring in business and IT functions as organizations seek to capitalize on data preparation to gain competitive advantage and help with regulatory compliance and risk management and governance processes. The research will investigate how organizations are implementing data preparation tools to support all types of operational and business processes including operational intelligence, business intelligence and data science.

Data is an essential component of every aspect of business, and organizations that use it well are likely to gain advantages over competitors that do not. Watch our community for updates. We expect the research to reveal impactful insights that will help business and IT. When it is complete, we’ll share education and best practices about how organizations can tackle these challenges and opportunities.


David Menninger

SVP & Research Director


Workiva Automates Composite Documents with Wdesk

Workiva offers Wdesk, a cloud-based productivity application for handling composite documents. I use the term “composite document” to refer to a document in which text is created and edited collaboratively by multiple contributors and which incorporates tabular and numerical data from multiple sources in a controlled process. Composite documents often have formats defined by law, regulation or contract and must be created at periodic intervals. To comply with the requirement by the United States Securities and Exchange Commission (SEC) that companies “tag” their financial filings using eXtensible Business Reporting Language (XBRL), many companies acquired software to automate the creation and tagging of these composite documents.

Workiva began as WebFilings and initially offered software to streamline the SEC document submission process. In 2013 it released Wdesk to address the larger market for composite document creation. The software has uses beyond SEC filings. They include a variety of documents or presentations for external or internal purposes that corporations routinely produce, including board presentations, management reports, audit management, disclosure documents and other regulatory or compliance filings. Using such software, companies (and especially finance departments) can cut preparation time, complete documents sooner and substantially reduce errors in them.

Software products for handling composite documents like Wdesk have capabilities similar to those of document management applications except that they are designed to be easily used by business people with limited or no involvement by technical specialists and at much lower cost of ownership. This is especially true for cloud-based software. As is the case in using document management software, the text portion of the composite document is produced and reviewed by many people in multiple departments for various purposes in a defined workflow that includes approvals. To facilitate reviews, Wdesk enables approvers to read, comment on and accept a document or any component of it on a mobile device. In the process of creating the document multiple versions are created and the software ensures that people work only with the current version. Permissions for creating, editing and approving the document can be granular (such as limited to a specific paragraph or table or even a single data point). Especially for internal documents (such as Sarbanes-Oxley Act attestations) Wdesk can connect substantiating documents directly to specific parts of a document.

The sections and basic form of a composite document may be highly structured, in which case the software automatically maintains this structure and all formatting. The format includes the order of the sections, the section headings, specific wording in boilerplate sections, paragraph styles and even the typeface, to name the most common requirements. If the document is a periodic filing, it must be consistent from one period to the next, keeping the format and structure of each individual section exactly the same. Wdesk also ensures that text and numbers that are reused across multiple documents and presentations are consistent.

In addition to consistency, another major advantage of using Wdesk to automate the document creation process is that it can significantly reduce the incidence of errors while reducing the time devoted to checking the document for them. For example, numbers referenced in the commentary must agree with those in the tables. These numbers often change over the course of the drafting period, sometimes frequently and on occasion late in the process when deadlines are short. A composite document application will always contain the most accurate and up-to-date numbers. This is important because in our benchmark research on the financial close three out of five participants said that the consistency and quality of data in company reports is a significant or very significant problem.

As the numbers (such as financial and operational results) referenced in a table change, the numbers in the narrative associated with them, as well as any cited percentage changes, update automatically. For example, in the statement “advertising expense was $X, up Y%,” the numbers X and Y will always be in agreement with each other and any table containing them. Automation can also help because some types of regulatory documents and filings have particular requirements that must be enforced. For example, when financial data is presented in a shortened form (in thousands or millions of currency units, for example), the rounding often must adhere to a specific convention.
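The idea of keeping narrative figures tied to a single table of record can be illustrated with a short Python sketch. It is not Wdesk's implementation; the table layout, the function names, the sample amounts and the choice of round-half-up as the rounding convention are all assumptions made for illustration. The sentence is regenerated from the table each time, so the amount and the percentage change cannot drift apart.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_thousands(amount):
    """Express a currency amount in thousands, rounding half up (assumed convention)."""
    scaled = Decimal(amount) / Decimal(1000)
    return int(scaled.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def percent_change(current, prior):
    """Percentage change from prior to current, rounded to one decimal place."""
    change = (Decimal(current) - Decimal(prior)) / Decimal(prior) * Decimal(100)
    return change.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


def render_sentence(table, item):
    """Build the narrative sentence for one line item directly from the table."""
    current = table[item]["current"]
    prior = table[item]["prior"]
    pct = percent_change(current, prior)
    direction = "up" if pct >= 0 else "down"
    return f"{item} was ${to_thousands(current):,} thousand, {direction} {abs(pct)}%"


# Hypothetical table of record: whole currency units for two periods.
table = {"Advertising expense": {"current": 1250500, "prior": 1100000}}

print(render_sentence(table, "Advertising expense"))

# A late change to the table flows through to the narrative on the next render.
table["Advertising expense"]["current"] = 1300000
print(render_sentence(table, "Advertising expense"))
```

Using `Decimal` rather than floating-point arithmetic keeps the rounding step explicit, which matters when a filing prescribes a particular convention.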

Using a software application designed to automate and support the process of creating filing documents can reduce the amount of time and effort necessary to produce the final result. It does so by establishing a repository of record for the text and data, automating the compilation of the document including the tabular data and individual text sections, using workflow to manage the process, and applying controls and audit features.

Using such software enables corporations to achieve substantially greater efficiency as well as tighter and more consistent control over this process. Process management capabilities can cut the administrative workload for people who “own” the filing document and reduce the possibility of delayed handoffs and missed deadlines. Document management features enable administrators to track the progress of the individual components, automate reminders to individuals as deadlines approach and generate alerts if they miss start or completion times. In contrast, when regulatory filings and similar composite documents are assembled using personal productivity software and orchestrated through email attachments and notifications, the process needlessly occupies the time and attention of highly trained, well-compensated people who have to spend hours performing dull, repetitive tasks that do not require their skills. Automation, on the other hand, leaves only the essential work to be done, allowing expert individuals to focus on that and have more time to concentrate on their real jobs.

Using software to automate and control the creation of composite documents for external or internal users can substantially cut the risks of errors and missed deadlines. This software can be used broadly to address multiple regulatory and legal requirements in the finance, legal, internal audit and other departments. I recommend that companies – especially their finance and legal departments – that create composite documents automate their production and investigate whether Wdesk will address their requirements.


Robert Kugel

Senior Vice President Research
