Ameesh Divatia is co-Founder & CEO of Baffle, Inc., with a proven track record of turning innovative ideas into successful businesses.
Some estimate that 90% of the world’s data has been produced in the past two years alone. This proverbial tidal wave of information positions businesses to make better-informed decisions that optimize operations, attract and retain customers, and create significant market differentiation. The challenge is how to make sense of data that arrives in multiple formats from disparate sources.
For data to provide value, it must flow through what is referred to as the analytics pipeline: the infrastructure used to collect, store and process data in an IT environment. In the analytics pipeline, unstructured data — such as emails, Excel spreadsheets, Word documents, presentations, instant messages, photos, audio and video — enters “upstream.” As data moves “downstream” toward the end of the pipeline, it is cleansed, organized and analyzed via predictive analytics and machine learning.
At this point, data is at its peak value and, consequently, is most attractive to hackers. For this reason, data protection must be an integral part of the data analytics pipeline to prevent incidents that can offset the many benefits that analyzed data provides.
Cloud Storage And Data Sharing Risk
Before exploring how to secure the analytics pipeline, let’s look at two important business trends that will benefit from such protections: cloud storage and data sharing.
The cloud’s virtually limitless storage capacity is prompting enterprises to migrate data from on-premises environments, store it in data lakes and extract useful data into warehouses for analysis. Downstream data stored in the cloud is a target for criminals because of its high value and because it is often not protected properly. Many organizations temporarily relax security controls to enable easier access, but then forget to restore the protection that the data requires.
Further, Palo Alto Networks (via Help Net Security) found that 43% of cloud storage is left unencrypted, even though cloud providers make encryption readily available for those buckets. This is an alarming statistic for organizations incorporating the cloud as part of their analytics pipelines, especially for those tasked with complying with regulations like GDPR and CCPA: it leaves almost half of their crown jewels ready to be stolen.
The risk of exposure is further compounded when data moves outside of an organization. Many enterprises rely on data sharing as an integral part of their operations or in collaboration with other organizations to solve problems and gain insight. The Ponemon Institute found that, on average, companies share data with 583 third parties. The same study underscored the risk of this practice, with 61% of U.S. companies experiencing data leakage via a third party.
This creates a conundrum: Stop sharing data, or share it in an insecure manner. Without a secure analytics pipeline, cloud storage and data sharing, both critical to the business, represent unnecessary risk.
Securing Data Throughout The Pipeline
Security controls for the analytics pipeline can be categorized into two groups: visibility and entitlement. According to Gartner, visibility pertains to implementing “controls that remove ambiguity