What is dark data and should you be worried?

Author: Tim Steele

As organizations continue to handle ever-growing amounts of data, it is important to understand the term dark data and its potential impact on you and your organization.

“Dark data” is data collected through normal business operations but never used for anything else. Put another way, dark data is unstructured data that is never analyzed, meaning we do not even know whether it is redundant, obsolete or trivial (ROT) because it has not been located and evaluated. Dark data with zero content awareness or visibility can lead to major problems. Heureka often cites the figure that 80% of organizational data is unstructured.
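To make the ROT idea concrete, here is a minimal Python sketch of a first-pass file inventory. It is an illustration only, not how Heureka or any particular product works. The function name `classify_rot` and the thresholds (three years without modification for "obsolete", under 16 bytes for "trivial") are assumptions chosen for the example, and exact-duplicate detection by content hash is just one simple proxy for "redundant".

```python
import hashlib
import time
from pathlib import Path


def classify_rot(root, max_age_days=365 * 3, min_bytes=16, now=None):
    """Walk `root` and flag files that look redundant, obsolete, or trivial.

    The thresholds are illustrative assumptions, not industry standards.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    seen_digests = {}  # content digest -> first path seen with that content
    report = {"redundant": [], "obsolete": [], "trivial": []}

    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        info = path.stat()

        # Trivial: too small to plausibly carry business value.
        if info.st_size < min_bytes:
            report["trivial"].append(path)
            continue

        # Redundant: byte-for-byte copy of a file already seen.
        # (Reads the whole file into memory; fine for a sketch only.)
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        if digest in seen_digests:
            report["redundant"].append(path)
        else:
            seen_digests[digest] = path

        # Obsolete: not modified since the cutoff date.
        if info.st_mtime < cutoff:
            report["obsolete"].append(path)

    return report
```

Calling `classify_rot("/some/share")` on a hypothetical file share returns three lists of paths to review. A scan like this looks only at file metadata and exact copies. It says nothing about what a document contains, which is why content-level visibility still matters.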

Dark data is the subject of a recent Forbes column, which covers the topic through the lens of the $76 billion field of data-driven market research. The piece points out that, by definition, we do not know what hides in our dark data, which makes it potentially damaging under a variety of circumstances. Perhaps, if it weren’t going unused, the data could offer important insights. Or perhaps it is useless but poses a security threat (remember, data privacy doesn’t happen without data visibility).

90%: Share of unstructured data that goes unanalyzed, according to IDC.

Did You Know: The following categories of unstructured data are usually considered dark data:

Customer Information
Log Files
Previous Employee Information
Raw Survey Data
Financial Statements
Email Correspondences
Account Information
Notes or Presentations
Old Versions of Relevant Documents



Confronting Dark Data

It is of course important to know how dark data comes about and how to address it.

The prevalence of dark data is often caused by misplaced priorities, a lack of communication between departments, or technical constraints. Appropriate data collection and management policies should be in place, and robust encryption standards should be applied. The graphic below illustrates the problem: unstructured data represents the largest (and growing) portion of the big data world. This article takes a deeper dive into the importance of dark data within Big Data sets; however, many large organizations must deal with the same growing problem around unstructured dark data.

Image from

Most companies do not have game plans for managing unstructured dark data. The Forbes column suggests asking some questions to find the right tools, such as who is using the data, whether it has any business value at all, what the scale of the need is, and what is being tracked.

The Heureka Intelligence Platform is perfectly positioned to help companies gain insights into unstructured data through its ability to search, auto-classify and manage data across the enterprise. Heureka is unique in its ability to index your sensitive data in place without moving large amounts of data or centralizing all of your indexes into one place. Best of all, dark data and its content become illuminated. Once the data is illuminated, a whole host of Heureka actions such as collecting, quarantining, deleting and classifying become available on a broad scale.

Our platform is perfect for enhancing your E-Discovery, GDPR, regulatory, privacy and compliance applications. Real-time data access and intelligence are increasingly important in light of evolving privacy laws, complex litigation and increased focus on data governance.

Amid digital transformation, your organization must begin to understand the nature of your data, including dark data. Heureka allows you to gain insight and control over your unstructured data at a level never thought possible.

Visit for more information.