There is an ominous ring to the word ‘dark’. The words dark web, dark matter, and dark night, to name just a few, evoke feelings of fear and anxiety.
Dark data is no different.
Getting its name from its invisible nature, dark data is being increasingly considered a problem in the making for organizations. Gartner (1) defines it as ‘the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).’
Dark data is therefore information that organizations collect and store, advertently or inadvertently, but hardly ever use, process, or analyze. More often than not this data is unstructured and unrefined. Most organizations are in the dark as to what, where, and how much of this data they have on the cloud and on-prem.
Types of Dark Data
In essence, any data that is generated and stored can become dark data if it is not used. The internet, however, is the biggest contributor to the volume of dark data, simply because of the staggering amount of data created at any given moment. Even a single social media post can generate dark data around user logins, edits, images, geo-locations, tags, users, likes, comments, and more.
In an organization, dark data can comprise all and more of the items in the infographic below.
Logikcull (3) quotes IDC estimates that put dark data at 90 to 95% of all data, with a strong likelihood of it growing to 97% with a CAGR of 23%. Since this data often lies idle and unutilized, it carries an enormous maintenance cost, let alone the security risk it involves.
Gartner (4) had already sounded the alarm bells, estimating that 80% of organizations in 2021 would fail to develop a consolidated data security policy across silos, leading to potential noncompliance, security breaches, and financial liabilities.
Because it contains so much information about the organization, dark data can pose significant risks.
- Forbes (2) warns that generating dark data across multiple vendors and containers without changing the data structures can leave data residing at numerous points in the cloud. The result is an ever-growing, invisible digital footprint of the organization that can lead to privacy violations and security issues.
- Because this data is invisible, organizations can never be absolutely certain that the information they offer for compliance entirely meets regulatory requirements.
- Identifying and accessing this data, both on the cloud and on-prem, can prove onerous and extremely time-consuming.
- Dark data generation is invariably an ongoing process. Since it is created concurrently with regular data that is put to use, it contains data elements that are sensitive and potentially damaging to an organization’s future, should they fall into the wrong hands.
- Maintaining this data, and paying for remedial measures should the data be misused, presents extreme cost challenges.
- Gartner (4) says that a change in mindset is needed if the accumulation of dark data is to be stemmed: moving away from the ‘data hoarding’ and ‘save everything just in case’ attitude that most of us are guilty of.
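As a rough illustration of the identification problem in the list above, a first pass over on-prem file storage might flag files that have not been modified for a long time. The sketch below is only one possible heuristic, not a method described by any of the cited sources. The function names `find_stale_files` and `summarize` and the 365-day threshold are assumptions made for this example, and an old modification time does not by itself prove that data is unused.

```python
import os
import time
from pathlib import Path


def find_stale_files(root, max_age_days=365, now=None):
    """Return (path, size_bytes, age_days) for files under root whose
    last-modified time is older than max_age_days.

    Hypothetical helper: the age threshold is an illustrative assumption.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds in a day
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                info = path.stat()
            except OSError:
                continue  # skip files that vanished or cannot be read
            if info.st_mtime < cutoff:
                age_days = int((now - info.st_mtime) // 86400)
                stale.append((str(path), info.st_size, age_days))
    return stale


def summarize(stale):
    """Count the flagged files and add up the storage they occupy."""
    return {"files": len(stale), "bytes": sum(size for _p, size, _a in stale)}
```

Calling `summarize(find_stale_files("/some/share"))` on a directory of your choosing gives a quick count of candidate files and the storage they occupy. Anything flagged this way would still need a human or a classification tool to decide whether it is sensitive, valuable, or safe to delete.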
The Bright Side
Despite the apparent danger and potential for misuse, there is a bright side to dark data as well. Organizations would do well to remember that this dark data, despite its state, is still ‘proprietary data’ and still has value if handled effectively. The advent of Machine Learning (ML) and Artificial Intelligence (AI) (5) offers viable solutions for the humongous task of handling dark data. AI in particular can provide useful insights into the seemingly bottomless lake of data an organization may have. Although the investment involved may be considerable, it is well worth it considering the benefits.
The benefits that could accrue include:
- A viable way to understand how incoming data and data processing should be handled in future.
- Creation of management strategies to provide long and short-term trend analyses around information.
- Development of new and productive business strategies, including data retention policies.
- Improvement of internal processes by understanding the main centers of dark data creation.
- Quality Assurance improvement by introducing processes that detect and correct errors, while looking at potential privacy loopholes, vulnerabilities, and compliance violations.
- Creating revenue and reducing costs by understanding and analyzing the relationships between unrelated pieces of information.
- Obtaining insights into user behavior and other data that can help in improving and expanding business.
When dealing with dark data, organizations would do well to remember that, despite its volume, unrefined state, obvious risks, and maintenance costs, it need not be so dark after all. Though comparisons can be odious, a good analogy could be that every dark cloud has a silver lining, and after the dark unfailingly comes the light.
(1) Definition of Dark Data – IT Glossary | Gartner (gartner.com)
(2) Dark Data: The Cloud’s Unknown Security And Privacy Risk (forbes.com)
(3) What is Dark Data and Why It Is Important for Discovery (logikcull.com)
(4) How to Tackle Dark Data (gartner.com)
(5) Dark Data: What is it? How can you best utilize it? – Cybersecurity Insiders (cybersecurity-insiders.com)