Data automation is a term that has been used, albeit loosely, for many years. But what is data automation? Data automation is the removal of the human element from repetitive, data-related tasks. If we evaluate the scale and footprint of the typical business data set today, the average company manages 163 terabytes. More specifically, the average enterprise has 348 terabytes, roughly seven times the average SMB's 48 terabytes.
Regardless of size, all organizations expect the amount of data to increase considerably in a relatively short timeframe, with average data growth estimated at around 20 percent per year. Couple this with a dispersed array of data locations, types, and devices, and we can see that the problem is spiraling out of control, far beyond what humans can handle manually. Considering individual departments within the enterprise with different records schedules and compliance needs, how can this task be accomplished effectively? Automate it!
To understand the problem, we need to look at how the problem was created. Enterprise data sets have grown exponentially over the past 10 years, and according to IDC, worldwide data will grow at a 61 percent compound annual rate to reach 175 zettabytes by 2025, with as much of the data residing in the cloud as in data centers.
A single petabyte of data is roughly 1,073,741,824 files of 1MB each. Suppose we assigned a team of file reviewers to go through these files, assessing the age, type, owner, and risk of each one. How long do you estimate it would take?
Let’s do some math: if a single person can review 60 1MB files per hour, for 40 hours a week, for 52 weeks of the year, they will only get through 124,800 files per year. This assumes no breaks, vacations, or sick leave. Based on this estimate, with zero data growth, it would take a mere 8,603.7 years to effectively review a single petabyte of data. We could add 100 people to the process using the same productivity metrics above (along with their salaries, insurance, and so on); however, all 100 people combined will only effectively review 12,480,000 files per year. Based on this astounding 100-member team’s ability, the task will still take 86.04 years to complete!
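The arithmetic above can be verified with a few lines of back-of-the-envelope Python (the productivity figures are the same assumptions stated in the text):

```python
# Estimate of manual review time for one petabyte of 1 MB files.
FILES_PER_PB = 2**50 // 2**20       # 1,073,741,824 one-megabyte files in a petabyte

files_per_year = 60 * 40 * 52       # 60 files/hour, 40 hours/week, 52 weeks/year
years_single = FILES_PER_PB / files_per_year
years_team = years_single / 100     # a 100-person team

print(f"{files_per_year:,} files per reviewer per year")    # 124,800
print(f"{years_single:,.1f} years for a single reviewer")   # 8,603.7
print(f"{years_team:,.2f} years for a 100-person team")     # 86.04
```

Even with an unrealistically tireless workforce, the numbers make the case for automation on their own.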
From the first day they’re created, and the first day regulations kick in, files need to be managed according to many compliance and governance standards. The data lifecycle of a file can vary, such as early and constant access versus one-off files used a few times at most. However, all files are the same in the beginning. The content defines how the file is regulated. Let’s look at the stages of the typical data lifecycle.
Intelligent data lifecycle automation collects the metadata and the content of the file, maintains a searchable index, examines the file for any threats or malware, applies complete global classification policies to this data, then applies a smart policy to determine where and how long the file should be stored. Fully automated applications can process up to one million files per hour, an efficient way to analyze and protect data that poses an element of risk to your organization. Enterprise leadership can also put the collected data to use, for example in business analytics.
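To make the classification step concrete, here is a minimal sketch of how a global policy might be applied to a file record. The record fields, policy keywords, and retention periods are all illustrative assumptions, not the API of any real product:

```python
# Hypothetical sketch of one step in an automated lifecycle pipeline:
# collect file metadata, classify content against global policies,
# and assign a retention rule. All names and values are illustrative.
from dataclasses import dataclass

@dataclass
class FileRecord:
    path: str
    owner: str
    age_days: int
    content: str

# Illustrative global classification policies: keyword -> (label, retention_days)
POLICIES = {
    "ssn": ("PII", 2555),           # keep roughly 7 years
    "invoice": ("Finance", 3650),   # keep 10 years
}

def classify(record: FileRecord):
    """Return (label, retention_days) for the first matching policy."""
    text = record.content.lower()
    for keyword, policy in POLICIES.items():
        if keyword in text:
            return policy
    return ("General", 365)  # default: keep one year

record = FileRecord("/share/hr/form.txt", "hr", 12, "Employee SSN on file")
print(classify(record))  # ('PII', 2555)
```

A real platform would match on far richer signals (metadata, file type, full-text index, regular expressions), but the shape of the decision, content in, policy out, is the same.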
Indexing and classifying all data, regardless of its location or device, creates a virtual data warehouse or data lake that allows access to all managed data within one simple search interface. At the end of the data lifecycle, automated processes delete files at the source and the target based on the assigned retention policy and provide proof of the deletion through an audit log or other methods.
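The end-of-lifecycle step described above can be sketched as follows. This is an assumed workflow, not a specific product's implementation: expired files are removed and each deletion is recorded in an append-only audit log as proof:

```python
# Minimal sketch of end-of-life deletion with an audit trail (illustrative,
# not a real product API): delete files past retention and log proof.
import json
import os
import time

def purge_expired(paths, retention_days, audit_log="audit.jsonl"):
    """Delete files older than the retention period; append an audit entry per deletion."""
    cutoff = time.time() - retention_days * 86400
    with open(audit_log, "a") as log:
        for path in paths:
            if os.path.exists(path) and os.path.getmtime(path) < cutoff:
                os.remove(path)
                log.write(json.dumps({"path": path, "deleted_at": time.time()}) + "\n")
```

In production this would also cover the target (replica or archive) copies and use tamper-evident logging, but the principle, delete at expiry and keep verifiable proof, is the same.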
Data management platforms have recently been released that contain many of these needed features. The differences are stark, however, if we look at the level of functionality and features. For example, data classification should be as complete as possible, provide global classification policies, and, most importantly, not rely on end users to define the policy. We can also look at eDiscovery, a process in which collecting and indexing data for third-party eDiscovery tools consumes most of the time. If the content and metadata of files are already indexed and collected, eDiscovery through data lifecycle management tools is fast and painless.
This blog post provides insights for consideration when evaluating intelligent data lifecycle management automation and its key business benefits. While there are only a few choices with true intelligent data automation capabilities, the feature set needs to be evaluated completely against your business requirements, along with the cost, complexity, and integration. Often, this requires multiple products, which can introduce a whole new set of challenges.
Aparavi is an intelligent data automation platform that reduces complexity, operating expenses, and risk while providing greater insight into your information. Intelligent search and advanced lifecycle automation help users find, classify, protect, and optimize their data from the edge, on-premises, and across multi-cloud storage to break through data silos and transform and simplify data operations.