I know, in a perfect world we’d be able to consolidate down to a single backup solution that protects every platform perfectly. History, however, has shown that the data center, especially when it comes to data protection, is far from perfect. Every time a new environment appears, a new class of data protection application appears. Despite the theoretical advantages of consolidated data protection, the reality is IT has a job to do: protect data, no matter what.
Consolidated solutions are typically very slow to support a new environment or changes to an existing environment, which enables new solutions to enter the market to solve a specific challenge or take advantage of a new capability. We saw it early on when networking allowed multiple systems to communicate, and solutions like NetWorker and NetBackup appeared. We saw it again when virtualization became a production environment and solutions like Veeam appeared. Today we are seeing new solutions focus specifically on NoSQL backup and container backup. In the end, because these solutions are built for specific environments, they are often much easier to use than “do-it-all” backup applications, eliminating the theoretical advantage of consolidating to a single solution.
Another major shift occurring in data centers right now is the transformation of unstructured, or file, data. I don’t have to tell you it is growing rapidly. The growth is occurring not only in the size of each individual file but also in the number of files that a file system stores. The latter problem is so acute that next-generation NAS solutions tout their ability to support trillions of files without performance impact as a competitive advantage over legacy NAS solutions that can only support a billion files.
A more significant change in file data is the increasing level of importance that it has to the organization. The majority of employees at most organizations are knowledge workers. Knowledge workers create data, and most of that creative energy ends up in standard office productivity files like Word, Excel and PowerPoint. George Crump over at Storage Switzerland had an excellent blog post explaining the importance of these files that are often taken for granted.
Unstructured data is more than just data created by users of Microsoft Office, however. Machines, in the form of IoT devices, surveillance cameras and log data from discrete systems, generate far more data than even the most productive humans can. This data also has immense value to the organization. In the case of sensor data, it can often only be created once because it is a sampling of conditions at a particular moment in time. Organizations need to protect this machine-generated data as well.
Also changing is the outside world’s view of an organization’s file data. Governing bodies around the world are passing new legislation that requires the protection, retention and even deletion of file data. The fines associated with not complying with these regulations are significant and higher than we’ve ever seen before.
Protection of file data has always been a pretty casual affair for most organizations. The general policy, still today, is copy your unstructured data to a file server and then IT will back that up. The problem is that most end-users, probably because they are not IT professionals, don’t faithfully copy their data to a file server. More concerning is that most IT professionals aren’t equipped with file protection software that can help them protect files given the changes described above.
Typical legacy data protection solutions protect files one at a time. They log in to a file system (file server or NAS) and essentially scan the entire file system from top to bottom looking for new or modified files. In the days of thousands of files per file server, the scanning approach worked, but in the era of millions, if not billions, of files, scanning the file system can take longer than actually copying the data to the backup device. The time spent scanning also creates more opportunity for backups to fail, and, as a result, it can take days to protect a file server. Because of their tape heritage, legacy products also store these files in blobs. The good news is that these solutions know which file is in which blob, so as long as you know the file name, it can easily be found. The bad news is that setting different retention policies on specific files within a blob, or across blobs, is impossible.
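To make the cost of that scan concrete, here is a minimal Python sketch of the scan-based approach (the function name and structure are mine, purely for illustration, not any vendor’s actual code). Notice that every backup pass must visit and stat every file in the tree, whether or not anything changed, which is why scan time grows with the total file count rather than with the amount of changed data.

```python
import os

def find_changed_files(root, last_backup_time):
    """Walk the entire tree and collect files modified since the last backup.

    This is the crux of the scan-based approach: every file's metadata
    must be read on every backup pass, even when nothing has changed,
    so the scan cost scales with the total number of files.
    """
    changed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.getmtime(path) > last_backup_time:
                    changed.append(path)
            except OSError:
                continue  # file vanished mid-scan; skip it
    return changed
```

With a billion files, that `os.walk` loop alone can dominate the backup window before a single byte of data is copied.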
Newer backup solutions use an image-based backup technique that backs up a file server at the volume level. They can skip the file-by-file scan and just back up the blocks that have changed. The problem with the image-based approach is that the organization loses all visibility into the files being protected. These solutions still provide single-file recovery, usually by mounting a virtual image of the volume from a particular backup job, but finding an individual file without knowing which backup job it came from can be very difficult. Image backups also have a retention challenge: retention can only be set at the backup-job level.
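A simplified sketch of the block-level idea, again illustrative only: real image-based products typically get changed-block information from the hypervisor or a storage driver rather than rehashing the volume, but comparing per-block hashes shows why the approach sees blocks, not files. The block size and function names below are assumptions for the example.

```python
import hashlib

BLOCK_SIZE = 4096  # illustrative block size

def changed_blocks(volume_bytes, previous_hashes):
    """Split a volume image into fixed-size blocks and return only the
    blocks whose hash differs from the prior backup -- no per-file scan,
    but also no knowledge of which files those blocks belong to.

    Returns (changed, new_hashes): changed maps block index -> block data.
    """
    changed = {}
    new_hashes = []
    for i in range(0, len(volume_bytes), BLOCK_SIZE):
        block = volume_bytes[i:i + BLOCK_SIZE]
        digest = hashlib.sha256(block).hexdigest()
        new_hashes.append(digest)
        idx = i // BLOCK_SIZE
        if idx >= len(previous_hashes) or previous_hashes[idx] != digest:
            changed[idx] = block
    return changed, new_hashes
```

Note that the output is a set of anonymous block indexes; mapping a block back to a named file requires mounting the image and reading its file system, which is exactly the visibility gap described above.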
Legacy and even newer backup solutions simply aren’t up to the challenge of protecting modern file data. The new reality of file data requires a new approach. At Aparavi, we think we have that new approach with File Protect and Insight. We’ve developed a high-speed method of protecting unstructured data using a file-by-file technique. Backups complete fast, and you can execute them frequently, which is good for ransomware protection. You can set very specific retention policies based on file type, and you can remove data from the backup, which is good for meeting data privacy regulations.
Want to learn more? We recently presented on stage with Storage Switzerland about the new challenges and requirements of protecting file data. The good news is we recorded the presentation, “Are You Treating Data Like a Second Class Citizen?”, and you can watch it here.