Data archiving best practices: what is an active archive?

Simply put, an active archive is one where data is organized, accessible, retrievable, and intelligently retained, making your archived data useful to the organization. In the past, archives were thought of strictly as long-term repositories for rarely accessed data (think cold storage), so little thought was put into managing this data intelligently. The hope was that your archive was like an insurance policy that you would never need to use.

Data overall has continued to grow unabated at tremendous rates year over year, and unstructured data has led that charge, with growth rates of 60% or more; it is predicted to represent 90% or more of all data within just a few years. Unstructured data, such as office documents, videos, audio files, images, PDFs, and anything else not in a database, has become the lifeblood of most organizations. Storing this data intelligently over the long term is critical not only for compliance and organizational history, but increasingly for business intelligence, analysis, data mining, and other purposes.

What organizations need is a way to manage their unstructured data intelligently and cost-effectively for the long term, not just to save money but increasingly to leverage the data as a critical corporate asset that can be mined and put to use.

Data Archiving Best Practices

Data should be organized.

Unstructured data tends to be messy: a typical organization can have millions of files that are not organized in any particular fashion. To make sense of this, it’s helpful to classify and tag data based on categories that are important both internally and externally. Flags such as “confidential” or “legal” make it possible to retrieve data in the event of an audit, but, more than that, all sales data, all financial data, and so on could be classified for fast and easy retrieval in the future.
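As a rough illustration of classification and tagging, the sketch below attaches tags to files based on keywords in their paths. The rule table, the `ArchivedFile` type, and the helper names are all hypothetical, chosen only for this example; a real archive would more likely classify on file contents and metadata as well.

```python
from dataclasses import dataclass, field

# Hypothetical tagging rules: tag name -> keywords to look for in the file path.
TAG_RULES = {
    "legal": ("contract", "nda", "legal"),
    "financial": ("invoice", "budget", "ledger"),
    "sales": ("quote", "pipeline", "sales"),
}


@dataclass
class ArchivedFile:
    path: str
    tags: set = field(default_factory=set)


def classify(path: str) -> ArchivedFile:
    """Attach every tag whose keywords appear in the lowercased path."""
    lowered = path.lower()
    tags = {
        tag
        for tag, keywords in TAG_RULES.items()
        if any(keyword in lowered for keyword in keywords)
    }
    return ArchivedFile(path=path, tags=tags)


def find_by_tag(files, tag):
    """Return the paths of all files carrying the given tag."""
    return [f.path for f in files if tag in f.tags]


# Example data, invented for illustration.
files = [
    classify(p)
    for p in (
        "shared/Legal/acme_nda_2019.pdf",
        "finance/q3_budget.xlsx",
        "sales/pipeline_review.pptx",
        "misc/team_photo.jpg",
    )
]
legal_files = find_by_tag(files, "legal")
```

With tags in place, an audit request for everything marked "legal" becomes a simple lookup rather than a crawl through the whole file tree.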

Data should be accessible.

You need to be able to store your data where you want and get at it easily. This could mean on-premises in a private cloud or, increasingly, in one or more public clouds. We’re beginning to see increased competition among cloud vendors, and the ability to take advantage of changing cloud economics is extremely valuable. An active archive should support both on-premises storage and true multi-cloud, with the ability to dynamically switch storage destinations among cloud vendors without requiring the administrator to remember where the data is.
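One way to picture switching destinations without losing track of data is a catalog that records where each file was written. The sketch below is a minimal, in-memory illustration; the `ArchiveCatalog` class and the destination names are assumptions made for this example, not any product's actual interface.

```python
class ArchiveCatalog:
    """Tracks which storage destination holds each archived file."""

    def __init__(self, default_destination):
        self.default_destination = default_destination
        self._locations = {}  # logical file id -> destination name

    def store(self, file_id):
        # Record the file against whichever destination is current.
        self._locations[file_id] = self.default_destination

    def switch_destination(self, new_destination):
        # New writes go to the new destination; earlier files keep their record.
        self.default_destination = new_destination

    def locate(self, file_id):
        return self._locations[file_id]


catalog = ArchiveCatalog("cloud-a")   # hypothetical destination names
catalog.store("report-2023.pdf")
catalog.switch_destination("cloud-b")
catalog.store("report-2024.pdf")
```

Here `catalog.locate("report-2023.pdf")` still answers "cloud-a" after the switch, so the administrator never has to remember which vendor held which file.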

Data should be retrievable.

Complementary to classification and tagging is full-content search. Imagine being able to quickly search through petabytes of data, across millions (or billions) of files, to find the needle you are looking for, using a word or a string of words rather than having to know where or when a file was saved. This turns what has been an opaque black hole of practically unusable data into a usable repository.
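Full-content search is commonly built on an inverted index, which maps each word to the files that contain it. The sketch below shows the idea on a toy scale; the function names and sample documents are invented for illustration, and a production system would add ranking, stemming, and storage that scales far beyond memory.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the documents that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results


# Example data, invented for illustration.
documents = {
    "a.txt": "Renewal terms for the Acme contract",
    "b.txt": "Quarterly budget review",
    "c.txt": "Acme invoice for March",
}
index = build_index(documents)
matches = search(index, "acme contract")
```

The query needs no knowledge of where or when a file was saved: the words alone identify the matching documents.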

Data should be intelligently retained.

If you ask an audience of IT administrators what their corporate policy is on data retention, the vast majority will tell you they keep everything forever. Data governance is a huge topic, more than we can get into here. Suffice it to say that the best practice is not to keep everything forever, but to intelligently prune data that is no longer needed, for legal, space, cost, and other reasons. An active archive helps an administrator set policies that enable this pruning, freeing up space and decreasing storage cost.
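A retention policy can be thought of as a table of how long each class of data must be kept, checked against each file's archive date. The sketch below is a minimal illustration; the retention periods, tag names, and records are made-up values, not legal or product guidance.

```python
from datetime import date, timedelta

# Hypothetical retention periods in days, keyed by tag.
# None means the data is never pruned automatically.
RETENTION_DAYS = {
    "legal": None,
    "financial": 7 * 365,
    "default": 3 * 365,
}


def is_expired(tag, archived_on, today):
    """Return True if a file with this tag has outlived its retention period."""
    days = RETENTION_DAYS.get(tag, RETENTION_DAYS["default"])
    if days is None:
        return False
    return today - archived_on > timedelta(days=days)


def select_for_pruning(records, today):
    """Return the names of files whose retention period has passed."""
    return [
        name
        for name, tag, archived_on in records
        if is_expired(tag, archived_on, today)
    ]


# Example data, invented for illustration: (name, tag, date archived).
records = [
    ("old_notes.txt", "general", date(2015, 1, 1)),
    ("contract.pdf", "legal", date(2010, 1, 1)),
    ("ledger.xlsx", "financial", date(2020, 1, 1)),
]
to_prune = select_for_pruning(records, today=date(2024, 1, 1))
```

Only the untagged notes file is selected here: the legal document is held indefinitely, and the financial ledger is still inside its seven-year window.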

In summary, an active archive provides intelligent, multi-cloud data management, making the long-term storage of an organization’s most critical asset, its data, useful today and forever. Aparavi fits this bill exactly. Interested in learning more? Request a demo today.