The Big Picture: It’s no secret that unstructured data is growing at an explosive rate. Add to this the complexity of multiple workflows, sites and cloud infrastructures, not to mention security, and you’ve got a recipe for trouble. You can’t apply yesterday’s legacy technologies to today’s problems: at best they’re inefficient, and at worst they’re non-compliant or unsafe for your company’s data. In this short video, learn how Aparavi takes advantage of today’s technologies and webscale infrastructure so you can manage a limitless quantity of unstructured data, regardless of how heterogeneous your infrastructure is.
Transcript: The New Era of File Backup
Mike: Hi, I’m Mike Matchett with Small World Big Data. And I’ve got with me today Rod Christensen, who is the CTO, the smart guy, at Aparavi. Aparavi is technically in the file backup space. But I tell you what, listening to these guys and what they’re doing, backup has changed significantly. We’re no longer just talking about taking copies of your files, moving them off in a snapshot way, filling up ten petabytes of old archives and running your cloud bill through the roof.
We’re talking about how you get the maximum value out of your backups. How do you get all the functionality you want out of them? How do you do some really cool things with them? And, most importantly, how do you do backups in a multi-cloud world in a way that makes sense for you and optimizes your space, your costs, your benefits, your value, your functionality, all this stuff? These guys have thought it through. Welcome to the show, Rod. How are you guys doing today?
Rod: Thanks, Mike. I appreciate being here. We’re doing great. How are you doing?
Multi-Cloud Protection and Retention
Mike: Yeah, good. So Aparavi: multi-cloud active archive. Let’s start with that “multi-cloud” bit, because I think that’s really interesting. What did you have to bring to the table in a new kind of solution to really handle multiple clouds today? What did you really have to rethink?
Rod: Well, you know, a lot of vendors that call themselves multi-cloud can target one cloud and use it as an output device, and then, if they want to, they can target another cloud and use that as a different device. But you really have to pick one or the other.
With Aparavi, you can put data into one cloud and then later, if you get a better deal on another cloud or a different relationship with another cloud vendor, switch over to that cloud and use both clouds simultaneously. Without moving your data out of the first cloud, you can start writing new data to the second cloud. And when you recover data, Aparavi seamlessly knows where everything is and picks the right data from the right cloud at the right time.
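The behavior Rod describes (new writes go to a new cloud while older data stays put, and recovery reads from wherever each object lives) amounts to a catalog that records each object's location. The Python sketch below illustrates that general idea under stated assumptions; it is not Aparavi's implementation, and the class and method names are hypothetical. Each "cloud" is just an in-memory dict standing in for an object store.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MultiCloudArchive:
    """Hypothetical catalog: each 'cloud' is a dict standing in for an object store."""
    clouds: dict[str, dict[str, bytes]] = field(default_factory=dict)
    catalog: dict[str, str] = field(default_factory=dict)  # object key -> cloud name
    active: str | None = None

    def switch_target(self, name: str) -> None:
        # New writes go to this cloud; objects already stored elsewhere stay put.
        self.clouds.setdefault(name, {})
        self.active = name

    def write(self, key: str, data: bytes) -> None:
        if self.active is None:
            raise RuntimeError("no active write target")
        self.clouds[self.active][key] = data
        self.catalog[key] = self.active  # remember where this object lives

    def recover(self, key: str) -> bytes:
        # The catalog, not the caller, decides which cloud to read from.
        return self.clouds[self.catalog[key]][key]


archive = MultiCloudArchive()
archive.switch_target("cloud-a")
archive.write("report-2019.pdf", b"old data")
archive.switch_target("cloud-b")  # better deal elsewhere: no data migration needed
archive.write("report-2020.pdf", b"new data")
print(archive.recover("report-2019.pdf"))  # b'old data', served from cloud-a
```

Keeping the location in a catalog rather than in the object keys is what lets both clouds stay in use at once: switching targets only changes where the next write lands.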
Mike: That’s part of the value of Aparavi. And it’s not “disapparate,” right? You were telling me Aparavi comes from a different word, meaning to prepare, to plan and to get things right. It’s not about making your data disappear, although that’s what I think of, with the Harry Potter connection.
Aparavi: The Architecture
Mike: But what we’re really talking about is how you deal with the masses of unstructured data growing today, when you don’t want to deal with it the same way we used to. We know we’re going to end up with petabytes of stuff. If I use one of today’s (I hate to say it) legacy file backup solutions and target the cloud with it, I end up in a situation where I can’t really use what’s in the cloud directly, right? I still have to rehydrate it or bring it back out of that system before I can use that data. And that’s not the approach you guys took. You looked at it a little differently, right?
Rod: We looked at it very differently, specifically at how you access data once it’s in the cloud. Obviously, you can recover data from the cloud to on-premises and get all your data rehydrated, deduped, dedeltafied and all that kind of stuff. But the real value of the data is in data analytics, e-discovery and things like that. The data is sitting in the cloud. How do you make use of it there?
So we’ve come out with a public domain DLL and shared object that you can put up on a cloud instance and either write a program against or use as a gateway to an e-discovery system. It gives you complete access to the archived data without bringing it back on-premises. So if you have ten petabytes of data sitting up in the cloud, you can access that data without the rehydration and the egress fees normally associated with it. Trying to bring that data back on-prem for any kind of analysis is just impossible once it’s up there. So you have to make the data accessible where it really is.
Aparavi Supported Eco-System
Mike: And there are really three things to that. We talked about how you have this open data format, so I can get to the data in a standard way no matter where it’s sitting. That makes it very useful and globally accessible, and it lets you make copies for test/dev and everything else. But you also need a security layer on top of it, so you can still impose all the constraints on who gets at what. And we know that’s one of the values you guys also bring to the table.
Cloud Retention Simplified: Powerful and Easy Secondary Storage Management
Mike: So you’re not just dumping it into S3 and saying, “Here’s the bucket. Go get it.” You still have to go through your management system to get at the data. But tell me a little about what I think most people think of first: “Hey, I’m putting all this data up in the cloud, so I need capacity optimization. I don’t want my cloud subscription costs to go through the roof. I want to use multiple clouds.” What do you guys do to the data that makes the cloud a really effective and cost-efficient option?
Rod: That’s a great question. The first step to really optimizing data storage and capacity is recognizing what you have in the first place, what kind of data you’re dealing with. Whether it’s documents, Excel spreadsheets, PDF files or something like that, you really need to understand what you’re dealing with, because once you do, you can classify the data by its importance.
Once you classify the data by importance, you can set policies on how long it is to be retained. For example, say you want to keep your PDFs for a couple of years, but all your documents that contain legal information have special characteristics and must be kept a lot longer to comply with regulations. You can set a seven-year retention period on those.
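The classify-then-retain flow Rod describes can be sketched as a lookup from a data class to a retention period. This is a minimal Python illustration under stated assumptions, not Aparavi's classifier or policy engine: the keyword test, the class names `legal` and `general`, and the 365-day year are all hypothetical simplifications.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical policy table: data class -> retention period.
RETENTION = {
    "legal": timedelta(days=7 * 365),    # regulated documents: about seven years
    "general": timedelta(days=2 * 365),  # ordinary files such as PDFs: about two years
}

# Hypothetical keywords; real classification would inspect far more than words.
LEGAL_KEYWORDS = ("contract", "agreement", "litigation")


def classify(text: str) -> str:
    """Crude stand-in for content classification: keyword match only."""
    lowered = text.lower()
    if any(word in lowered for word in LEGAL_KEYWORDS):
        return "legal"
    return "general"


def expiry_date(text: str, archived_on: date) -> date:
    """Date after which the policy no longer requires keeping this item."""
    return archived_on + RETENTION[classify(text)]


archived = date(2020, 1, 1)
print(expiry_date("Master services agreement between the parties", archived))
print(expiry_date("Quarterly newsletter", archived))
```

Separating classification from the policy table means a retention change (say, a new regulation) touches one dictionary entry rather than every stored item.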
In addition, we can recognize things like Social Security numbers, phone numbers and addresses within the dataset itself while we’re copying it, so you can run queries like, “Give me all the documents with personally identifiable information in them,” or search for a particular person or word, or whatever you want to do. That’s how you actually put the data out there to multiple uses.
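Tagging sensitive data during ingest, then querying the tags later, can be illustrated with a small index built from regular expressions. The sketch below is an assumption-laden simplification, not how Aparavi detects PII: the two US-style patterns and the function names are hypothetical, and real detection needs validation, context and international formats.

```python
from __future__ import annotations

import re

# Illustrative US-style patterns only; far too simple for production use.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b"),
}


def scan(text: str) -> set[str]:
    """Return the kinds of PII found in one document's text."""
    return {kind for kind, pattern in PII_PATTERNS.items() if pattern.search(text)}


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each document name to its PII tags, as if computed while copying."""
    return {name: scan(text) for name, text in docs.items()}


docs = {
    "hr/onboarding.txt": "Employee SSN: 123-45-6789, phone 555-123-4567",
    "eng/readme.txt": "Build with make; see issue 42.",
}
index = build_index(docs)

# "Give me all the documents with personally identifiable information in them."
with_pii = sorted(name for name, kinds in index.items() if kinds)
print(with_pii)  # ['hr/onboarding.txt']
```

Because the tags are computed once at ingest, the query afterwards is a scan of a small index rather than a re-read of the archived content.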
And the most important part is that once the data is no longer needed, you get rid of it. You don’t need it anymore, and you can save a huge amount of cost. A lot of companies are saying, “Okay, we’re just going to throw stuff up in the cloud and keep it for seven years.” But what happens if three-quarters of that data only needs to be kept for two years? By getting rid of that 75% of the data for years 3 through 7, you can achieve significant savings.
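To put a rough number on that, the arithmetic below compares keeping everything for seven years with deleting the short-lived 75% after year two. It assumes a flat storage price, a fixed data volume with no new data added, and no deletion or retrieval fees; those are simplifying assumptions, not figures from Aparavi. Under them, the stored volume drops from 7 unit-years to 2 + 0.25 × 5 = 3.25 unit-years, a saving of about 54%.

```python
def unit_years(total_years: int, short_years: int, short_share: float) -> float:
    """Storage consumed, in units of (full data volume x years)."""
    # All data is stored for the first `short_years`; afterwards only the
    # long-lived share (1 - short_share) remains until `total_years`.
    return short_years + (1 - short_share) * (total_years - short_years)


keep_all = unit_years(total_years=7, short_years=7, short_share=0.75)  # 7.0
pruned = unit_years(total_years=7, short_years=2, short_share=0.75)    # 3.25
savings = 1 - pruned / keep_all
print(f"{savings:.1%}")  # 53.6%
```

The saving grows with both the share of short-lived data and the gap between the short and long retention periods.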
Multi-Cloud Active Archive and Protection
Mike: Yeah. And in that way of going in and carving pieces out, you’ve got a couple of clever technologies, right? We talked earlier about what I think you called pruning. So if somebody says, “Give me all the documents in this dataset,” you don’t necessarily hand them direct pointers to each of the objects in that dataset, right? Because that gives them a static thing, and then it’s almost like you can’t garbage-collect underneath it. Instead, you give them a metadata pointer that still lets you go back and prune within that dataset, live and dynamically. Maybe you can explain a little more about that.
Rod: Pruning is actually part of the secret sauce here. It really is. And it’s pretty complicated to map out.
Mike: Well, we only have a couple more minutes, Rod.
Rod: I know, I know. But what pruning does is keep the data only as long as it needs to be kept. Let’s say you have a retention period of two years. Anything older than two years is automatically removed from the system, since you no longer need it, unless that data is being held or referenced by something else further down the line, say a document from six months ago. In that case, the system keeps the data until the secondary document that relies on it expires, and then it can get rid of the whole thing. That’s how pruning works in data management. I have a white paper on it that’s, how many pages? Three or four pages long, and it explains it. It’s pretty technical, though. Fair warning!
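The rule Rod outlines (expired data goes away unless a still-retained document depends on it) resembles reference-based garbage collection. The Python sketch below shows only that general idea; Aparavi's actual pruning algorithm is the one described in its white paper, and the `Document` type, block IDs and `prune` function here are hypothetical. Shared blocks stand in for data that deduplication or delta encoding lets several documents reference.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Document:
    name: str
    archived_on: date
    retention: timedelta
    blocks: frozenset[str]  # hypothetical IDs of stored data blocks this document needs

    def expired(self, today: date) -> bool:
        return today >= self.archived_on + self.retention


def prune(docs: list[Document], store: set[str], today: date) -> tuple[list[Document], set[str]]:
    """Drop expired documents, then keep only blocks some live document still references."""
    live = [d for d in docs if not d.expired(today)]
    referenced = set().union(*(d.blocks for d in live))
    return live, store & referenced


two_years = timedelta(days=730)
old = Document("q1-report.docx", date(2020, 1, 1), two_years, frozenset({"b1", "b2"}))
new = Document("q1-report-v2.docx", date(2021, 7, 1), two_years, frozenset({"b2", "b3"}))

# The old document has expired, but block b2 survives because the newer one needs it.
live, store = prune([old, new], {"b1", "b2", "b3"}, today=date(2022, 1, 2))
print([d.name for d in live], sorted(store))  # ['q1-report-v2.docx'] ['b2', 'b3']
```

Running `prune` again once the newer document expires removes the remaining blocks, which matches the "get rid of the whole thing" step in the description.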
Mike: Tell us where we could find that paper.
Rod: It’s on aparavi.com.
Multi-Cloud Active Archive Defined
Mike: aparavi.com. Great. And I should point out that Aparavi is, at its first level, a SaaS service that works with whatever targets you want, so people can get up and running with it pretty quickly, right?
Rod: Yup, absolutely. Onboarding is very simple.
Mike: Awesome. There’s a whole bunch more technology I would love to get into. We touched on snapshotting, data analytics, storage analytics and the open data format. So there’s tons more to dive into in what you guys are uniquely packaging together, which I think is one of the smartest assemblages of backup and archive software for the cloud. It really puts it all together in one piece. So, kudos to you guys for that.
Rod: Thank you very much. I appreciate it.
Mike: And again, that’s Aparavi. Check it out. I don’t think we have much more time. Do you have any final thoughts?
Rod: Not for me.
Mike: Well, thank you for being here, Rod. Hopefully we can do this again and dive into some of those other topics in deeper episodes. Thanks for attending today. Check it out: there’s a lot going on in the file backup, archive and data management space, and I’m sure we’re going to hear more about Aparavi as they roll out more features going forward. They’re just getting started. I’m Mike Matchett with Small World Big Data. Thanks for being here, Rod.
Rod: Thank you, Mike.
Mike: Thanks for watching and come back soon. Bye.