The capacity requirements of unstructured data continue to grow at an alarming rate, and the rate of growth is only going to get worse. Storing it is problem enough, but protecting unstructured data is creating a new challenge that most data centers aren’t prepared for. IT needs to act now to modernize its unstructured data protection strategy.
In this ChalkTalk Video, George Crump, lead analyst at Storage Switzerland, and Aparavi’s VP of Business Development, Jonathan Calmes, explore how legacy solutions face several issues when dealing with unstructured data.
Transcript: What’s Your Strategy for Unstructured Data Protection?
George: Unstructured data is really becoming a nightmare for data centers to manage. It’s growing at unbelievable rates. And I don’t think anybody is surprised at the amount of unstructured data that they’re dealing with, but the rate of growth is unparalleled to what we’ve seen in the past. I’ve invited Jon from Aparavi to talk through this with me. Jon, thanks for joining us today.
Jon: Yeah, absolutely. Thanks, George.
George: So, Jon, what we see here is 25 years ago when I was 12 and started in the industry, we were worried about Oracle databases and Microsoft SQL. And the focus there was availability and how quickly can I get in, get a backup done. And we thought “big” was 200 gigabytes, right? And files were usually user files, and it was almost that you protected them when you could; it wasn’t that big of a deal because they’re just silly users anyways, right?
But now, what we’ve seen is this massive growth in unstructured data, and I don’t know exactly when the crossover point was, but certainly about 10 years ago, this started to become the bigger problem. Because obviously, number one is growth. And the other thing is just quantity of files, right? Just the number of files that we have to deal with is enormous. And what we see is, take data protection for example: typical data protection products just trip all over the number of files, just backing them all up and things like that. And you’ve seen people have to resort to image backups, which have their own problems. Are you guys seeing similar stuff?
Jon: Yeah, absolutely. I mean, that really is the genesis for Aparavi, right? We were all working at legacy backup companies and as discussions came about, we recognized that there’s a massive problem here, not only today and right now, but also in the future. And we talked about how the traditional solutions are not preparing data for the future at all.
George: I think that’s a really good point. What I try to tell people is look, what we’re experiencing right now, if you think it’s bad now, this is nothing compared to… I just read a thing the other day that by 2020 there will be about 20 billion or 40 billion devices connected to the internet, all generating data. So the amount of this is unbelievable. The other thing that we’ve seen as a fundamental change here is locations. Now, we have this thing called the cloud that really gives us a whole other thing that we have to deal with, right?
Jon: Yeah, absolutely. I mean, you have to deal with that. But the other thing you have to deal with is multi-cloud, where you’ve got one organization leveraging multiple clouds for different solutions. Or you even have users out there doing their own thing because management isn’t providing the right path.
George: Yeah, and I think the other thing we’ve seen here is also the justification for this retention of this data. And again, back in that era, we could just say, well, delete it. And now, you can’t do that. Either through regulations or just corporate governance, there’s legitimate reasons to keep all this stuff.
Jon: Absolutely. With the prevalence of machine learning and AI, people want to keep data longer to learn from it and see trends. We actually see some customers who say, “I never want to get rid of anything.” Legal is a big thing there—aerospace another thing—from external regulations.
George: What we’re seeing here is legacy solutions—especially in data protection, but even in archive—are really just falling short. They weren’t written in an era to deal with this kind of growth in a multi-cloud thing. Obviously, you guys, as you said, are very focused on this. What are you guys doing differently here to help manage this problem?
Jon: Yes. The first thing I think is the engine. We built this net new for this world, and not just for the world today, but the world for tomorrow. We saw that there was a need and that the current solutions weren’t addressing it. So Aparavi leverages a web-based SaaS platform, and the platform defines all your policies. It sets the engines, and it tells everything what to do. We also have an on-premises software appliance, but you could also host that in an EC2 environment if you like. And then we actually do treat the data source itself. And with all these operations together, we are able to do near-CDP type availability with this rapid recovery.
George: It sounds like I’ve got recoverability capabilities, both on prem and in the cloud, is that right?
Jon: Yeah. So if you are aggregating data in the cloud, you absolutely can recover data in that cloud environment.
George: Okay, great. And then what’s the scale that you guys are designed to handle here?
Jon: It really was built to scale into the petabytes of data, billions and billions of files. Traditionally, we’ve seen organizations do a walk-the-file-system approach every time they’re looking to do deltas or differences. We actually built it so you only have to do that once and you do it on prem, right? You don’t have to continue going back over, and over, and over.
George: So that means these protected copies can occur very, very quickly if I’m only transferring a very small amount of data?
Jon: Yeah, we actually go down to the block or byte level in some circumstances.
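As a rough illustration of what block-level change detection can look like, the sketch below hashes fixed-size blocks of a file and reports which blocks differ from the previous copy, so that only those blocks need to be transferred. This is not Aparavi’s implementation; the function names, the fixed block size, and the use of SHA-256 are all assumptions made for the example.

```python
import hashlib


def block_hashes(data: bytes, block_size: int) -> list[str]:
    """Hash each fixed-size block of the data (block size is an assumption)."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).hexdigest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(previous: list[str], current: list[str]) -> list[int]:
    """Return indexes of blocks that are new or differ from the previous copy."""
    return [
        index
        for index, digest in enumerate(current)
        if index >= len(previous) or previous[index] != digest
    ]


# Hypothetical file contents: the second block changed and a fourth was appended.
old = block_hashes(b"aaaabbbbcccc", block_size=4)
new = block_hashes(b"aaaaBBBBccccdddd", block_size=4)
to_transfer = changed_blocks(old, new)
print(to_transfer)
```

Only the block indexes in `to_transfer` would need to move, which is why a protected copy can finish quickly when little has changed.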
George: So let’s talk about how you guys clean out after yourself. I mean, we don’t want this to become the next big growth tier, right?
Jon: Right. Absolutely. So we have a few different types of data movements. We do checkpoints that actually are…think of it as like a temporary recovery cache. And this is data held on, you know, disk here, direct attached, and then you can schedule those to run every 5, 10 minutes, up to you. Fifteen minutes is usually the norm in what we see. And then we’ll actually put a snapshot of the entire disk contents, whatever you’ve selected by the policy defined up here, onto the software appliance. And once that happens, we will remove all the data that we’ve aggregated through those checkpoints.
The next level is we can actually do archives to cloud, and when we do an archive out to cloud, the very same thing happens. You define how many versions of a snapshot you want to hold here, whether it’s one or five, and then we’ll start incrementally cleaning up behind ourselves once that data is up in a cloud location.
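The cleanup behavior described here can be sketched as a small state model: checkpoints are cleared once a snapshot lands, and older snapshots are pruned down to a configured count once an archive is in the cloud. This is a minimal sketch under those assumptions, not the product’s actual logic; the class name, method names, and `keep_snapshots` parameter are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ProtectionState:
    """Hypothetical model of the three tiers: checkpoints, snapshots, archives."""
    keep_snapshots: int = 1  # how many snapshot versions to hold locally
    checkpoints: list[str] = field(default_factory=list)
    snapshots: list[str] = field(default_factory=list)
    archives: list[str] = field(default_factory=list)

    def take_checkpoint(self, label: str) -> None:
        self.checkpoints.append(label)

    def take_snapshot(self, label: str) -> None:
        # Once the snapshot exists, the checkpoint cache is no longer needed.
        self.snapshots.append(label)
        self.checkpoints.clear()

    def archive_to_cloud(self, label: str) -> None:
        # Once the archive is in the cloud, prune the oldest local snapshots.
        self.archives.append(label)
        excess = len(self.snapshots) - self.keep_snapshots
        if excess > 0:
            del self.snapshots[:excess]


state = ProtectionState(keep_snapshots=2)
state.take_checkpoint("c1")
state.take_checkpoint("c2")
state.take_snapshot("s1")   # clears c1 and c2
state.take_snapshot("s2")
state.take_snapshot("s3")
state.archive_to_cloud("a1")  # keeps only the two newest snapshots
print(state.checkpoints, state.snapshots, state.archives)
```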
George: Okay, great. And then when I go to recover, I just go to the most logical location to get the data back then.
Jon: Yeah, so that will all happen from the UI directly, and you’ll determine where you want the data from. We actually do point-in-time recovery. So you’re able to select a date and a time, and the software is going to determine, based on what it knows about your data, where it’s going to be recovering from, whether it has to go out to an archive, a snapshot, or a checkpoint.
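One simple way to picture point-in-time recovery is a lookup over known recovery points: given a requested date and time, pick the most recent point taken at or before it, whichever tier it lives in. The sketch below assumes exactly that rule; the `RecoveryPoint` type, the tier labels, and the sample timestamps are hypothetical and not taken from the product.

```python
from datetime import datetime
from typing import NamedTuple, Optional


class RecoveryPoint(NamedTuple):
    taken_at: datetime
    tier: str  # "checkpoint", "snapshot", or "archive"


def pick_recovery_point(
    points: list[RecoveryPoint], target: datetime
) -> Optional[RecoveryPoint]:
    """Return the newest recovery point taken at or before the target time."""
    candidates = [point for point in points if point.taken_at <= target]
    return max(candidates, key=lambda point: point.taken_at, default=None)


# Hypothetical catalog of recovery points across the three tiers.
points = [
    RecoveryPoint(datetime(2018, 1, 1, 0, 0), "archive"),
    RecoveryPoint(datetime(2018, 3, 1, 12, 0), "snapshot"),
    RecoveryPoint(datetime(2018, 3, 1, 12, 15), "checkpoint"),
]
chosen = pick_recovery_point(points, datetime(2018, 3, 1, 12, 10))
print(chosen.tier if chosen else "no recovery point available")
```

A request for a time earlier than every recovery point returns `None`, which a caller would need to handle explicitly.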
George: You used the word archive a couple of times. It sounds to me like you guys are setting a good foundation for an archive here.
Jon: Yeah, absolutely. By the way that we’re placing data into the cloud, we’re actually able to archive that at a very, very granular level. One of the key things that we can do is we can actually, by policy, remove data out of the cloud and then all the way back down through to make sure that you’re freeing up your primary storage, your secondary storage, and you verify that you’ve got that in an archive. And as soon as its retention policy is hit, even if it’s an incremental change, we can remove those individual blocks or bits as opposed to having to wait until the entire image expires.
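To make the contrast with image-level expiry concrete, here is a minimal sketch in which every stored block carries its own retention period and is removed as soon as that period passes, independent of the image it arrived with. The `StoredBlock` type, its fields, and the sample dates are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class StoredBlock:
    block_id: str
    stored_on: date
    retention_days: int

    def expired(self, today: date) -> bool:
        """A block expires once its own retention period has elapsed."""
        return today >= self.stored_on + timedelta(days=self.retention_days)


def prune(blocks: list[StoredBlock], today: date):
    """Split blocks into those still retained and those eligible for removal."""
    kept = [block for block in blocks if not block.expired(today)]
    removed = [block for block in blocks if block.expired(today)]
    return kept, removed


# Hypothetical archive: a long-retention base block and a short-retention change.
blocks = [
    StoredBlock("base-0001", date(2018, 1, 1), retention_days=365),
    StoredBlock("incr-0002", date(2018, 1, 1), retention_days=30),
]
kept, removed = prune(blocks, today=date(2018, 3, 1))
print([block.block_id for block in removed])
```

Here the incremental block is removed on its own schedule while the base block stays, rather than both waiting for a whole image to expire.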
George: Right. That’s going to help with things like GDPR, where we’re going to have to get really granular with retention and protection.
Jon: Absolutely. You know, we’ve got a search capability built in where we can actually search by content or by metadata, such as author, and actually sort to remove that data.
George: So if I’m one of our viewers watching this, this all sounds good, but I’ve definitely got some sort of legacy backup. Everybody does. Give me the reasons that would motivate me to switch to Aparavi.
Jon: Storage growth is gonna be one of them, obviously, right? By way of how we’re doing this architecture, we’ve modeled out that we can save 75% of storage over a 7 to 10 year retention period. So when you’re talking about this issue, which is only going to get worse. And obviously this issue equals this issue. At the end of the day, Aparavi is going to save you massively in secondary storage, and it’s also going to clean up behind itself all the way back down to the primary data once we archive it off for you.
The other thing is we built the platform completely open; we use an open data format. So that means you’re not going to get locked into Aparavi in 10 or 15 years. There’s no vendor lock-in with Aparavi. We have an open source, public reader, documentation, the .DLLs, all of that. The other thing we do is we end cloud vendor lock-in. And how we do that is we actually allow the free movement of data between different clouds and even back down on premises if you wanted to. So our software fully manages the orchestration of data over a lifetime.
Say you want your data to go to Amazon S3, but then after that you want to go to a group like Wasabi, for example. We fully support Wasabi, maybe your IBM Bluemix, Cloudian, and Scality. All of these guys we’ve gone through and certified against as well. So my big takeaway is we’re going to save you big money on storage over a 7 to 10 year retention period. And by nature of how we’ve architected the solution and how you can run through multiple clouds, we’re actually able to end cloud lock-in. So, no vendor lock-in.
George: That’s perfect.
Jon: The other thing is Aparavi uses an open data format, and so we’ve published the reader, we’ve published the .DLLs, and what that means is not only do we end cloud vendor lock-in, we end software vendor lock-in. Because you don’t need us to recover your data. It’s fully accessible. So we unlock in a few ways. And again, lastly, with these three storage operations, checkpoints, snapshots, and archives, we can provide rapid recovery with near-CDP availability.
George: Awesome. Jon, thanks very much for joining us today.
Jon: I appreciate it.
George: There you have it. If you’re looking to save money, end vendor lock-in, and improve the availability of your unstructured data set, check out the guys at Aparavi.