Unstructured data is growing rapidly, and it’s increasingly a target for ransomware attacks. Despite that, most organizations fail to adequately protect this data, and traditional image-based backups just won’t cut it.
In this webinar, Storage Switzerland’s George Crump and Aparavi’s Jon Calmes discuss how you can improve your unstructured data protection while at the same time gaining insight into that data.
Watch the recorded webinar or read the full transcript below!
Transcript: Are You Treating Unstructured Data Like a Second-Class Citizen?
George: Hi. I’m George Crump, Lead Analyst with Storage Switzerland. Thank you for joining us today. Today, we’re going to be talking about unstructured data. A lot of data centers today really treat unstructured data like a second-class citizen, even though it has become an incredibly valuable asset to the organization. Joining me today is Jon Calmes. He is with Aparavi Software. Jon, thanks for joining us today.
Jon: Thank you. Thank you very much for having me.
George: So, before we get too far in the presentation, why don’t you just real quickly give folks background on yourself and, of course, Aparavi.
Jon: Yeah, absolutely. So, I’m Jon Calmes. I’m VP of Business Development over at Aparavi. And what I do on a day-to-day basis obviously changes. Sometimes I’m working directly with clients. Other times I’m building partnerships with some of the new storage arrays that are out there. Aparavi is an organization dedicated to the protection and insight of unstructured data specifically. We come from the legacy backup space. All of our leadership and engineers have multiple years in backup, and we feel like we truly have actually built a better mousetrap.
George: All right, thanks, Jon. So, for those who don’t know me, I’m George Crump, Lead Analyst at Storage Switzerland, and we’ve been doing very focused storage analyst type of work for over 12 years now.
Comparing Unstructured Data Protection to Other Data Sets
George: So, Jon, let’s start with level setting a little bit and just comparing unstructured data, other data sets, right? I think, first of all, if you look at databases, for example, they’re often snapshotted, replicated. In other words, they’ve got all kinds of protection going on. I’m sure you see that when you’re talking to customers, right?
Jon: Yeah. I mean, we see a mix of different strategies in use that are happening out there, but absolutely, yeah, replication, snapshots, different locations. We see a lot of different approaches.
George: Yeah. And now, also, of course, big on the radar screen for everybody is virtual environments, right? So, virtualization is a big thing. And essentially to me, when virtualization first came out, the data protection story was pretty bad, really. And now it’s actually pretty good. You know, there’s a lot of good solutions on the market, right?
Jon: Yeah, absolutely. And we love those guys out there because they’re focusing on availability of virtual machines. And that’s what’s most important in those environments.
George: And I think the key thing to point out there is that in both of these situations, we’re talking about multiple protection events a day, in some cases, on a minute-by-minute basis. In other cases, every few hours.
Jon: Yep, absolutely.
George: So, then, the other thing that we see a lot, especially in the virtual environment, is this concept of instant restore, what we often call recovery in place. And that’s the ability to instantiate the virtual machine directly from backup storage. So, you eliminate that transfer time back into production. And that’s a feature that I see people just taking advantage of in big ways.
Jon: Yeah, absolutely. I mean, the ability to have a mountable backup is absolutely paramount, especially in a disaster recovery scenario, hardware failures, things like that. Incredibly important to have that mountable point.
George: Yeah, and I also think what’s interesting with that kind of a feature is it also changes the nature of backup because then, beyond disaster recovery, you can use it for tests and dev and all kinds of different other use cases as well, right?
Jon: Yeah, I mean, that’s the whole copy data management movement of a few years back, using secondary data for other purposes. Because of the value that’s in data these days, and all the new tools that are out there to help manage that data, it’s important to have those capabilities.
George: So, then if we talk about unstructured data, the story is not quite as good, right? What we typically see is a once-a-night backup, if that. I see a lot of times if there’s an error in a once-a-night backup, it’s, “Okay, we’ll just get it in the next night’s backup.” There’s not a fire drill to get after it. Do you see similar situations?
Jon: Yeah. And it depends on the business, right? Some businesses you see that unstructured is actually the lifeblood. They’re valuing that data. However, that’s probably the exception, not the rule these days from the customers we’re talking to. So, we do see a lot of people just saying, “Yeah, the once per night’s fine.” And I’ve even heard horror stories of people just putting it on external hard drives and taking it home with them. We hear it all.
George: Jon, I think a great place where this really becomes more important (or where we’re exposed, maybe, is the right way to say it) is when we start talking about ransomware, because ransomware can strike anytime, day or night, right?
Jon: Yeah, absolutely. We’re seeing the evolution of ransomware on the ground. The City of Baltimore is a fine example of this, where the ransomware sat dormant for a period of time and infected more and more machines until the cumulative effect was an all-at-once attack that is, to this day, still unrecoverable.
George: Yes, it was a very devastating attack. I guess the news isn’t terrible, because some of this stuff might get picked up in snapshots. Snapshot technology is clearly there, and in a lot of cases, if you’re running on a filer of some type or a NAS, the snapshot capability is built in. But I think the challenge there is that while it does provide faster backup, it really is horrible from a search perspective, because you have the file system there, but there’s no search or data classification to figure out which snapshot you should retrieve which data from, right?
Jon: Yeah, exactly. And that’s one of the challenges that we really founded Aparavi around: how do we add intelligence to data, so that in these type of attacks you have confidence that what you’re retrieving is exactly what you need?
George: And I look at it as sort of that old thing. If a tree falls in the woods and nobody hears it, did it really fall? If you back up your data and you can’t find it, did you really back it up?
Jon: Did you really back it up? That’s right.
How Important is Unstructured Data?
George: Then let’s also talk about the importance of unstructured data, because I think that’s critical too. If you look at it in just terms of raw capacity, at almost every data center I’m in nowadays, the unstructured data set dwarfs everything else. I mean, big databases could be in the 500 to 700 gig range, where the unstructured data set is measured in dozens, if not hundreds, of terabytes.
Jon: Yeah, absolutely. And as more organizations are bringing in more applications that are purpose-built, all of those are just building more and more unstructured data, and you’ve got to do something with it, right? Regulations.
George: Yeah, exactly. So, the other thing that I think is interesting is office productivity applications, because we’re going to talk about machine data here in a second, but I don’t want this to get overlooked. To me, this is the creative energy, right? For a lot of people, their creative work is making a PowerPoint presentation or putting together a proposal in Microsoft Word, or Excel, or whatever tool you might happen to use. And this data is incredibly hard to go and recreate. I think, honestly, one of the proof points here is to look at all the people that will willingly pay ransom to get this stuff unlocked. So, obviously, by definition, you’re saying there’s value here, right?
Jon: That’s exactly right. And we’ve seen a trend of more and more people trying to rely on paying the ransom.
George: Yeah, right.
Jon: We’ve seen that, and that’s why we’ve also seen state governments being attacked, because they set a precedent: “Oh, by the way, these guys pay the ransom, right?” And I think part of that is they’re calculating the hours upon hours it takes, the iterations and versions that come along with these types of creative files, and how many man-hours that is. It can be staggering.
George: Now, I think the other thing to talk about here is you can reproduce this. In theory, if we lost this presentation, we could go back and put it back together again.
George: But it’s going to take time.
Jon: I’d rather not.
George: Or there’s a bullet you’ll miss or something, right? So, I think that’s a big challenge. And of course, the other big one, which has been more of a headline over the last five to six years, is data from machines: logging data, and now IoT devices, and all that kind of stuff. This is a huge and fast-growing share of the overall data set.
Jon: Yeah, we have customers in manufacturing and engineering where their machines are all talking and all producing data, and there’s the downtime associated with that, or perhaps manufacturing defects, and there are audits involved, especially in aerospace. People don’t understand the value of that data just yet. But they’re keeping all of it because they know it’s going to be valuable. And so, you see this sheer mass of data being created. The challenge with the traditional approaches is: how do you find it?
George: Well, I think the other challenge that we see here is this is just as susceptible to a ransomware attack as anything else would be, right?
George: So, the other thing I think that’s really important to point out here is in many cases, like especially with IoT devices, that are capturing the measurement of something at a specific point in time, in certain, say, weather conditions, you can’t recreate that. That was a situation that just occurred.
Jon: There is no version.
George: Exactly. It’s gone. So, I think that becomes a real challenge as well.
Unstructured Data Is Hard to Protect
George: Let’s also talk about protecting this data and why you see the once-a-night type of strategies. Again, this is a big chunk of data.
George: And it’s not like a database, which isn’t literally one file but essentially behaves like one. A database isn’t spread across millions and billions of files, which is really what we have here. I remember when I started in the late ’80s and early ’90s, we would try to make sure we found servers that had 100,000 files on them. Now we’re at a million, and I’m sure you’re starting to talk to people that have billions of files.
Jon: Yeah, we absolutely are. I mean, billions of files is a conversation we have on a weekly basis. Someone comes in, and whether it’s a marketing firm that has just tons of iterations of creative files, or manufacturing, engineering, finance, all of these industries are aggregating data at a pace that when George started was unfathomable. When I started, I…you know.
George: Wow. That kind of hurt. All right. So, anyways, the other thing I think that’s really important to point out here that didn’t touch on the last slide was that the value of this data is actually inside the file. It’s the data that’s in that file, not necessarily the fact that the file exists, right?
Jon: That’s right. Yeah.
George: So, finding that really becomes critical.
Jon: Yeah, finding titles is one thing, but finding content and then being able to apply value to that content and context is a whole other thing.
George: Right. Well, for example, the name of the file for this PowerPoint presentation is “unstructured data,” but there’s all kinds of other content in here that, if I just saw the file name, might not match up to what I was looking for. So, that’s really critical as well. Now, let’s also spend some time here. I think regulations ebb and flow depending on what’s going on in the news, kind of like ransomware, I guess. But the regulations, like the California… is it California Consumer Protection Act?
George: Privacy Act. And then, GDPR. Those have really changed the game as far as how we do this protection as well.
Jon: Yeah, absolutely. And this is a bit of a challenge when you have legislators designating rules that affect IT. They don’t necessarily know the impact it’s going to have. And so CCPA adopted a lot of the rules around right to be forgotten for GDPR. And common response is, “Well, we’ll delete it upon restore.” There’s challenges with that, obviously.
George: Yeah. Well, number one, I don’t know if that actually holds up to the law, right? I think the law is pretty clear that it has to be removed everywhere and in both regulations, they don’t specify a separate condition, if you will, for backup data.
Jon: That’s right. Yeah. And there’s teeth to this. We’ve already seen a genomics company in Canada get hit with a giant fine for not doing GDPR compliance. So, this is happening. More and more states, New York’s adopting something very similar to CCPA. They’re all kind of looking at GDPR like, “Ooh, we want to protect that data, right? We want to have privacy around that data,” because that is important. It brings about a lot of challenges for the IT department.
George: Yeah, I think the other thing that’s important to think about here is that part of the thing is protection. And so, it’s also that these regulations are all specifying not only that you have to protect this data; they’re also starting to say that you have to recover in a certain period of time. And so the right to be forgotten is one that we talk about a lot just because it’s particularly challenging, but at a base level, if you’re protecting data once a night, that might not be hitting these regulations.
Jon: Yeah, potentially.
Unstructured Data Protection Is An Afterthought
George: So, then, let’s talk about where we are today as a state of the art in protection. And what I tend to see is two approaches. The legacy approach, what I’ll call the data protection applications that were invented in the ’90s and early 2000s, handled a lot of the unstructured data with a file-by-file backup. In that era we were dealing mostly with tape, and you didn’t write one file to a tape. That would be horrific from a performance standpoint, so you kind of aggregated the files and wrote them as big blobs, right? So, now finding that little file within that blob becomes a bit of a challenge.
And, of course, all of this leads to a situation where these can be slow. Again, they’re doing a lot of work. There’s a lot of nuances here about how you go get that data and walk file systems and things like that. But I’ve seen situations where just finding what data to back up can take longer than actually backing the data up.
Jon: That’s right.
George: So then, the other challenge that you start to have here, especially as we start to deal with millions and potentially billions of files, is metadata. Data about data. And this is all at a basic level. It’s date modified, date accessed, things like that, things that you need to be able to have, that tracking of all that data. And then in a backup environment, you need to also then have metadata that will tell you what version of that file is on what piece of media or on what device and all this sort of stuff. So this metadata really can start to explode too, right, Jon?
Jon: Absolutely. Yeah. That’s a big challenge. Again, that challenge affects the discovery of data and the speed at which you find that data, the speed at which your jobs are being run on a nightly basis. Like you were saying, we’ve seen customers of competing products where the product is sitting and thinking for six hours before it’s even backing something up.
George: Yeah, I think the other challenge that you see with metadata is it also impacts your ability to retain the core data itself, because that metadata index gets so large that you can’t keep it all. So you have to purge it; you actually have to lose some detail to be able to keep the older data. It really becomes problematic.
Now, the other option, of course, and we see this a lot with modern applications, is they back up by doing essentially an image backup. So we just basically ignore all the files, and now we’re just looking for changed blocks. Now, the good news is: this is really fast. I mean, if it’s just a few blocks of change, it could happen in seconds. And I make this kind of almost funny, but there’s no metadata problems because there isn’t any metadata.
Jon: There isn’t any metadata. Yeah, that’s right. It doesn’t exist.
George: But that, of course, then means you have the same problem of being able to find it. Now, in fairness, most image-based solutions can go in and find a single file and restore a single file. But you do have to know what job has that file.
Jon: Yeah, absolutely. You got to know where that was or perhaps, you might even have to mount the point where you think it is. You have to actually mount that as a virtual machine, as a drive, and then start to drag through file trees to find what you need if you don’t know something about it, like the title.
George: Exactly, yeah. So, all of this metadata, or lack of metadata, also leads to a situation where you lose that granularity. And you just can’t extract value from your backup set. And this is where I think you kinda become guilty of treating unstructured data like a second-class citizen, because you have no details, right?
George: You just don’t know. You’re just happy it got backed up.
Jon: Yeah. And that broad approach of, “Well, let’s just grab everything,” is the answer to that. That’s the image-based application. You’re favoring speed, justifiably so, over granularity. You can’t have both, right? You can’t have your cake and eat it too. So, there needs to be a balancing act between speed and granularity. Absolutely.
George: Well, and then turning back to, for example, GDPR: how do you, in this situation, give me all of John Smith’s data? If John Smith wants to be removed, you would have to know which backup job John Smith’s stuff was backed up on and then go remove it. And then how do you remove something from the middle of an image file and have that image still have any… I mean, essentially you’d corrupt…
Jon: You’d corrupt the entire image.
George: So it becomes a house of cards. It really becomes a challenge, I think, in this environment.
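To make the contrast concrete, here is a minimal sketch of why file-by-file granularity matters for a right-to-be-forgotten request. This is a hypothetical illustration, not Aparavi's implementation: a per-file catalog can drop just one subject's entries, while a monolithic image has no seam to remove from without corrupting the whole thing. All names below are invented.

```python
# Hypothetical file-by-file catalog: each entry is independent, so honoring
# a right-to-be-forgotten request means dropping only the subject's files.
def forget_subject(catalog, subject):
    """Return a copy of the catalog with every file tagged to `subject` removed."""
    return {path: meta for path, meta in catalog.items()
            if meta.get("subject") != subject}

catalog = {
    "docs/jsmith_cv.docx":   {"subject": "John Smith"},
    "docs/2019_budget.xlsx": {"subject": None},
}
cleaned = forget_subject(catalog, "John Smith")
# The budget file survives untouched; only John Smith's entries are gone.
# A single image blob offers no equivalent operation.
```

The operation touches only the entries being forgotten, which is exactly what an image-level backup cannot do without rewriting the whole image.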
Organizations Need a Solution Purpose-Built for Unstructured Data Protection
George: So let’s talk about what you really need to be looking for. And I think there’s a couple of things here. I think it’s time to start looking at—especially in unstructured data—a concept of more of a purpose-built solution. The wave of unstructured data just happened too fast for the industry. And just like we sort of saw purpose-built backup solutions for VMware and things like that, I think it’s time for a purpose-built solution for unstructured data. And so, Jon, as you said, we’ve got to get that balance of fast backups but still get the granularity. So, sort of a snapshot, like a data capture approach, I think is really good there. And that allows us to really leverage that granularity and even go so far as providing content search.
Jon: Yeah, for example, we build a content and context-aware index on-prem to solve this very problem. So, data can be searched against, discovered. It can be retrieved in granular instances for us to provide that level of recoverability for our customers.
George: Yeah. And I think that becomes important as you start to service restores from this. We talked about the horrific things like ransomware, but even just straight restores. Most of the time in my days, when a user would come up and say they needed a file, they didn’t know the file name, right? Or I literally had a request where they knew the file name, but they needed a version from, like, two versions ago. Well, I don’t know where that is!
Jon: And the boots on the ground in the audience, like you guys, understand that. At the C-suite, they don’t understand. “What do you mean they don’t know the name of the file?” But we all have heard of our internal customers requesting, “Hey, I don’t know where I put this. I don’t know the name of the file. How do I find it? Where is it? I must have deleted it. Where did it go? I need it.”
George: Well, a lot of times I’ve had people ask for files to be restored, and they actually weren’t deleted. I’m restoring it because they don’t even know where it is.
Jon: They don’t know where it is, yeah. They just ask for another copy of it.
George: Yeah, exactly. So, and I think the other thing is we want to leverage some modern storage technologies, right?
George: So, an on-premises object store or even public cloud storage for long-term data retention.
Jon: Yeah, absolutely. Aparavi is storage agnostic. We’re a software company. You don’t have to store your data with me. Well, you can’t store your data with me. So, we have customers that have some data that stays on-prem. Other data ends up going out to cloud object storage, whereas other data might go to a secondary location. So, our ethos is that not all data is created equal. Different data has different types of retention windows, different types of retention periods, different levels of sensitivity. We want to help you discover that and then give you the tools to move it to the right location.
George: Yeah. And I think, also, it’s important to realize that that data changes over time. Today, it’s very valuable. In three or four years, not so much. And I want to be able to move its location granularly, because you run into that same problem with the image backups. A lot of people are talking about archiving to the cloud. Well, I don’t know how you can do that if it’s all built together. You need to be able to extract certain components.
Jon: Yeah, copying images out to the cloud and calling it an archive just isn’t sustainable with the rate of unstructured data growth that we’re seeing. I think the latest forecast is by 2025, 90% of all data anywhere is forecasted to be unstructured. So, that’s not a sustainable path forward, especially if you’re paying for a flash array from someone.
George: That makes total sense. So, the other thing, and I think this comes up a lot in ransomware recovery as an example: we talked about instant recovery in VMware data protection, and unstructured data protection should provide that same sort of functionality. And it seems to me that the… I don’t wanna say it’s easy, because I don’t want to insult developers, but it seems obvious, anyway, that you would be able to put an SMB or an NFS mount together that somebody could access and start pulling data off of immediately.
Jon: Yeah. And I think historically, the challenge with that has been that the mount point might not know where the data is. It could be on-prem. It could be in the cloud. And Aparavi has actually built a solution for that, because we’re able to create that mount point from our on-prem index without even touching the storage. So, no egress fees, no gets and puts, none of that. No latency. You’re not saturating bandwidth. You’re also not having to recall or mount entire backups. You’re actually able to mount and provide a network path to our index that can be searched against. You can find content. You can find the context of where certain search terms show up in the document. So, yeah, a lot of flexibility in what we’ve built here.
George: So, Jon, we’ve talked a little bit about the requirements here, and we’ve tied some of that into what you guys do. Let’s jump into detail on Aparavi and what you guys do.
How Aparavi Treats Your Unstructured Data
Jon: Yeah, absolutely. So, Aparavi, we kind of fancy ourselves as a multi-cloud intelligent data protection platform. And what this is, and when we were ideating about what we were creating, like I said earlier, we came from the backup industry. This is this team’s fourth or fifth, I lost count, data protection, data management engine. And we didn’t want to repeat the sins of the past, which were looking at what’s the problem now, not what’s the problem tomorrow. So, when we were ideating this, we kept thinking of a junk drawer in a house. This is actually our CMO’s drawer. This is what it looks like. I think if you look closely, there’s like an Oreo cookie or something in there.
George: There is. It’s right there.
Jon: Yeah. So, it’s pretty interesting. There’s obviously no organization. There’s no rhyme or reason to this. It’s just the drawer you throw your stuff into. And that’s what we saw image backups becoming: this silo, or multiple silos, where you really didn’t know what you had in there.
George: Right. Yeah. You just knew it was in the drawer somewhere.
Jon: Yeah, exactly. So, Aparavi’s goal is to Marie Kondo, trademark Netflix, your data. And so, here you can see you’ve got nice, neat organizations. Your batteries are in the right place. Your cutting implements are in the right place. Your glues, your tapes, everything is nicely put. Now, you’ve got one storage device here, the drawer, and different types of data in there, but it’s organized by type. And that’s what Aparavi sought to do with some new technology.
So, really, Aparavi is a file-by-file data protection tool. We place the value on the data itself. And the way we do that, whether it be a server, an endpoint, or a storage device itself, is by creating an index on-prem and actually being able to look through that index and look for patterns, like Social Security numbers and credit card numbers. You can create a customizable taxonomy. So, if you know that there’s a specific file type or file name that’s important to your business, we can actually discover that through all of these different data sources and then tag it, right, whether it’s PII or whether it’s PHI, whatever it may be.
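The pattern-scanning idea Jon describes can be sketched in a few lines of Python. This is a simplified illustration of the general technique, not Aparavi's actual engine: the tag names and regexes below are assumptions, and a production classifier would use validated rules (for example, Luhn checks for card numbers) rather than regex alone.

```python
import re

# Hypothetical built-in patterns (illustrative only, not validated detectors).
PATTERNS = {
    "PII/SSN":        re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PII/CreditCard": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

def classify(text, custom_taxonomy=None):
    """Return the set of tags whose pattern appears in a file's extracted text."""
    rules = dict(PATTERNS)
    if custom_taxonomy:  # e.g. {"Finance/Invoice": re.compile(r"(?i)\binvoice\b")}
        rules.update(custom_taxonomy)
    return {tag for tag, rx in rules.items() if rx.search(text)}
```

Each file scanned at index time would carry its resulting tags in the on-prem index, so later searches and retention policies can key off them.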
George: So, Jon, now, we did talk about some of the potential slowness in file by file. I’m assuming you guys have overcome a lot of those challenges.
Jon: Yeah, technology’s come a long way. There’s some proprietary technology that we have that allows us to build this index quickly and only have to build it once. So, every time we’re doing a job, we don’t have to scan through everything.
George: So, the subsequent backups especially go really fast.
Jon: Yeah, there’s obviously going to be a moment of pain in the beginning. The first time you implement Aparavi and we’ve got to chunk through a petabyte of data in billions of files, yeah, that could take some energy. But after that, it’s very light. Very, very light. Absolutely.
I think one of the other things that we really thought of was the changing landscape of ransomware, and having additional layers of protection for ransomware. So, we built in a solution that allows you not only to value the data, so you can discover in a ransomware scenario what’s most important, but also, as we talked about a minute ago, that index allows users to search and drag and drop the files they need to continue doing business. Oftentimes, what we see in ransomware attacks is that the cost of downtime is far greater than the cost of paying the ransom. But organizations obviously are being instructed not to pay those ransoms, and the likelihood of getting your data back even if you do? Pretty low.
George: So, can I use that index to help detect that I’m starting to experience an attack, like detect a lot of changes or stuff like that?
Jon: Yeah. And actually, that’s kind of a good segue into some of my next slides.
George: That’s what I’m here for.
Recover and Retrieve with Confidence and Context
Jon: Thank you, George. So, Aparavi actually has an automated alert and defense system. Just like any other backup software, we’re going to look at changes. We’re going to have a difference between job A and job B. So, what Aparavi is able to do is actually calculate that percentage change between those two jobs before we’ve moved any data. This way, you can set a threshold that says, “Hey, if 10% of my data changes in this specific folder,” or on this machine, or 30%, or 1%, it’s a customizable threshold, “Send me an email. Send me a text message.” But more importantly, we won’t do any storage operations until the user intervenes.
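The threshold mechanism Jon describes can be sketched as follows. This is a hedged illustration of the general technique (comparing two jobs' file digests and holding storage operations past a change threshold), not Aparavi's actual code; all names are hypothetical.

```python
import hashlib

def job_snapshot(files):
    """Map each path to a digest of its contents (stand-in for one job's index)."""
    return {path: hashlib.sha256(data).hexdigest() for path, data in files.items()}

def percent_changed(prev, curr):
    """Share of files that are new, deleted, or modified between two jobs."""
    paths = prev.keys() | curr.keys()
    changed = sum(1 for p in paths if prev.get(p) != curr.get(p))
    return 100.0 * changed / max(len(paths), 1)

def should_hold(prev, curr, threshold_pct=10.0):
    """True means: alert the operator and perform no storage operations yet."""
    return percent_changed(prev, curr) >= threshold_pct
```

The key property is that the comparison uses only the two indexes, so a mass-encryption event is caught before any overwritten data reaches backup storage.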
George: So, in other words, you’re not corrupting the backup based on that threshold.
Jon: Exactly. And in that scenario, I mean, I can’t tell you how many times, working for old backup vendors, I’d get calls from a customer saying, “Hey, your backup software overwrote a file with encrypted data. Why did it do that?” I’d say, “Oh, well, either that file was deleted on-prem or it was out of its retention window, and the software saw a new change.” This prevents all of that from happening. It is, in essence, a data loss prevention tool in that respect.
George: The air gap copies is kind of interesting. Talk about what you’re doing there, because, obviously, that’s important in a ransomware attack too.
Jon: Yeah, and with air-gapped copies in the cloud, some people will fight on the hill that it’s not truly air-gapped. But to date, ransomware isn’t able to travel through an S3 API. It can travel through network paths, we know. So, what Aparavi can do is send a copy of your data to Amazon, but also send a copy of that same data to Google, or to Wasabi, or to Backblaze, or to any of our providers. And we even support the generic S3 API. So, if you have storage on-prem and you’re using something like MinIO, for example, to emulate an S3 API, we support it. So, we’re really storage agnostic there. But by having those multiple copies, you’re only increasing your insurance plan against ransomware, or even against bad actors inside the environment.
George: Right. Okay. It makes sense. Now, obviously, we were talking about instant recovery. You guys now have the ability to essentially provide a mount to the customer. Talk about that a little bit.
Jon: Yeah. So, as we’re doing our backups, we’re actually creating that index. That index is going to live on-prem inside of our architecture, which I’ll show you here in a sec. But it also is copied with the data itself for fault tolerance. So, if you lose our software on-prem, or the machine on-prem that’s running our software, you can quickly retrieve that. But what that index allows you to do is search at a content level through all of your unstructured data, no matter where it is. So, you can search through our UI, or as George was hinting at a second ago, you can create that mount point, where you can go in, pick a specific point in time, create that mount, and then give access to the users, whether temporarily in a ransomware recovery scenario, or whether this is just the preferred method the CEO likes to recover his data from, by browsing through the Windows file tree structure and dragging and dropping. The beauty of this is, again, it doesn’t touch storage until you select and actually retrieve the file. Before that, all of the intelligence, all of the search you’re doing, is happening against the index. So, it happens very fast.
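The search-then-retrieve flow Jon outlines can be illustrated with a toy index. This is an assumed sketch of the general pattern (metadata-only search, with storage touched only at retrieval time), not Aparavi's architecture; the class and field names are invented.

```python
# Toy backup catalog: search runs against local metadata only; the possibly
# remote storage back end is touched only when a file is actually retrieved.
class BackupIndex:
    def __init__(self):
        self.entries = {}   # path -> {"words": set of tokens, "location": str}
        self.fetches = []   # record of storage reads, for illustration

    def add(self, path, text, location):
        self.entries[path] = {"words": set(text.lower().split()),
                              "location": location}

    def search(self, term):
        """Content search over the index; performs no storage I/O."""
        return [p for p, e in self.entries.items() if term.lower() in e["words"]]

    def retrieve(self, path):
        """Only now do we touch the back end that holds the file."""
        loc = self.entries[path]["location"]
        self.fetches.append((path, loc))
        return f"fetched {path} from {loc}"
```

Note that `search` never appends to `fetches`: every lookup is satisfied from the index, which is what avoids egress fees and latency until a real retrieval happens.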
George: I think what’s interesting from a recovery standpoint, then, also is, even though you’ve given me the ability here to keep data in multiple places, from a search perspective, I’m just searching on one thing, not everything.
Jon: Yeah, you can think of it almost as like Aparavi is a way to kind of democratize cloud storage. So, this can give you a multi-vendor strategy for multiple different reasons. That’s obviously beneficial. But, yeah, as far as Aparavi’s concerned, you don’t need to know where that data is. And this is incredibly important when you’re thinking about 7, 10, infinite years retention windows because of compliance. You might have a new CEO that comes in and says, “Hey, we’re Google now,” or, “Hey, we’re Azure now.” And these things change over time. Or perhaps there’s a new entrant that you wanna take advantage of. Aparavi can allow you to do that without having to re-index the data and then having one pane of glass to search all of those clouds, and the software is going to know based on the index where that data is.
George: Well, and just to kind of drill down on that, the way I understand it is like with a lot of products, if I wanna move from let’s say Amazon to Google or Amazon to Wasabi, if I want to keep my history, and history is critical in backups, I need to pull the whole thing down, which is going to obviously take time and a massive egress fee.
Jon: And you’ve got to have space for it on-prem.
George: Yes, that’s a good point. I didn’t think of that. And then you got to move it up to your other cloud.
Jon: Forklift it.
George: I think what I’m hearing you say is I could say, “Okay, I’m just going to stop. And all new backups or new data is going to go to whoever my new provider is. And I’ll just let this age out naturally and thereby not incur the egress fees.”
Jon: Yeah, exactly. And not only that, we can even have versions of files at a sub-file or sub-block increment, a four-kilobyte piece of the change, be in a new cloud. So, you could have your original file in Amazon, and then you could have an increment in Google, or an increment in Azure, or an increment in Wasabi, what have you. And we support all that.
George: And the software knows to pull all those increments together at the right time.
Jon: Yeah, based on the recovery point you choose, the software is going to know how to grab that file. But, then, as you mentioned, as time goes on, the data in that old cloud is going to shrink and shrink and shrink until it’s the bare minimum of that base file likely. And then at that point, you can do that forklift migration. We have that path where you can say, “Okay, 90% of my data is now in my new cloud. I’m going to take the other 10%, I’ll pay whatever egress fee is associated with that,” if you want, or you can leave it there. It’s up to you.
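The restore Jon describes, a base version in one cloud plus increments scattered across others, amounts to replaying every change up to the chosen recovery point. The version records and clouds below are hypothetical; this is a sketch of the reassembly idea, not Aparavi's actual format.

```python
# Sketch: a file's history as a base version plus increments, each
# possibly stored in a different cloud. A "patch" maps byte offsets
# to the bytes written at that point in time.
versions = [
    {"time": 1, "cloud": "aws",    "patch": {0: b"hello"}},   # base version
    {"time": 2, "cloud": "google", "patch": {5: b" world"}},  # increment
    {"time": 3, "cloud": "wasabi", "patch": {0: b"HELLO"}},   # increment
]

def restore(point_in_time):
    """Rebuild the file by applying every increment up to the recovery point."""
    data = bytearray()
    for v in versions:
        if v["time"] > point_in_time:
            break
        for offset, chunk in v["patch"].items():
            end = offset + len(chunk)
            if len(data) < end:
                data.extend(b"\x00" * (end - len(data)))
            data[offset:end] = chunk
    return bytes(data)
```

Note that which cloud each increment lives in is irrelevant to the reassembly logic itself; the index just has to record where to fetch each piece, which is what makes the gradual cloud-to-cloud migration possible.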
George: Now, we also talked a lot about compliance and regulation. Talk a little bit about what you guys are doing with insight and governance.
Jon: Yeah, absolutely. So, when you look at a lot of the insight and governance tools out there, they’re going to be looking at the file server itself, and not necessarily down to the endpoint level. We’ll provide that extra level of granularity down to the individual endpoint itself. What we found, when we were talking with a large, publicly-traded insurance company, was that they had every governance tool in the world out there looking through their data, finding any type of PII that was being aggregated, and there was a lot of it. However, what it was missing was users saving customer data inside notes, for example, on their laptops. And so, Aparavi gave them that added layer of security while also going against their file servers and providing validation that their other governance tools were grabbing everything.
So, they took this very, very seriously. Obviously, they’re in a highly litigious industry. But because we know the content of the data, and we know where that data is, based on where we’ve backed it up from, we’re able to give a higher level of confidence, in a governance or audit type scenario, that you’re able to comply. We talked quite a bit about GDPR and CCPA earlier, and the other states that are adopting these types of rules. Well, with Aparavi, you can search by username, by Social Security number, and by other identifying factors of an individual to help discover that data. And because we’ve tracked it file by file, you then have a path to actually knowing where that data is and being able to remove it at a sub-file granularity.
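The pattern-matching discovery Jon mentions, scanning content for identifiers such as US Social Security numbers, can be sketched with a simple regular expression. The pattern and sample text are illustrative only; real classifiers would cover many more identifier formats.

```python
# Sketch: scan text content for a PII pattern (here, US SSNs in the
# common ddd-dd-dddd form). Illustrative, not Aparavi's actual rules.
import re

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def find_pii(text):
    """Return every SSN-shaped string found in the given text."""
    return SSN_PATTERN.findall(text)
```

Running this kind of scan at backup time is what lets the index answer governance questions later, such as "which files, on which endpoints, mention this individual," without re-reading the stored data.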
George: Wow. Okay. We didn’t talk a lot about the immutable copies. Let’s wrap this section up with that.
Jon: Yeah. Anything that’s changed becomes a new version. So, in essence, it’s like a software WORM. We have customers in the finance industry that are using us for SEC compliance, which requires everything be in a WORM, or immutable, format. And so, they’re using our software, as well as immutable storage in Azure, the write once, read many storage out there in Azure, to provide that added layer of confidence. But you can’t go in and edit anything that’s been backed up by Aparavi. Anything that’s changed will simply become a new version.
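The "software WORM" behavior Jon describes reduces to a simple rule: existing versions are never edited in place; any change appends a new version. A minimal sketch, with illustrative names:

```python
# Sketch: append-only version history. Backing up a changed file never
# overwrites an old version; it only adds a new one.
history = {}

def backup(path, content):
    """Record a new immutable version; return the version count."""
    history.setdefault(path, []).append(content)
    return len(history[path])
```

Pairing this append-only model with write-once object storage (such as Azure's immutable blob tier) means even a compromised admin account cannot silently rewrite history.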
George: Okay. So, Jon, let’s talk about what this looks like architecturally.
Aparavi File Protect & Insight
Jon: Yeah, absolutely. So, it’s a really simple architecture. This is a really simple slide for it. Obviously, depending on the infrastructure, this could look a little bit different. But we’ve got our software appliance here, and this is what’s going to live either on-prem or near the data. So, if you’re aggregating unstructured data in EC2, for example, you can place our software appliance in an EC2 instance or in a container. However you want to deliver it, it can be delivered. It can live on physical and virtual machines, Windows or Linux. And it really is the aggregator and the enforcer of policies. So, it’s going to aggregate any policy that’s been set forth from our web platform.
So, the UI is online only. You can create a policy online and then have that policy be enforced downstream throughout the entire ecosystem. The software can run agentlessly against SANs, NAS, file servers, etc. But in the case of remote office, branch office users, individual laptops, or if you want some added layers of granularity on your file servers, we have agents that can run out there as well. Those agents will adopt any policy set forth from above. So, as policies change, and as new governance comes in, an admin can go and apply a group policy and have it get disseminated throughout the entire ecosystem really quickly. You don’t have to go point to point to point to make these changes.
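The top-down dissemination Jon describes can be sketched as one policy definition pushed to every registered appliance or agent, rather than configured point by point. Agent names and policy fields here are hypothetical.

```python
# Sketch: central policy pushed to every agent in the ecosystem.
agents = {"file-server-01": {}, "laptop-ceo": {}, "branch-nas": {}}

def apply_policy(policy):
    """Disseminate one policy to all agents; return how many received it."""
    for config in agents.values():
        config.update(policy)
    return len(agents)
```

The design point is that a compliance change (say, a new retention window) is a single edit at the platform level, and every downstream enforcement point picks it up.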
George: Okay. And then as far as I’ll call them clients that you support out here, basically, any NFS, SMB mount type of thing?
Jon: Yeah, absolutely any mount is fine. We can grab that. Absolutely. The data itself doesn’t have to go to cloud either. You can define a path there as well, so, if you guys are not ready for the cloud, and you’ve got leadership that isn’t ready to start putting all your data in the cloud, you can choose some data just as a proof to say, “Hey, this is safe. Look, check it out,” whereas your more critical data that is maybe higher sensitivity stays on-prem or it goes to a colocation facility, something along those lines.
George: You also mentioned endpoints like laptops and desktops. Do you protect those as well?
Jon: Yeah, we can. So, even if they’re not connected, what we’ll do is cache anything that’s happening locally. And as soon as a connection is sensed, whether it’s Wi-Fi, or they’re on a VPN, or what have you, it’ll then push everything up, depending on your policy. What’s nice about Aparavi is we can give you a couple of added layers of protection. The software appliance on-prem, if you want, can actually hold versions of your data. It doesn’t have to. At minimum, it can hold only that index, the dictionary that we talked about, which is maybe a 10% footprint of your total data, while the data gets sent directly from the laptop to the cloud or to the final storage destination. But it can hold versions on-prem. So, you can say, “Hey, I want five recovery points today. And then, at the end of the night, I want to push it out to cloud.”
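The offline-endpoint behavior Jon describes, queue changes locally and flush once a connection is sensed, can be sketched as a small state machine. The class and method names are illustrative.

```python
# Sketch: an endpoint agent that caches changes while offline and
# pushes the whole cache to the destination once connectivity returns.
class EndpointAgent:
    def __init__(self):
        self.cache = []   # changes recorded while offline
        self.cloud = []   # stands in for the final storage destination
        self.online = False

    def record_change(self, item):
        self.cache.append(item)
        if self.online:
            self.flush()

    def connect(self):
        """Called when Wi-Fi/VPN connectivity is sensed."""
        self.online = True
        self.flush()

    def flush(self):
        self.cloud.extend(self.cache)
        self.cache.clear()
```

A real agent would also apply the policy (how many recovery points to keep locally, when to push to cloud), but the cache-then-flush cycle is the core of protecting laptops that come and go from the network.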
George: So that way you keep your physical footprint on-prem at a minimum.
Jon: Yeah, we’re not just exploding secondary storage. Absolutely.
George: And then from a cloud, we kind of touched on this a lot, but basically, any cloud provider, the big boys plus any S3 compatible cloud provider.
Jon: That’s right. Yeah. And we have a list of certified cloud providers on our website that we’ve done interoperability testing with and we’ve got some partnerships with, but, again, Aparavi is storage agnostic. I don’t make any money by talking about Azure or Google or anyone like that. That’s up to you guys. You have your negotiated agreements with those cloud providers. We’re not going to step in the way of that or add any type of premium or say that you have to go to this cloud provider.
What Problems Aparavi FPI Can Solve
Jon: So, we’ve talked about a lot of problems that we can help you solve. But ransomware recovery is absolutely one of the key ones, having a secondary line of defense in place. I don’t want to replace your Veeam. I don’t wanna replace your Zerto. I’m not in the business of saying, “We’re disaster recovery. We’re business continuity,” because we’re not. And those tools shouldn’t be in the business of saying, “We’re your archive. We’re your long-term data retention. We’re your compliance and governance.” They’re different tasks.
George: Well, and that goes to that purpose-built idea, right?
Jon: A hundred percent.
George: I know in conversations with you, you have more than a few customers that have come to you specifically for the ransomware recovery capabilities, right?
Jon: Yeah, absolutely. Part of that is they want a different strategy for backup, a multi-vendor strategy for backup. Some of these cyber attackers are getting pretty savvy to the APIs on other backup software, and adding another layer of protection only helps. We license based on storage at that point. So, it’s actually very cost-effective to use as an additional line of defense. But yeah, we get customers all the time whose servers running their existing backup software have gotten encrypted, and they come to us because they didn’t have the greatest policies in the world or because there was a back door that a cyber attacker exploited.
George: And we talked about endpoint, and we didn’t really touch on remote office, branch office protection, but same basic idea.
Jon: Same basic concept. Exactly. You can run agents out there. The platform is multi-tier and multi-tenant. So, if any of you guys out there are service providers, this is a phenomenal tool to roll out to your customers, both as an extra insurance policy to you but also it adds some additional recovery outcomes for your clients and the data intelligence. But even if you’re an organization with multiple business units, you can provision account access where accounting only sees accounting data, and marketing sees marketing data. But the super admin will obviously be able to see everything in the environment.
George: We talked quite a bit about retention. I don’t know if we spent a lot of time on archive. Why don’t you talk a little bit about that?
Jon: Yeah, our policies allow you to archive off data that we’ve created throughout the backups when it’s appropriate and to the appropriate storage tier. Again, that way so your secondary storage isn’t just getting blown up. So, that versioning that we’re doing on-prem, you can define how many of those you keep there, as well as when you send that data out to archive to help reduce that on-premises storage.
George: And I think, again, this is where that granularity comes in, because you can archive specific content as opposed to the entire backup job.
Jon: Yeah, absolutely, because, again, we’re placing the value on the data, whether you’re using the classification tagging and the pattern matching that we offer to discover that data, and then assigning out policy versus generalized data that maybe doesn’t need to be kept for 10 years.
George: It makes sense. And then we’ve talked a lot about compliance and governance. Talk a little bit about storage optimization.
Jon: Yeah. Because of the unique way that we’re storing data in object storage, it gives us granular control over it. We can actually remove data the moment its retention policy expires. So, you’re not held to when the backup was. We’re looking at the data of that file itself. We’re able to remove the data, even individual bits and blocks of it. Compared to a traditional grandfather-father-son style backup, or a forever incremental that requires you to rebase, we’ve modeled out savings of up to 75% on secondary storage over 10 years.
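The per-file expiry Jon describes can be sketched simply: each object carries its own retention deadline, so it is removed the moment its policy window ends rather than waiting for an entire backup job to age out. File names and times below are illustrative.

```python
# Sketch: per-object retention. Each stored object records when its
# retention window ends, and purging checks each object independently.
store = {
    "hr/contract.pdf": {"expires": 100},
    "tmp/scratch.log": {"expires": 10},
}

def purge(now):
    """Remove every object whose retention window has passed."""
    expired = [k for k, v in store.items() if v["expires"] <= now]
    for k in expired:
        del store[k]
    return expired
```

This is the contrast George draws next: in job-level retention, nothing in the blob can be removed until every file in it qualifies, while per-file retention reclaims space as soon as each individual file expires.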
George: Yeah, what you typically see in backup applications is the entire blob that makes up that job has to…everything has to hit those retention requirements before it can be removed because the whole thing is built on it. And I’m a little disappointed in the last thing here that you guys can’t help solve the mystery of Stonehenge.
Jon: Well I think it’s universally understood that it’s aliens.
George: Oh, okay. So, for contact information, if you have questions, the best way to ask questions is to tweet us questions, which is right there @storageswiss or @aparavisoftware. Contact information is also up here. So, feel free to reach out to us any way you can. I know, Jon, that your guys’ website has a lot of information on this as well, right?
Jon: Yeah, absolutely. We’ve got dedicated pages for these type of solutions that we just talked about. So, go check us out at aparavi.com, browse around, and we don’t gate any of our materials on the resources page. So, if you’re just doing research go ahead, grab some documents, read up on it, and we’d love to hear from you.
George: Great. Well, Jon, thanks for joining us today.
Jon: Yeah, absolutely. Thanks, George.
George: You’re welcome.
Jon: Thanks, everybody.
George: All right. And there you have it. So, if you’re looking at unstructured data protection, at Storage Switzerland, we really think it’s time to start thinking of this as a purpose-built activity. The other solutions are really good at hitting RPOs and RTOs for applications and virtualized environments. But unstructured data is just fundamentally different. And so, having a tool that’s very focused on it is going to not only bring you peace of mind, but also bring a lot of value to the organization.
Jon: Yeah, that’s true.
George: Thank you for joining us. I’m George Crump, Lead Analyst with Storage Switzerland.