Truth in IT has shared a new video on how to “Overcome the Unstructured Data Tsunami” which is available here for your viewing pleasure.
In this video, Truth in IT’s managing editor David Littman hosts IT Analyst Mike Matchett of Small World Big Data and our VP of Business Development Jonathan Calmes in a conversation about the challenges of unstructured data management – capacity growth, protection, accessibility, especially in environments with both cloud and on-premises storage.
Truth in IT – “because video is the new whitepaper” – features unbiased informational and educational tech content in a digestible and dare we say entertaining format. We appreciate the chance to participate and hope you find it valuable.
Transcript: The Challenges of Unstructured Data Management
Dave Littman: Hi. Dave Littman, Truth in IT. Welcome to today’s webcast, “Overcoming the Tsunami of Unstructured Data.” In just a second, I’m going to be bringing on Jon Calmes, who is VP of Business Development with Aparavi, as well as Mike Matchett, who is Senior Research Analyst and CEO of Small World Big Data. Before I bring those guys on, a couple of housekeeping tips: we expect today’s webcast to go about 30 minutes. There is a Q&A panel, so just enter your questions and we’ll get to those toward the end of the broadcast. As you know, we are also doing an Amazon giveaway. We’re not going to disrupt the conversation to announce it; we’re just going to display the names across the bottom of the screen, so just keep your eyes open. We’ll put out reminders and alerts and all that so you won’t miss it. But you can’t be listening, you gotta be watching. So, without further ado, let me see if Mike and Jon are here. Guys, are you here?
Mike Matchett: Yeah, hi, Dave.
Jon Calmes: Yep.
Dave: All right, excellent. So, Mike, why don’t we kick things over to you and then we’ll go to Jon, and we’ll get right to it.
Mike: All right. Thanks, Dave. Yeah, this is Mike Matchett. We have seen a lot happening in the data protection space recently. Backup is dead, backup is not dead, backup is everything, backup is nothing. We have got new technologies coming out, we’ve got faster storage, we’ve got people saying, “Hey, I’m moving my datasets into a hybrid environment or multicloud environment. I’ve got on-premises data, how do I back all that up?” Backup windows are growing, and of course it’s not really about backup, it’s about recovery. And then people are saying, “Look, if I’ve got all this data and I want to make a backup of it, and protect it, I also want to make use of it. I want to get some value out of it, whether it’s an active archive or whatever.” Those use cases are also converging into this. So, it was with great pleasure that we started to talk with a company called Aparavi. So, we got them here today and we’re going to listen to a little presentation. Welcome, Jon.
Jon: Thank you, Mike. I appreciate the warm intro there. My first slide presentation that I’ve got is a big thing in the name. And people always ask me, “What does Aparavi mean? Where does it come from?” So, Aparavi is a form of the Latin word aparare, which actually means “To prepare, make ready, or equip.” So, all of our leadership came out of legacy backup companies. Rod Christiansen, our CTO, this is his fifth data protection engine between Yosemite TapeWare, CA’s ARCserve D2D product, and a joint venture over at NovaStor. We saw a pretty big problem with unstructured data management, especially in the new world of hybrid and multi-cloud, and our name actually means to prepare, make ready, and equip. And that’s what we believe we’re here to do, is prepare enterprises and even small businesses, MSPs, with how to manage their data over a changing lifecycle in the cloud or even on-prem.
Mike: It’s one thing to have a product that does a small slice of this pie, but people are getting to the point now where storage is moving underneath, clouds are moving on top, hosting is changing. And one thing has to stay constant, which is the data needs to be protected. So, fitting right in between that, almost providing this insulating layer between those, sounds like where you guys are really going to add some value.
Jon: Yeah, absolutely. I think that most of our customers are using us in a hybrid cloud environment where they’re dealing with multiple different types of storage on-prem, multiple different cloud providers, and looking to have something that can seamlessly go across all those different sites, locations, storage devices, etc.
Mike: So, let’s just start at the real motivating factor here. It’s about unstructured data, right? We’ve solved the problem in structured data, and what we know about data growth, and the data tsunami, and big data, is it’s really unstructured, and it’s coming in huge volumes. So tell us a little bit about that.
Jon: Yeah, exactly. And I’ve got a slide showing right now that is the challenge of unstructured data management. The reality is, in the past, structured data made up a lot of organizations’ important information, and a lot of the backup engines were kind of built around that structured data format. But what we see now is that unstructured data is growing at a massive growth rate. I think, on average, the latest statistic I saw was 60% of data growth per year is attributed to unstructured data, 90% of all data by 2023 is forecasted to be unstructured. And I think the bigger issue is it’s not necessarily that, “Oh, hey, I’ve got 50 terabytes, 100 terabytes, or a petabyte of data,” whatever it might be. It’s the fact that it’s millions upon millions of these discrete individual files that have to be cataloged, indexed, and put somewhere. And they need to be accessible.
We did a study last year at the end of the year around long term data retention drivers. And one of the big reasons why we changed our branding from long term data retention to active archive is because enterprises were reporting that 85% of their archived data was being accessed as much as bi-weekly, which is insane to me. I was like, “I don’t believe that for a minute.” But, what we found out is that the data growth was flooding their primary storage, it was flooding their secondary storage, and their most effective way to get it out to cloud was to archive it, or off to some cheaper storage was to archive it, but yet the data that was in that was created from the last two years. It’s relatively recent data. And that was kind of an eye-opening moment for us where we were like, okay, we really need to focus on the availability of data, wherever it may be in its lifecycle, if it’s on primary, secondary, tertiary storage, and provide information on that data where it sits as well as the capability to retrieve it in general.
Mike: I’m thinking of this curve of accessing data, where you’re accessing primary hot data here, and as it ages we access it a little bit less and less. But as that data growth gets huge, we’d had to move our archiving bar up much farther, so that not only are we moving into the realm of archiving data that’s still got a lot of active life to it, there’s more of it as the rising tide lifted that. So we’re actually kind of compounding the problem in archive. And that’s to me just a fascinating area of where storage has to go. So, where do you go next? What did you guys actually decide to do to make this active archive?
Jon: Really, it started with identifying some of the foundational issues with existing solutions. And what we found with a lot of the existing engines out there, and a lot of enterprises we talked to were saying, “Yeah, well, my backup software, that’s just how I do long term data retention. I start with my backup software, and it’s got some archiving features.” And you’re like, “Okay how’s that working for you?” And that’s when you started getting the honest responses of, “Well, it really isn’t. We’ve got massive issues with backup windows, restore times. We have no idea what we have in our secondary and tertiary storage.” So we set out to address some of these problems. We wanted to build an engine that wasn’t built 15 years ago around structured data, or even 5, 10 years ago.
Look at what we refer to today as modern backup, and these engines were still built almost a decade ago, right? And they’re built around structured data, or they’re built around proprietary image-based formats. And that was the de facto response to large file sets, like, okay, we’ll just take an image, right? And the problem with that is those images are opaque, you can’t go into there, you can’t manipulate the data inside of it. So when new regulations come out, like GDPR, for example, some of the capabilities that GDPR demands are not possible with image-based type solutions.
Mike: There’s a granularity issue that comes up. Like you have to protect this vast volume of data, but it has to be addressable, indexable, or policies have to be able to apply to it at a granular level, as objects, full class objects.
Jon: Exactly. And that’s a nearly impossible thing to do unless you’re gonna... I think a lot of the organizations that we’ve seen have been like, “Oh, we’re just gonna mount something. If we’re using S3 we’ll mount something to point to it, we’ll pull out, track what we need, yada, yada, yada.” It’s just like, “Man, that’s cumbersome.” And with these engines, really, they’re just not cloud or storage optimized. The engines weren’t forecasting the data growth that we’re seeing, and so you end up with these pretty complex tools that are incredibly expensive. As they bolted on new technologies, maybe they’ve acquired a company and they’ve tried to kind of make their engines talk to each other, and it’s broken. That’s just the bottom line, it’s broken.
Mike: So, in doing this, and just to the granularity thing and getting this to work, you just sort of touched on the fact that there must be a way to—we touched on the open data format. So there must be a way to create a universal storage format for all this, and that’s what you guys start tackling with this open data format.
Jon: Yeah. Our open data format is designed around making sure that Aparavi isn’t the only solution that can retrieve your data. So we’ve actually documented our entire data format. We’ve placed a reader for that in the public domain. It’s a proprietary format to Aparavi; however, we’ve documented it, we made it available and open. So, it’s not like you can go into cloud and just see your data. We’re not file sync and share, right? So it is going to remain in cloud, compressed, encrypted, obviously, but by documenting that, any third party with the right access, so you need to be able to have your key ring encryption that you’ve set up with Aparavi. You need to have obviously your direct access to your S3 bucket. And if you retrieve that file, let’s say with Aparavi but you want to read it with something else, the documented data format would support that, absolutely.
Mike: This becomes a kind of a time machine for my files across a multi-cloud scenario, or wherever I have it, or hybrid scenario. And then whole ecosystem players jump in at some point when that develops and they can come in and do all sorts of extra tasks on top of that archive. So, it really does stay active.
Jon: Yeah, that’s exactly right. So, by nature we have a full RESTful API across the board. We’re partnering with different companies to provide analytics, things like that, to data that’s already sitting in cloud, so you don’t have to pull everything back down before you actually learn something from it. But again, it’s all predicated on you having the right access and encryption.
Mike: So, tell me a little bit about the architecture here, and what it is that you’re doing on-premises, you know, where’s the data flow, and where does it go?
Jon: Absolutely. So, on this slide here, we’ve got our active archive architecture. So, our architecture consists of three main pieces of software essentially. So, the platform is web hosted. Aparavi actually hosts that for all our clients, but some do ask to host it themselves and that is fully capable. And then on-premises, you’ll have two pieces of software from us. One is a software appliance. We wrestled with, “Do we do a hardware appliance? Do we make this appliance-based?” And ultimately we decided no, because what we realized is that our software was so efficient that it didn’t require proprietary hardware and expensive disk to run. It’s really network contingent.
So our architecture kind of treats the source as an edge. The source is doing quite a bit of the work. So we’ve got a transparent agent that’s there and it handles its own tasks, and then it even has some capabilities to cluster together with other sources to help compute. But, along with that three-tiered architecture, we do three different storage operations. And this is all based on how Aparavi you want to be. So, when we hop into a new infrastructure, the first thing that’ll happen is you’ll set up your software appliance, you’ll then invite your agents to it. We do like a Netflix-style check-in, where it gets a code, a specific code, and it’s like, “Okay, this is my platform, this is my appliance.” And it automatically inherits the policy that you’ve defined.
The very first thing we do is what we call a CDP checkpoint, should the user have configured those. But, if they have, that will be the first thing that’s done. In essence, what that is is it’s a file by file picture of everything that you’ve selected to back up or to retain. Now, what will happen, you can configure those to run every five minutes if you wanted to, every one minute. It’s not like as something changes, it’s like, “Ooh, something new.” It’s not constantly doing that; you have to tell it how frequently you want it to look for new and changed data. The next storage operation that happens is our snapshots. So, snapshots, simply enough, are a file by file picture of everything that you’ve selected. The main difference here is that snapshot is moved off of the source and it now lives on that software appliance storage. And, as we do that, we clean up behind ourselves.
So those CDP checkpoints end up being more like a temporary recovery cache, so we’re not constantly growing your primary disk. The checkpoints themselves can be stored on direct attached hardware, internal storage, or network-attached storage. You can define your path all through there if you like. And the same is true with that software appliance.
Mike: Just to be clear, we’re talking incremental snapshots of what’s being changed and needs to be protected?
Jon: Exactly, yeah. Let me correct myself. The first thing we do is a snapshot, so that snapshot will live on the appliance and then all those checkpoints then become sub-file increments of the change to the new data. And that’s true through the entire stack of storage operations. Once you have a snapshot up on the appliance, everything after that is sub-file, incremental-forever-style retention. So, the snapshots, like I said, you can configure those pretty frequently. They’re going to clean up any checkpoints that were done between those two points because we don’t want to duplicate data on that end. And then at that point, you’re going to define your archive solution, that’s when data is going to head off to the cloud, as soon as you’re archiving.
There is some cloud support if you want it to go that route, but we’re really telling people that you want to have the data on-prem for those oops moments, those quick recoveries, those quick retrievals where you need to get something back quick and you don’t want to have to go out to cloud to find it. But once you go out to cloud, we’re pretty much storage and vendor agnostic on that front. So you can create a generic S3 object store. You can create a dedicated S3 object store. We support Wasabi, Azure, Google Cloud, and that’s constantly expanding. But we actually don’t tell you that a generic S3 object store is non-supported. So, you might have something out there that’s presenting itself internally as S3 objects or like private cloud, because maybe public cloud doesn’t make sense for you, that might be too costly. And we absolutely support that as well. So, no issues there.
Archives work very similarly to snapshots and checkpoints in that, once an archive is out to the cloud, you define how many versions you want to keep on-premises and then we’ll start cleaning up behind ourselves on those snapshots as well.
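The cleanup behavior described above (checkpoints cleared once a snapshot covers them, on-premises snapshots trimmed once they have been archived) can be sketched roughly in Python. This is only an illustration under stated assumptions, not Aparavi's implementation: the function names are hypothetical, and times are plain integers (for example, minutes since some start point) to keep the example small.

```python
from typing import List


def checkpoints_to_clean(checkpoint_times: List[int], snapshot_time: int) -> List[int]:
    """Checkpoints taken at or before the newest snapshot are covered by it,
    so they can be removed from the source machine's recovery cache."""
    return [t for t in checkpoint_times if t <= snapshot_time]


def snapshots_to_clean(archived_snapshot_times: List[int], keep_on_prem: int) -> List[int]:
    """Of the snapshots already archived to cloud, keep only the newest
    `keep_on_prem` on the appliance and return the rest for cleanup."""
    newest_first = sorted(archived_snapshot_times, reverse=True)
    return newest_first[keep_on_prem:]
```

For example, with checkpoints at 5, 10, 15, and 20 and a snapshot taken at 15, the first three checkpoints become redundant while the one at 20 stays in the local cache.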
Mike: All right. So we got checkpoints, we’ve got snapshots, and you’ve got archives. When the user is going after something and we’re calling it, do they have to be aware of that architecture? What are they seeing?
Jon: Everything is done by point in time. So you’re able to select a point in time of when you need your data from, and the software is going to manage the best place to retrieve that from. So if it has it in a checkpoint, it knows that from the date and time, it’ll grab it from there really quickly. Same is true of snapshots and archives. The beauty of how we’ve built the software is you actually don’t necessarily need to know where it is, you just need to know when it was, and the software is going to handle that for you.
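A minimal Python sketch of that point-in-time lookup follows. It assumes a simple in-memory catalog; the `Version` record, the tier names, and the `locate` function are hypothetical and are not taken from Aparavi's product or API. The idea is to pick the newest version at or before the requested time and, when the same point in time exists in several tiers, to prefer the one that is fastest to read.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional

# Hypothetical tiers, ranked from fastest (0) to slowest (2) to retrieve from.
TIER_RANK = {"checkpoint": 0, "snapshot": 1, "archive": 2}


@dataclass(frozen=True)
class Version:
    path: str
    taken_at: datetime
    tier: str  # "checkpoint", "snapshot", or "archive"


def locate(catalog: List[Version], path: str, when: datetime) -> Optional[Version]:
    """Return the newest version of `path` taken at or before `when`,
    or None if nothing that old exists."""
    candidates = [v for v in catalog if v.path == path and v.taken_at <= when]
    if not candidates:
        return None
    # Newest timestamp wins; on a tie, the lower (faster) tier rank wins.
    return max(candidates, key=lambda v: (v.taken_at, -TIER_RANK[v.tier]))
```

The caller supplies only a path and a time; which tier the data comes back from is decided inside `locate`, which mirrors the "know when, not where" point made above.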
Mike: There’s a lot of times that I don’t know when I want something either, but that’s a different problem.
Jon: And that’s where some of our features come in on the next slide actually. So, one of the things that we do is, because we’re a dynamic and hybrid cloud storage company, we need to provide some capabilities of advanced archive search, content-based search. So, if you know that in the title of the name, for example, is, you know, “Matchett.” So I need to find everything that has my name attached to it. You type that in, it’s going to then present everything that has your name on it, and you can choose the version that you want.
So, our retrievals or recoveries–the difference being recovery is, you know, “Crap, I lost something.” Retrieval means, “Ooh, I need to learn from the past, and it’s going somewhere else.”–you can look through those through content, you can look through them via classifications. So if it’s finance data, etc., we have automated classifications. You can custom classify data, you can also look at metadata as well. So, a lot of different ways to find the data that you need to retrieve or recover.
Mike: This is where we start to get smart about it, and it being active as a usable archive and not just as a backup, you know, “Oh, I messed up the whole file system, give me that back.” Now I can go in and say, “No, I want all the files that mentioned X, Y, and Z that were written by this person in that time frame. Bring those back and subsequent versions.” So this definitely has standard archive uses, perhaps, but it has new uses in discovery, or forensics, or getting back data, all sorts of things. Rebuilding processes that got lost, or files, or recreating scenarios…
Jon: The way we’ve built the classification engine, the search, things like that, it’s deeper than you’re going to see any other data protection and archive solution go, but it’s not as deep as a dedicated e-discovery solution. So, it’s kind of e-discovery lite.
Mike: And we mentioned you’re going to have an ecosystem of people playing on the open data format, which is also here on the slide. So, that would be people with extreme vertical experience in different use cases can come to the table here and play fairly across it.
Jon: Exactly. And really, our goal with that, when we set out, that’s kind of like a secondary benefit that might be the primary now that we’ve been in the market for a little bit. But the goal around our open data format was vendor lock-in. We wanted to get rid of it, whether that was us, or whether that was in the cloud. It really was focused on how do we ensure that people are comfortable with having their data in a cloud location and with us as a vendor? And so we used our dynamic hybrid cloud storage as well as our open data format to put an end to that. And I’ll explain that for a second. So, by way of how we’re placing data in the cloud, we actually put each individual incremental file as its own object in cloud storage.
So, that gives us that sub-file granularity in the cloud to identify individual iterations of files. You might have a base file that doesn’t mention your name, but you have an increment that does, right? We can go in and provide that latest increment and actually remove it with our pruning algorithms, or we can take some action on it. For example, let’s say I’ve been working with Amazon S3 for the last five years with Aparavi. And, you know what? They’re just not getting on board with this multi-cloud world. They think multi-cloud is just different Amazon buckets in different regions, for example. And I’ve got a new CIO who used to come from Google, and he’s got all sorts of credits there, and so we’re going to move.
In a traditional archive and data retention role, that’s a nightmare. And you end up having to either bring data back down on-prem and then reclassify it, reindex it, and throw it back out to the new cloud, and you’ve got to pay these massive egress fees as of today. I imagine those will go away soon. But that headache is why people stay locked into their clouds. With Aparavi, you can use one cloud today and switch to another at any point. And what will happen then is all the change to the new data is going to go to that new cloud while your old data sits in the previous one. So you’re still having the two clouds, but we’ve immediately stopped the bleeding, stopped the growth of cloud A; you’re now putting all the new things in cloud B. We can recover seamlessly using the combination of those two locations.
But our pruning–and this is where some of the secret sauce comes in–is we’re actually able to prune data on policy or at will at sub-file granularity. So you can discover individual pieces of data and remove it automatically, or, as individual retention policy is going to hit, cloud A’s data is gonna shrink and shrink and shrink and shrink and shrink until you’ve got nothing but the bare minimum of data, so your bill is gonna go nowhere but down after you migrate to that new cloud. So we really wanted to end that cloud vendor lock-in to prevent people from saying, “Look, I can’t be multi-cloud. I can’t move my data over if somewhere cheaper comes along.” I think a lot of people are looking at Wasabi now and they’re like, “Ah, like why weren’t you here five years ago when I started this whole thing?”
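To illustrate why the old cloud's footprint can only shrink after a switch, here is a small Python sketch of retention-based pruning. It is a simplification under stated assumptions, not Aparavi's pruning algorithm: the `StoredObject` record and both functions are hypothetical, every object carries its own retention period, and dependencies between a base file and its later increments are ignored.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict, List


@dataclass(frozen=True)
class StoredObject:
    key: str
    cloud: str           # e.g. "cloud-a" (old target) or "cloud-b" (new target)
    stored_on: date
    retention_days: int
    size_bytes: int


def prune(objects: List[StoredObject], today: date) -> List[StoredObject]:
    """Keep only the objects whose retention period has not yet expired."""
    return [
        o for o in objects
        if o.stored_on + timedelta(days=o.retention_days) > today
    ]


def bytes_per_cloud(objects: List[StoredObject]) -> Dict[str, int]:
    """Total stored bytes per cloud, as a rough proxy for the storage bill."""
    totals: Dict[str, int] = {}
    for o in objects:
        totals[o.cloud] = totals.get(o.cloud, 0) + o.size_bytes
    return totals
```

Because new and changed data is written only to the new cloud after the switch, repeated calls to `prune` can only lower the old cloud's total.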
With Aparavi, that argument is null at that point, because you can move to wherever the most cost-effective location for your data is. Our data-centric retention policies actually allow you to put data into different clouds based on importance. Not all data is created equal. I saw a study that I think Backblaze did a couple of days ago; it popped up on PR Newswire, and it was like, “46% of people are just backing up everything.” Users, they’re backing up everything, just like, “Yeah, get it there. I’m scared, I don’t know what I have. Just get it there.” And, yeah, that might have worked a couple of years ago, but now, it doesn’t work. Different data has different retention policies, it needs to be treated differently, not all data is created equal. And so Aparavi knows that and allows you to put different retention policies, move it to different clouds, start it in different clouds, just ultimate flexibility is what we set out to do.
Mike: I mean, we’ve only seen the start of what’s going to have to happen with data retention with GDPR and the rest of it. I think it’s just going to keep getting more and more dense and complicated for a user to try to untangle, and what you’re offering is some very powerful features: that increment, that sub-increment, ability to go in and do GDPR controls. It sounds amazing. Targeting a particular cloud by policy for certain kinds of data, again, same thing, just sounds like someone who’s facing a lot of those regulation and compliance issues is gonna absolutely have to have this kind of solution.
Jon: Yeah, absolutely. Last week we had a call with, I’m not going to name them, but this is probably one of the largest healthcare organizations in the world. I was talking with their director of data management, and it was almost like a therapy session, because the only way that they were actually able to handle their data was to start kind of hodgepodging things together, building their own solutions. They were really handcuffed not only by regulation, but by the sheer petabyte scale of data they had. And, again, Aparavi didn’t exist when they started this whole thing, so they had to set out to solve these problems that they had with their own tools. So, it’s a promising new prospect for us. It’s probably gonna take us a year to close. But, at any rate, yeah. Like I said, we became almost like the therapist, we were just like, “You don’t understand, I’ve got this and this.” And it’s like, “No, no, no, we do understand. We’ve been in this space for like 15 plus years. Like we get it, we came from this world. I understand.” You know, it was a lot of fun to have that call, a lot of solid validation for us.
Mike: Great. I mean, I can just see you walking in and saying, for people who really have these huge problems, it’s like, “No, here’s a can of our pruning juice.” It’s like, “We’re gonna help you get out of this problem. We’re going to help you move forward here. Don’t worry, don’t worry.” Tell me a little bit about multi-cloud again. You said you can put data in a couple of different clouds. How many different clouds do you support? What does your cloud ecosystem look like?
Jon: Yeah. So, on this slide, I showed the different supported operating systems, the different clouds that we’re working with. Like I mentioned earlier, we have support for Google Cloud, AWS, Azure, we recently certified IBM Cloud. We’re supporting Wasabi, Caringo, Scality, Cloudian, from a private cloud object storage standpoint. We’re working with Oracle right now to do a certification with them as well. But that isn’t the end all be all, right? So you have these options within our software where a few of these are available for dropdown, the tier ones obviously, where we’ll actually go in and create the bucket for you if you don’t have one after you give us your keys and whatnot. The other capabilities, you actually can define your generic S3 object store. So, if you have something that’s not on this list, we’re still going to support you. We’re still going to say, “Yeah, we can support that.” And our techs will work with you should there be any weird issues with regions or what have you, we’re able to support you on that front as well.
Mike: Everything from Mike’s object store that I’m writing myself, to some satellite object storage, we can talk about.
Jon: Elon Musk is building submarines and satellites in the sky. Exactly. As long as it’s an S3 storage API with V4 authentication, we support it.
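For readers unfamiliar with "V4 authentication": this refers to AWS Signature Version 4, in which each request is signed with a key derived from the account's secret key, the date, the region, and the service name. The Python sketch below shows only that standard key-derivation chain, using the standard library. The function names are illustrative, and nothing here describes how Aparavi itself signs requests.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    """HMAC-SHA256 of `message` under `key`, returned as raw bytes."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def sigv4_signing_key(secret_key: str, date_stamp: str, region: str,
                      service: str = "s3") -> bytes:
    """Derive the Signature Version 4 signing key.

    `date_stamp` is the request date in YYYYMMDD form. Each step keys the
    next HMAC with the previous result, which scopes the final key to one
    day, one region, and one service.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")
```

An S3-compatible object store that accepts signatures built from this key can, in principle, be addressed by the same client code as a public cloud bucket, which is what makes a "generic S3" target possible.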
Mike: Awesome. Let me ask you a little bit about how someone who’s dealt with the complexity issue… How do you simplify the complexity issue? I’ve got five different storage systems, three different clouds. I’ve got a bunch of different things I’m doing for lots of different applications. This is delivered management-wise in a nice way for a single person to really grab hold of the entire enterprise, right?
Jon: Again, we come from a lot of different backup engines and data protection storage interfaces. So we built from the ground up, on the get go, a multi-tier, a multi-tenant architecture. So, you have full capabilities to create users, sub-users, sub-sub-sub-sub-sub-users, etc, so, a lot of granularity on what a global admin is going to see, all the way down to individual accounts. And then you can kind of break down in each one. You can search through accounts if you don’t want to run through the tree of architectures there. But everything is presented as a service with a really straightforward web-based UI. I’ll give you a glimpse here on this next slide. This is our basic simulation UI. You see all sorts of fun Star Wars stuff in there–don’t tell Disney.
But, at any rate, this is the dashboard that you’re going to see. So you’re going to have a live feed, you’re going to have summary views of what’s going on. One of my favorite monitors here is actually a percent change monitor, which is going to tell you how much of my data has actually changed. You would be able to see that should you configure it on this dashboard. That’s a phenomenal kind of report to know, “Oh, have I gotten encrypted with like a cryptolocker-type virus?” because the software is going to say, “Hey, 80% of all your data changed,” and it’s going to throw a warning. It’s going to let you know, it’s going to even give you some actions to be like, yeah, don’t do anything else. Just stop there. I don’t want to copy bad data to my archive. Exactly.
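The percent change check can be sketched in a few lines of Python. This is a hedged illustration only: the function names are hypothetical, and the 80% default simply reuses the figure mentioned above rather than any documented product setting.

```python
def percent_changed(changed_files: int, total_files: int) -> float:
    """Share of protected files that changed since the last run, in percent."""
    if total_files == 0:
        return 0.0
    return 100.0 * changed_files / total_files


def should_hold_archive(changed_files: int, total_files: int,
                        threshold_pct: float = 80.0) -> bool:
    """True when the change rate is high enough to suggest mass encryption,
    in which case the next copy to the archive should wait for review."""
    return percent_changed(changed_files, total_files) >= threshold_pct
```

A normal day might see a low single-digit percentage of files change, so a sudden jump to most of the data set is a strong signal to stop before bad data is copied into the archive.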
Mike: And I think, again, this could only grow in value as you get more multiple tenants, and you learn some more things about how archives grow at petabyte scale and across multiple clouds. And you can start to bring and leverage that knowledge across your entire user base, back to everybody because it is a SaaS kind of service.
Jon: Exactly. Really straightforward. Inside the platform, and if this was a live view, I’d just hop into the platform and show you. But we have a simulation mode and a live mode. And why we do this is, at enterprise scale, you have to bring new people on board, and you can do that by inviting them to the platform. They’ll have access to the simulation mode. They can go in there and they can do whatever they want and really learn the information without having any impact on your live production environment. It’s a pretty slick way to learn. There’s tips all throughout the platform there, little boxes that pop up if you want them there. You can obviously subdue those because they can get incredibly annoying if you’re an IT and sysadmin who’s been using this for a while. But really helpful when you’re getting set up and training new users.
Mike: I mean, really, you’re almost making this fun to say, “I want to try doing data protection on my whole enterprise using this, because this looks like I could actually deliver that.” And if I’m faced with the other panoply of legacy tools that I’ve got, I’m like, “Oh no, it’s never going to work. I’m never going to get this thing to keep up.” This offers a lot of promise and hope. With that, I think I’m going to turn it back over to Dave here, and maybe we’ve got some questions from our audience for you, Jon.
Jon: Yeah, absolutely. And I’ll just go to the next slide here so you guys can see, this is not the view from my office but close by. We’re in Santa Monica, California. The link that I have here is the link to the free demo. We do offer a 30-day free trial. We’ll extend that, if necessary. Usually, people get a pretty good idea by 30 days if it’s going to work for them or not, but some of the larger enterprises have got more than a 30-day closing cycle.
Mike: And you can lend them a couple petabytes of data too probably.
Jon: Oh yeah, sure. You know, have fun. Exactly. One of the beauties of Aparavi is, since we don’t host any data, we don’t really care how much you’re testing on a POC, it’s not going to impact your costs.
Dave: All right, great. Well, fabulous, guys. And I’m glad you brought that up, Jon, because that actually takes care of a question that came up quite a bit here today, which was, what about demos or proofs of concept and how do those work? And unless there’s anything you want to add, I think it’s pretty self-explanatory. I think you got it, but that did come up quite a bit.
Jon: Yeah, as you register an account, it’s going to automatically put you into demo mode by default for 30 days. There’s no restrictions on that 30 days. And then you can set up as many endpoints as you want, as many servers as you want, really get a good feel for how it’s going to work in your environment. Like I said, in extreme scenarios, we can extend it, as necessary.
Dave: Okay, great. So, that was an important one. We do have a bunch of questions but we’re a little bit short of time. If we don’t get an answer to your question, someone from Aparavi, or Mike, or someone from Truth in IT will get back to you with an answer. So, let’s get to this one. This one came up a couple of times too, which is, “How do you guys handle virtual machines?”
Jon: Virtual machines are a good one. So, we handle them. The best way that we found to handle them is by putting that transparent agent on the guests themselves as opposed to the hypervisor level. Because that’s gonna really give you the granularity of control that you need. It also helps with the efficiency of getting data out to cloud. It’s similar to how Veeam Cloud Connect actually works. They require the same thing because they found that the most efficient way to make sure that you don’t absolutely nuke the host is to go at the guest level when you’re dealing with cloud.
Dave: Mike, anything to add there, or should we go to the next question?
Mike: No. I think virtualization is an important one. We’re going to also probably find a lot of questions about new styles of processing, including containers and the rest of it going forward too. As we get into going from hundreds of VMs to thousands of containers, or millions of containers, how do you protect the data from those? And I’m sure you guys are thinking about that. Any comments now about that?
Jon: You know, I said this earlier when we were talking, but I’m just technical enough to get myself into trouble. So, I won’t attempt to fully answer that one. I will tell you that we are testing and working with containers. We’ve actually deployed Aparavi via container several times. So, it is capable, I know we’ve done it, I have no idea how we do it. I won’t answer that. Call us and I’ll get someone smart on the phone.
Dave: All right. Here’s another question that came out a couple of times too. This is going to be the last question we have time for, unfortunately. The question is, “How do you actually reduce cloud storage?” It’s pretty straightforward.
Jon: Yeah. So we talked about that pruning algorithm. So, what data pruning does is, because we’re placing data, whether it’s in-cloud or even on-premises, that data starts timing out, right? So if it’s got a 7-year retention period, 10-year retention period, you decide what you want to do. Because of the way that we’re actually placing data in-cloud, it’s that sub-file granularity. And because we know exactly what you have, down to that sub-file incremental change, that 4 Kbyte change, whatever it may be, we know when that is no longer needed. We can actually automatically remove that out of your cloud storage location. And then, if you want to provide an extra policy on that and you want to take that and bring it back down on-prem, for example, for destruction, what have you, you can do it that way. But pruning is a way to actually go into our Aparavi file, wherever it sits, and remove something the moment its retention policy expires.
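[Editor's note: The retention-based pruning Jon describes can be illustrated with a minimal sketch. This is not Aparavi's implementation. The `Increment` record, its field names, and the `prune` function are hypothetical, and the sketch assumes only that each stored sub-file change carries its own storage date and retention period.]

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Increment:
    """One stored sub-file change (hypothetical record, not Aparavi's format)."""
    file_path: str
    size_bytes: int
    stored_on: date
    retention_days: int

    def expires_on(self) -> date:
        # The increment is no longer needed once its retention period has elapsed.
        return self.stored_on + timedelta(days=self.retention_days)


def prune(increments, today):
    """Split increments into (kept, pruned) according to retention expiry."""
    kept, pruned = [], []
    for inc in increments:
        if inc.expires_on() <= today:
            pruned.append(inc)
        else:
            kept.append(inc)
    return kept, pruned


# Two 4 KB changes to the same file, both under a 7-year retention policy.
store = [
    Increment("reports/q1.xlsx", 4096, date(2010, 1, 15), 7 * 365),
    Increment("reports/q1.xlsx", 4096, date(2020, 6, 1), 7 * 365),
]
kept, pruned = prune(store, today=date(2021, 1, 1))
reclaimed = sum(inc.size_bytes for inc in pruned)
```

[In this sketch only the 2010 increment has aged out, so its 4,096 bytes are reclaimed while the 2020 increment stays in place. A real system would also have to confirm that no retained version still depends on a pruned increment.]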
Dave: Okay. Fabulous. Mike, anything to add or should we take it on out from here?
Mike: No. One of the questions I was curious about really is cloud support, which clouds you support, and we talked a little bit about that. How you reduce storage in the cloud, that was my thing. And I think you pretty much answered it. Maybe just a little bit about how someone might buy this solution. Is this a SaaS subscription? How are you pricing this?
Jon: Yeah, absolutely. So, for the average user or service providers out there, we price this based on source data protected in aggregate. And what I mean by that is, when you create your policy and you say, “I want to back up this data,” that’s what you get billed on; it’s not what happens after de-dupe and compression and stored out in the cloud because that can get really, really invisible.
Mike: Invisible, that’s right.
Jon: Yeah, exactly. I like “invisible,” because you’re like, “I have no idea what my bill is going to be this month.” With Aparavi, on the interface, if you look, you can see how many terabytes are protected. The demo says 201 on that look. So you would get charged on that 201 terabytes. We do flexible billing, so you can go month to month. You can save some money by choosing an annual plan upfront based on the amount of data you have. You can even pay the month in advance for data. But it’s all calculated on how much actual data you are protecting. The software calculates that monthly for you.
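[Editor's note: The billing basis Jon describes, source data protected in aggregate rather than what lands in the cloud after de-dupe and compression, can be sketched as simple arithmetic. The policy names, the split of the 201 terabytes across policies, and the per-terabyte rate below are all hypothetical. No price is stated in the webcast.]

```python
def billable_tb(source_tb_by_policy):
    """Total source data protected, summed across all policies."""
    return sum(source_tb_by_policy.values())


def monthly_charge(source_tb_by_policy, rate_per_tb):
    """Charge on protected source data; the stored size is deliberately not an input."""
    return billable_tb(source_tb_by_policy) * rate_per_tb


# Hypothetical split of the 201 TB shown in the demo across two policies.
policies = {"file-servers": 150.0, "endpoints": 51.0}
HYPOTHETICAL_RATE_PER_TB = 10.0  # placeholder figure, not an Aparavi price

total_tb = billable_tb(policies)
charge = monthly_charge(policies, HYPOTHETICAL_RATE_PER_TB)
```

[Because the post-compression footprint never enters the calculation, the bill in this sketch changes only when the amount of source data under protection changes.]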
Dave: All right. Well, fabulous. Thank you all for watching. Thank you for coming today. Jon Calmes with Aparavi, thank you for joining us today. Mike Matchett, thank you.
Mike: Thanks, Dave.
Dave: And we’ll look forward hopefully to having you guys back again. If you guys didn’t win today, please come back. We’re running webcasts all the time and all sorts of Amazon promotions all the time. So please come back to another Truth in IT webcast. For now, on behalf of Jon and Mike, thank you very much for coming today, and we’ll sign off and wish you a great day.
Jon: All right. Thank you, sir.
Dave: Okay, thanks again. Bye-bye.