As a precursor to our recent webinar, Empowering Digital Transformation with Data Management, Aparavi’s Jonathan Calmes spoke with GigaOm’s Enrico Signoretti about data enhancement, intelligent archives, how to create additional value for existing data, and how Aparavi is helping organizations tackle the problem of unstructured data growth. Listen to the podcast below or read our transcript for Jonathan’s insights into getting the most out of your data management software.
Transcript: Optimizing Your Data Management Software
Enrico Signoretti: I’m your host Enrico Signoretti and today we’ll talk about data management software, intelligent archives, how to build them and how they can be leveraged to create additional value for existing data. To help me with this task, I asked Jonathan Calmes, VP of Business Development at Aparavi, to join me today. His company is a new entrant in the data management market with an innovative approach and a cloud-based technology that allows end users to quickly regain control of their data and reuse it efficiently. Hi Jon, how are you today?
Jonathan Calmes: Hey Enrico, I’m doing excellent, thanks for creating the time and inviting me to do this with you.
Enrico: Great. So let’s start with a brief introduction about you and Aparavi.
Jon: Yeah absolutely. So myself and pretty much all the leadership here at Aparavi got our chops in IT in the legacy backup marketplace. Aparavi was formed around five years ago in stealth, and we started developing a product because of some of the challenges that we saw with unstructured data and how organizations were looking to solve that problem. We felt that it deserved a new approach, and thus a new company was formed.
Enrico: So explaining what data management means is becoming very complex lately. If you look around, it looks like everybody does data management today, even if they don’t. I mean, sometimes it looks like a new buzzword. What does data management mean from your point of view?
Jon: I’m tempted to answer that in all former buzzwords: copy data management and hybrid IT and hyperconverged. So data management: we see more and more people at the shows that we’re at that we know are not doing any type of intelligence on top of their data, and really it’s, “Yeah we’re data management, by the way.” So to Aparavi, data management is granular intelligence on the data that is being aggregated and then actionable items on what you do with that data.
So when we talk about intelligent data management, it is really with the basic understanding that not all data is created equal and different data has different use cases, different reliance for historical reference and future use, and then much of it is under a compliance regulation, so Aparavi’s goal is to add an intelligence layer prior to actually sending any data anywhere else.
Enrico: At the beginning, I mentioned “intelligent archive,” and you’re talking a lot about intelligence for the data. What is an intelligent archive, then?
Jon: Yeah, so intelligent archive, with Aparavi, is the identification of data that perhaps should be no longer on-prem or no longer on your primary or secondary storage. So the archive aspect of it is actually helping to reduce the amount of copies you have on-prem of secondary storage or perhaps data that hasn’t been accessed. And the intelligent side of that is indexing it in such a way that allows for classification tagging, allows for full content search, allows for data to be discoverable with a Boolean-based search system, no matter how many clouds or storage repositories it may be spread across.
Enrico: So you mentioned cloud. Maybe it could be interesting to have an overview of the architecture. How does Aparavi work?
Jon: Aparavi uses a software as a service-style model or an on-prem model. It depends on the customer. We have some who want the hosted model where the UI and all the policy management happens through a hosted platform, whereas we have others who want the on-prem installation, and we can go into both environments very easily.
But Aparavi leverages an on-premises software appliance, as we call it. That software appliance really is the orchestrator of our storage operation. So it’s going to be the one that’s enforcing the policies, grabbing the data. That’s where the actual index of the data is held. We also give options: if you wanted to fortify your backup and business continuity plan, you could even keep an additional copy there on-prem in the Aparavi open format, as opposed to a proprietary backup file, and then we allow you to send that data off for long-term retention and/or archive to multiple cloud providers.
So I mentioned earlier that Aparavi believes that not all data is created equal, so we’ve actually built a platform that allows you to send different data to different clouds based on the sensitivity level of that data, all policy enforced from a top-down, hierarchical approach to policy management. We support pretty much all the tier one cloud providers, but we actually also support any generic S3 object storage. But it also doesn’t have to be cloud, right? So if some organizations haven’t yet embraced cloud, especially for sensitive data, you can send your sensitive data that might have trade secrets or PII built into it to a secured on-prem location, while your sales, accounting, HR data, etc. is going to a less expensive, less secure tier of storage.
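The routing Jon describes, with different data going to different destinations based on sensitivity, can be sketched roughly as follows. This is a minimal illustration and not Aparavi’s implementation: the tag names, the `ROUTES` table, and the destination names are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical mapping from sensitivity level to storage destination.
# These names are illustrative only; they are not Aparavi configuration keys.
ROUTES = {
    "sensitive": "on-prem-secure-store",
    "standard": "low-cost-cloud-bucket",
}

# Hypothetical tags that mark a file as sensitive.
SENSITIVE_TAGS = {"pii", "trade-secret"}


@dataclass
class FileRecord:
    path: str
    tags: set


def sensitivity(record: FileRecord) -> str:
    """Treat anything carrying a sensitive tag as sensitive, everything else as standard."""
    return "sensitive" if record.tags & SENSITIVE_TAGS else "standard"


def route(record: FileRecord) -> str:
    """Return the storage destination for a record based on its sensitivity."""
    return ROUTES[sensitivity(record)]


files = [
    FileRecord("legal/patent-draft.docx", {"trade-secret"}),
    FileRecord("sales/q2-forecast.pptx", set()),
]
for f in files:
    print(f.path, "->", route(f))
```

The point of the sketch is only that the destination is decided by policy from the file’s classification, before any data leaves the site.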
Enrico: I think you mentioned a lot of things that Aparavi can do and also the architecture is pretty, I would say, extensive. So from on-premises installation, including the S3 that you can have on the premises down to the cloud—and you support many different cloud providers. But one of the issues usually with data management is the very first step: data discovery and collection. How does Aparavi face this kind of challenge?
Jon: The Aparavi software that’s going to be running on-prem or near the data—so if you’re aggregating data in the cloud, you can absolutely host the appliance in the cloud—but that software appliance is where we’re doing the intelligence. So prior to touching cloud storage, the data has already been analyzed all the way down to the content level. We’ve enforced the classification rules that you’ve custom-created in our platform so we can look for patterns; we can look for date ranges or phrases, words… All of these things can be created, and we’re doing that on-prem before the data even touches the cloud.
One of the beautiful aspects of doing that is we’ll actually retain that index on-prem, so all that search, all that information that we’ve indexed can actually be discovered, and you’re searching against your on-prem location before even touching cloud to verify, “Yes this is the document that I need.” We provide all search results in context of the documents so you can see exactly where your data lies and then make sure that you’re not overtaxing your network or your cloud storage costs.
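As a rough illustration of the kind of classification rules Jon mentions (patterns, phrases, and date ranges), here is a minimal sketch. The rule names, the pattern, and the `classify` function are assumptions made for the example; Aparavi’s actual rule syntax is not described in this conversation.

```python
import re
from datetime import date

# Hypothetical pattern rule: something shaped like a US Social Security number.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical rules: each tag maps to a predicate over (text, last-modified date).
RULES = {
    "pii": lambda text, modified: bool(SSN_PATTERN.search(text)),
    "contract": lambda text, modified: "non-disclosure agreement" in text.lower(),
    "fy2018": lambda text, modified: date(2018, 1, 1) <= modified <= date(2018, 12, 31),
}


def classify(text: str, modified: date) -> list:
    """Return the sorted list of tags whose rule matches this document."""
    return sorted(tag for tag, rule in RULES.items() if rule(text, modified))


print(classify("Employee SSN: 123-45-6789", date(2018, 6, 1)))  # ['fy2018', 'pii']
```

In this sketch the rules run locally against the document content, which mirrors the idea that tagging happens before the data touches the cloud.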
Enrico: From your explanation, what I understand is that there is this ingestion process, and during the ingestion process, you have a lot of indexing and tagging operations on the files that are coming into the system. So is it policy based or is it a standard procedure and then on top of it there are other operations that come along? How does it work actually?
Jon: Yeah, it’s policy based. So you’re going to create a policy that’s globally enforced throughout the entire enterprise. So you create one policy and then it will automatically apply that downstream. So, again, a hierarchical approach to policy management. And then while we’re pulling the data onto our software appliance while that ingestion is happening, that’s where the encryption is happening and compression and all that fun stuff, all that storage optimization that you like, and that’s also where we’re doing that tagging and the enforcement of classification rules.
So as soon as it hits our software appliance, it’s already encrypted, it’s already compressed, and all of those classification rules have been applied before it sends it out to the cloud location. So it’s highly secure and highly intelligent when it gets to the cloud.
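One way to picture the hierarchical approach to policy management is a global policy that is inherited by everything downstream. The sketch below is an assumption-laden illustration: the policy fields, the node paths, and the idea of lower-level overrides are hypothetical and are not taken from Aparavi’s product.

```python
# Hypothetical global policy that applies to the whole enterprise.
GLOBAL_POLICY = {"retention_days": 365, "compress": True, "encrypt": True}

# Hypothetical overrides at lower levels of the hierarchy (an assumption;
# the conversation only says that one policy is applied downstream).
OVERRIDES = {
    "emea": {"retention_days": 730},
    "emea/finance": {"retention_days": 2555},
}


def effective_policy(node_path: str) -> dict:
    """Merge the global policy with any overrides, walking from the root down to the node."""
    policy = dict(GLOBAL_POLICY)
    parts = node_path.split("/")
    for depth in range(1, len(parts) + 1):
        prefix = "/".join(parts[:depth])
        policy.update(OVERRIDES.get(prefix, {}))
    return policy


print(effective_policy("emea/finance/laptop-17"))
print(effective_policy("apac/sales"))
```

A node with no overrides anywhere above it simply receives the global policy unchanged, which is the "create one policy and it applies downstream" behavior described above.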
Enrico: And if I remember well, you can do a full index. I mean, all the content can be indexed and they can then do all the searches that I can think of, right?
Jon: Yeah. So regardless of what classification rules you’ve created, we’re still indexing the entire content of the data, all the way down to individual characters. So when you’re running a search it’s a Boolean-based search system, so you can search for combinations of words and date ranges and file types, etc. to really refine those results, and no matter what classification rules you’ve set up we can still run that full content search. Think of it almost as e-discovery.
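A Boolean search over a content index can be sketched with a simple inverted index. This is a generic textbook technique shown for illustration, not Aparavi’s search engine: it indexes whole words rather than individual characters, and the function names and sample documents are hypothetical.

```python
from collections import defaultdict


def build_index(docs: dict) -> dict:
    """Map each lowercase word to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.lower().split():
            index[word.strip(".,;:!?")].add(doc_id)
    return index


def search(index, all_of=(), any_of=(), none_of=()):
    """Boolean query: AND over all_of, OR over any_of, NOT over none_of."""
    result = set().union(*index.values())  # start from every indexed document
    for term in all_of:
        result &= index.get(term.lower(), set())
    if any_of:
        result &= set().union(*(index.get(t.lower(), set()) for t in any_of))
    for term in none_of:
        result -= index.get(term.lower(), set())
    return sorted(result)


docs = {
    "a.txt": "Quarterly invoice for Acme.",
    "b.txt": "Draft invoice, not final.",
    "c.txt": "Meeting notes for Acme.",
}
index = build_index(docs)
print(search(index, all_of=["invoice"], none_of=["draft"]))         # ['a.txt']
print(search(index, all_of=["acme"], any_of=["invoice", "notes"]))  # ['a.txt', 'c.txt']
```

Because the query runs against the index alone, no document has to be fetched from storage to answer it.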
Enrico: So at the end of the day, the first thing that comes to my mind is that Aparavi enables any organization to build a sort of enterprise Google kind of search engine. This is the very first use case, am I right?
Jon: Yeah. I mean, that’s exactly right. The capability to search through your data on a content level globally, no matter where that data sits, even if it’s spread across ten different locations and ten different clouds. Aparavi can provide one single pane of glass and a unified view, in essence. I’m not going to call it a Google search because then I’ll probably get a call from Google that says, “Hey!” but yeah, it is a very, very deep search on that data. Absolutely.
Enrico: What are the access methods for performing this? I mean I can guess there is a web interface, but can I use APIs or other methods to access this data?
Jon: Along with the web app, the entire product is accessible through a RESTful API, so everything is available through RESTful. The other thing is it’s also command line driveable. So you don’t even have to use a UI if you don’t want to. Everything can be driven from command line. And so a lot of flexibility there. Another thing that we allow is if you wanted to mount this archive or its long-term retained data, you can mount that locally and browse through it just like a file and folder tree on Windows and Linux. And that’s going to allow you to drive through the entire format as a mapped drive and see what you need, and even preview files before retrieving them from a cloud location.
Enrico: We’re talking about search, but actually there are many operations that make this archive an intelligent archive. I suppose there are backend operations to keep the archive efficient, updated, and these kinds of things, right?
Jon: Yeah. We’re ensuring that archive is constantly cleaned. We’re making sure that we’re not duplicating copies of identical files. And we’re ensuring that data is held for the specific amount of time that you’ve determined, so you can set a retention window. Because of the way that we’re actually storing data in the cloud, with those individual changes of a file stored even down to the bit or block level, we can start to remove them the moment that retention window expires, and those incremental versions aren’t reliant on a daisy chain of a file. So you can start to remove data out of your archive with subfile granularity to really reduce the amount of storage you’re holding over time and ensure that you’re staying in compliance as far as deleting data when you say you’re going to delete it.
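The two housekeeping ideas in this answer, avoiding duplicate copies of identical files and removing versions once the retention window expires, can be sketched as below. This is a simplified illustration under stated assumptions: deduplication is done by whole-file SHA-256 hash, each stored version is assumed to be independently removable, and all names are hypothetical.

```python
import hashlib
from datetime import datetime, timedelta


def dedupe(blobs: dict) -> dict:
    """Keep one entry per unique content hash; the first name seen wins."""
    unique = {}
    for name, data in blobs.items():
        unique.setdefault(hashlib.sha256(data).hexdigest(), name)
    return unique


def prune(versions: list, retention_days: int, now: datetime) -> list:
    """Drop versions stored before the retention cutoff.

    Assumes each version can be removed on its own, without depending on
    a chain of earlier versions.
    """
    cutoff = now - timedelta(days=retention_days)
    return [v for v in versions if v["stored_at"] >= cutoff]


blobs = {"a.doc": b"same bytes", "copy-of-a.doc": b"same bytes", "b.doc": b"other"}
print(sorted(dedupe(blobs).values()))  # ['a.doc', 'b.doc']

versions = [
    {"id": "v1", "stored_at": datetime(2018, 1, 1)},
    {"id": "v2", "stored_at": datetime(2019, 5, 1)},
]
kept = prune(versions, retention_days=365, now=datetime(2019, 6, 1))
print([v["id"] for v in kept])  # ['v2']
```

If versions did depend on one another, an expired version could not be deleted until every later version that referenced it was also gone, which is the situation Jon says the storage layout avoids.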
Enrico: You already mentioned encryption. But what about access security? I mean, how do I manage who can see the content, and when and how?
Jon: In Aparavi’s interface, the super admin more or less has a view into everything that is happening in the environment. And then he can define user roles and responsibilities. So view only or no access, restricted access, all of those are possible by creating different tiers of responsibility within the platform. Each individual then gets their own unique login associated with those user roles and responsibilities, to ensure that there’s no bad actors across the enterprise accessing and doing anything inappropriate with data.
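The tiers of responsibility described here resemble ordinary role-based access control. The following sketch shows the general idea only; the role names echo the conversation, but the permission names, the user table, and the `is_allowed` function are hypothetical and do not reflect Aparavi’s actual roles or API.

```python
# Hypothetical roles and the actions each one may perform.
ROLE_PERMISSIONS = {
    "super_admin": {"view", "retrieve", "edit_policy", "manage_users"},
    "restricted": {"view", "retrieve"},
    "view_only": {"view"},
    "no_access": set(),
}

# Hypothetical users, each with a unique login mapped to exactly one role.
USERS = {"alice": "super_admin", "bob": "view_only"}


def is_allowed(user: str, action: str) -> bool:
    """Check an action against the user's role; unknown users get no access."""
    role = USERS.get(user, "no_access")
    return action in ROLE_PERMISSIONS[role]


print(is_allowed("alice", "edit_policy"))  # True
print(is_allowed("bob", "retrieve"))       # False
print(is_allowed("mallory", "view"))       # False
```

Defaulting unknown logins to no access is a design choice in this sketch, made so that a missing entry fails closed.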
Enrico: That sounds really interesting, but one thing that worries me is the cost, the cost of cloud in the backend. I mean you can do a lot of operations. Moving data to the cloud is always very cheap, but then when you need to get it back it could be really painful. So how do you address these kinds of issues?
Jon: There’s obviously the reality of where cloud stands today and where the tier one cloud players stand today with egress fees. You know at Aparavi, the fact that we index on-prem means that you can ensure that you’re only pulling out the bare minimum information. So even though we’re working from a compressed, encrypted file in the cloud, without having to mount anything anywhere, you can do a single file retrieve from a cloud location as well as prior to doing that, seeing from a search results standpoint the content and the context of the document. You can verify that you’re pulling out the bare minimum of information out of the cloud before incurring any type of egress fees.
And this kind of speaks to where Aparavi believes the future of cloud is going. We really do believe it’s going to be a race to the bottom. We’re already seeing new entrants come into play. I just got done with a webinar with Backblaze; we did one with Wasabi as well. Both of these guys are really challenging the economics and the models of the tier one players by removing things like egress fees and GET and PUT fees, things of that nature. So not only does Aparavi make sure that you can retrieve the bare minimum, but we also make sure that if a cheaper, better, more valuable cloud comes along that aligns better with your business, you can immediately start to point new data, and even those incremental changes of your archived data, to that new location. You still get seamless retrievability of data, even though you’ve got a file with one version in one cloud and another version in another cloud, so at any time you can really take advantage of those changing cloud economics.
Enrico: You mentioned that now you support several backend object stores including the new entrants like Wasabi and Backblaze. They do not have egress fees, but what happens if I have data on one of the major clouds and I am adding one of these new players? Is there any mechanism that helps me to make the migration?
Jon: Yeah, there’s a few things we can do. So if you are not an Aparavi customer and you’ve got a bunch of data in Amazon, we have the ability to work with you to help understand the data that you have in your S3 environment, and then start to decide what data should maybe go to a cheaper cloud provider. I shouldn’t say cheaper, I’ll say more valuable cloud provider, less expensive. There are obviously steps related there and depending on how much you want to pull out of Amazon, you’re going to likely have to pay some egress fees from them.
If you are an existing Aparavi customer and you’ve been pointing data to Amazon, we can actually do a trickle or a bulk migration between different cloud providers without having to re-index all of that data. So you don’t have to pull it all out of Amazon, bring it back down on-prem, re-index it, and then push it back up to Wasabi. We can actually trickle migrate over time to ensure that you’re not paying for egress fees because we’ll actually be deleting data out of Amazon when its retention policy expires while slowly transferring all of the data over to a new provider.
So both a trickle migration, which is what most of our high storage users are using, or if it’s not a ton of data and you’re okay incurring a bulk egress fee from Amazon to get your data to a cheaper tier or a better tier, we can assist there as well.
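A trickle migration of this kind can be pictured as a planning step: versions whose retention has expired are deleted from the old cloud, live versions stay where they are, and new versions are written to the new target, so nothing is read back out of the old provider in bulk. The sketch below is a simplified illustration of that idea; the data layout, key names, and `plan_trickle` function are hypothetical.

```python
from datetime import date


def plan_trickle(versions: list, today: date, new_target: str) -> dict:
    """Plan one trickle-migration pass over versions stored in the old cloud.

    Expired versions are deleted in place, unexpired ones remain until
    they expire, and all new writes go to the new target.
    """
    expired = [v["key"] for v in versions if v["expires"] <= today]
    live = [v["key"] for v in versions if v["expires"] > today]
    return {
        "delete_from_old": expired,
        "keep_in_old": live,
        "write_new_versions_to": new_target,
    }


versions = [
    {"key": "report.docx@v1", "expires": date(2019, 1, 31)},
    {"key": "report.docx@v2", "expires": date(2020, 1, 31)},
]
plan = plan_trickle(versions, date(2019, 6, 1), "new-bucket")
print(plan)
```

Run repeatedly over time, a plan like this lets the footprint in the old cloud shrink toward zero as retention windows lapse; a bulk migration would instead copy the live versions across immediately and accept the egress charge.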
Enrico: So in practice you can have the second repository and only when you need to create a new version of the file, you save it on the new cloud and so you don’t have to pay egress fees for the migration if necessary. Is this correct?
Jon: Yeah that’s correct. That’d be kind of a quick workflow for the…
Enrico: Yeah I’m trying to simplify here…
Jon: Yeah you simplified, totally, and obviously anyone who’s listening, we would be happy to set up a demo and walk you through this process, build some workflows within your use case.
Enrico: By listening to all of this I guess that the most common use cases for Aparavi range from search engines across the organization for compliance, discovery and more, but actually, potentially, you have many other use cases, and a lot of applications can access this data that you are storing for that, right?
Jon: Yeah that’s actually a good segue. So we have what we call a security feature, our open data format. The open data format is designed in such a way that once the encryption keys are provided and the intimate information, the right user access, all that good stuff is provided and you’ve grabbed an Aparavi file from somewhere, that documented data format is there to ensure that in the future, whether you want to integrate with an analytics tool or something along those lines, that format can be read in its native Aparavi ODF.
The other thing that it allows for is even if 20 years from now, you need to retrieve a file from Aparavi Version 2 and Aparavi is now on version 70 for example, that open data format is going to allow you to read and retrieve that data and make use of it again. So exactly, the use case is using our open data format alongside even competing tools to read data, right? We don’t believe we want to stand in the way of people getting their data back, no matter how long it’s been, and that’s why we published and placed that documented data format in the public domain.
Enrico: So it’s really a form of openness. I mean, giving all this information to your customer allows them to build applications on top of your system without any additional charges.
Jon: Yeah, that’s exactly right. It’s your data not ours, even though you’re using our intelligence and our data movement, it’s your storage and it’s your data. We really wanted to embrace that no matter what.
Enrico: Do you have any customers that are already doing it: using the APIs to build custom applications on top of your platform?
Jon: Oh man. Not that I know of, that sounds awesome. The most common use case for the API is additional tools that they have. So we have a service provider, an ISP in Europe that’s actually using the command line interface along with the REST API inside of their software stack. And so we’ve seen that, but as of now, to the best of my knowledge—and maybe someone is—I haven’t seen anyone truly building an entirely new platform on top of our stack.
Enrico: But we can see that this is still the beginning.
Jon: We are indeed a startup, and we want to make sure that we’re fulfilling the use cases that the industry has. Ideally, alongside our open data formats and REST APIs, we see some community building some really neat stuff on top of that.
Enrico: You mentioned that the product is available both for on-premises installation as well as on the cloud. So how is it licensed then?
Jon: In both cases we’re licensing on the amount of data that an organization is actually managing with Aparavi, so if you’ve indexed your data and ingested and moved it with Aparavi, we’re gonna charge based on that. We’re not going to charge based on machines or install points, appliances, things like that. We wanted to keep it really, really straightforward and we can bill on a SaaS model and license on a SaaS-based model, or we can license in a traditional CapEx environment where the software is actually installed on-prem. But both are still based on usage as opposed to installation points.
Enrico: Another nice thing that I saw on your website is that you have a public roadmap for the new features that are coming.
Jon: Yeah we keep a near-term roadmap on there. So this is things that are already in development or already have been in development and are going through either our beta customers for validation or our QA departments. So yeah that’s something that we felt was going to be imperative, letting people know where we’re going next. And one can make the argument that we’re exposing too much information to our competitors, but we believe that once it’s published up there, we’ve made enough progress on that that we’re gonna be first to market.
Enrico: What can we expect from Aparavi in the next three to six months then?
Jon: Beyond what you can see on our roadmap on our website, without getting into too much detail, more granularity as far as policies are concerned is going to come down the pike. You’re going to have more actions for what you do with your data. So right now we can retrieve that data, but we’re going to start to see more actions on how you’re treating your data, what you want to do with it. Think more compliance-based features around specifically GDPR and the California Consumer Privacy Act, etc.
Enrico: Very nice. And you mentioned that you can organize a demo? How do you usually manage demos or if you have any other way to try the product?
Jon: Aparavi has a couple of different ways. We actually have a simulation mode that’s available on our website that allows you to navigate through the interface with fictitious data. So it’s pre-populated information that’s fictitious. It doesn’t have all the features, but it really gives you a good feel for what it’s like to drive Aparavi in a production environment. Once an organization has identified interest, we will work with the organization to define the use case, to create workflows, and then our customer success team will assist in the installation and rollout of the platform. It installs in about 10 minutes, so it’s not a long installation process, but during that first 30 days we’ll work closely with the user during that more or less POC to define a scope of work and to ensure that we’re implementing in an accurate way. So more or less a hands-on approach to POCs, but they can also drive in the simulation mode as well.
Enrico: Jon, this was a very, very nice conversation. I learned a lot about Aparavi’s data management software, and I hope our listeners did as well. At the end of the month, on the 25th of June, we will have a webinar together. So then we’ll talk more about data management and intelligent archives and how they can help to drive the digital transformation initiatives in enterprises.
So I suggest our listeners go on the GigaOm website. In the webinar section they will find how to subscribe for this free webinar. And that’s all. Maybe Jon, if you want to share a few links about Aparavi, Twitter handles so that if people want to stay in touch with you they can.
Jon: Absolutely. We definitely want to see you guys on that webinar coming up on the 25th. But in the meantime, you can visit us at Aparavi.com. As Enrico mentioned, we publish our roadmap. We also publish pricing so you can get an idea of what all this intelligence is going to cost you. And you can visit us on Twitter @AparaviSoftware as well as on Instagram if you want a bit of a look into the lighter side of working for Aparavi. Instagram is a great place to go for that, again that’s @AparaviSoftware as well. So we would love to see you there. You can always reach out to me directly: firstname.lastname@example.org.
Enrico: That’s great. And if you come up with questions during these days, remember that our webinars are very interactive, so you will have the opportunity to ask these questions during the webinar. I think that’s all for today. Thank you again Jonathan for the time you spent with me, and bye bye.
Jon: All right. Thanks guys. Bye bye.