Aparavi’s CMO Victoria Grey and VP of Business Development Jonathan Calmes, together with GigaOm’s Senior Data Storage Analyst Enrico Signoretti, recently co-presented a webinar about how the right data management practices and intelligent archives can support digital transformation, boost the value of data, and make it reusable for new applications. Topics included:

  • Understanding data growth and diversity
  • Transforming a liability into an asset with classification and search
  • How to build efficient intelligent archives for data reusability
  • Applications that can benefit from data reusability


Transcript: Empowering Digital Transformation with Data Management

Enrico: Welcome everybody to this GigaOm webinar, “Empowering Digital Transformation with Data Management.” This will be a very interesting webinar because we will talk a lot about the advantages of using data management and the benefits that this brings to your digital transformation initiatives.

I’m Enrico Signoretti, Analyst for GigaOm. I cover data storage mostly, but also all the interactions between data storage, traditional data storage, and cloud. With me today we have Jonathan Calmes, VP of Business Development at Aparavi.

Hi, Jonathan. How are you today?

Jonathan: Hey. Well, I’d be better if I had a voice. But doing good.

Enrico: And also, Victoria Grey, Chief Marketing Officer that probably will do most of the talking. Right, Victoria?

Victoria: Good morning and good evening. Yes, I will. So I’m the puppet that Jon is manipulating behind the scenes to do the speaking for him, since he is a bit under the weather. Between the two of us, we’ll get this covered. I’m happy to be here.

Enrico: Yeah. Thank you again for the time that you are dedicating to us today. Aparavi is a startup with a very interesting data management solution, and we will also spend a few minutes talking about it at the end of the session.

So without any further ado, let’s start with the webinar. The agenda is divided in a few points. So we will talk about unstructured data as a liability at the beginning. So everybody has issues with data and data storage, in particular at the moment, because they are storing, they’re piling up, huge amounts of data. So we will talk also how to transform data into an asset, the process to make it happen, data management and digital transformation, as we said. We will have discussions around all these topics and how to face these challenges and a brief introduction about Aparavi, again.

Of course, we will also have a Q&A at the end. So prepare your questions. You have a tab on your GoToWebinar control panel on the right of your screen probably, just put your question in that tab and we will pick them out at the end of the session. Also, we will have three polls during the session. I strongly invite you to respond to the polls so that will be very helpful for us and for you to understand what is happening in the market about data management and how we can leverage it for your initiative in your organization.

So let’s start immediately with the first of our polls. What is the year-on-year data growth in your organization? You can select one of the answers: less than 10% year-on-year, between 10% and 40%, between 40% and 80%, more than 80%, or you don’t know. Let’s start collecting the answers. There is a pop-up window that has probably appeared on your screen. So Vicki, what do you see in the market from your customers while we wait for our guests to answer?

Victoria: This is a great question, Enrico. And, you know, it does vary across organizations. But one of the interesting things that we find is that a number of organizations are not even certain, like your last point there, “Good question. I wish I knew.” But in particular, we find that a lot of organizations, when it comes to their unstructured data, which is, of course, what we’re really talking about here on this webinar, they really don’t know. But if some of the organizations such as yourself, the analysts that have looked into this in the industry, suggest that overall data management across the board is… Well, gee, I hesitate to actually give any numbers while people are answering it because I don’t want to sway it.

But let me put it this way. Unstructured data is growing significantly faster, close to two times as fast as data overall. So that’s what’s presenting so much of a challenge. And a lot of organizations, I think, need to actually take a look at this and understand it.

Enrico: Yes, I totally agree with you. I would say traditional enterprise organizations are all between 10% and 40%. In some verticals, we see a huge year-on-year growth. But it’s mostly because it changes the way they treat data and it changes the amount of data they collect or they have new processes that are creating much more data lately. And as you said, it usually happens also that many organizations don’t really know what is happening in their data storage. They know the amount of capacity they have. But they actually don’t know how they are utilized.

So let’s take a look at the answers then. They will be out in a second. And in fact, as we said, we can see that almost 50% are in this range between 10% and 80%, with 30% in the range of 10% to 40%, and 20% in the range of 40% to 80%. But also, again, 33% of our listeners don’t really know how much data they have stored. This is incredible. But it’s still interesting. Let’s move on with the presentation. So we will try to help them to understand better how and what to store and where.

Reality of Data Growth

Enrico: So let’s talk a little bit of the reality of data growth, why it’s so complicated dealing with data today. There are several aspects around it. Some are really about the fact that there are a lot of new things that are happening in the market, meaning, for example… These are, of course, examples. It’s not that everybody has a jet and collects data about it. But actually, in your industry something is happening. Call it IoT, call it edge computing, call it whatever you want. You are collecting much more data than you thought. In most of the cases, we are talking about machine-generated data. There, you could ask for research, support, big data analytics, a lot of applications that are happy to deliver new products, developing them, and maybe also you’re using it to analyze trends, find partners for support, for example, etc. So they are really, really industry-related.

And if you look at the kind of retention, it really depends. If you’re taking a general map today, you don’t want to delete it. But actually if you’re recording from your car a video, maybe it’s necessary for better drive. Maybe it’s necessary for a few weeks, but you don’t record everything forever.

And other things are happening, again, in our data centers. So we have our log files. A small data center can create 50 gigabytes per week of logs that you have to scan, to understand, and so on. And surveillance cameras. They are incredible. There are a huge number of them. Last weekend, I was in Las Vegas. There were so many of them. And they take pictures of you everywhere. And a single camera alone is 48 gigabytes per day. I mean, high definition, 30 frames per second, whatever. But it’s a lot of data that you’re storing. And even a lot of backup would be a huge amount of data.
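As a rough illustration of the figures quoted above, the short Python sketch below annualizes them. Only the 48 GB per day per camera and the 50 GB per week of logs come from the webinar; the fleet of 100 cameras is a hypothetical number chosen for the example, and the sketch assumes nothing is ever deleted.

```python
# Figures quoted in the webinar; everything else is plain arithmetic.
CAMERA_GB_PER_DAY = 48   # one high-definition surveillance camera
LOGS_GB_PER_WEEK = 50    # logs from a small data center


def yearly_camera_tb(cameras: int) -> float:
    """Raw footage per year in terabytes (1 TB = 1000 GB), with no deletion."""
    return cameras * CAMERA_GB_PER_DAY * 365 / 1000


def yearly_logs_tb() -> float:
    """Log volume per year in terabytes, with no deletion."""
    return LOGS_GB_PER_WEEK * 52 / 1000


print(f"1 camera:    {yearly_camera_tb(1):.2f} TB/year")
# 100 cameras is a hypothetical fleet size, not a figure from the webinar.
print(f"100 cameras: {yearly_camera_tb(100):.0f} TB/year")
print(f"logs:        {yearly_logs_tb():.1f} TB/year")
```

Even with short retention, the volume that is live at any moment scales linearly with the number of cameras, which is why retention rules matter as much as raw capacity.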

So if you start thinking about this and you think again, “These are machine-generated,” but actually, the purpose is totally different: protection, security, compliance. So things change dramatically here because for a surveillance camera it could be important for many reasons. It could be important for security. It could be important for any other aspects of your work, as well as logs. They are all about security. You are scanning everything. You want to collect information for understanding what is happening in your data centers. So we are a little bit in the middle. For business, infrastructure-related kinds of information, retention is usually very short. You don’t keep your data center logs for 10 years. It doesn’t make any sense. But actually, they are very useful in the short term.

And then there is all your business documents, all the things that you create and use for work, you share with your colleagues. We are talking about PDFs, PowerPoint slides, Word documents, whatever. They are smaller in size, but there are millions of them, especially in very large organizations. They are human-generated. They are really related to your business. These are business-critical documents. This is business-critical data. And, of course, they are for business needs. And some of them have infinite retention.

And here, I’m not only talking about the documents you are creating for a small project, data that has a very short lifespan. I’m also talking about healthcare. So your doctor is prescribing you something. And for legal reasons, they have to keep that prescription for like, I don’t know, 100 years. So it’s a very long time. So Vicki, do you have something to add on this? What’s your opinion on all of this, I would say, hell of data creation?

Victoria: Well, I think you have identified not only some of the traditional points of data growth like all those business files and backup files and log files, but all the new data sources that are coming in from machines and IoT that is causing this explosion. And we find that not only are we talking about a huge increase in the overall capacity of the data but the number of files, which is its own set of complexities.

So organizations are grappling not only with millions and millions of files but some of them with billions of files. So we have organizations coming to us talking about, you know, several petabytes with billions of files that they’ve got to figure out what to do with, which, of course, is what we’re talking about here.

And then the other piece I would add is that not all data is created equal. So understanding what you’ve got in order to intelligently manage it is a critical part of that data management and digital transformation.

Unstructured Data is a Liability

Enrico: Let’s talk about why, if you don’t manage it correctly, unstructured data is a liability. Well, there are a few things, starting with organization policies. Your organization has policies about what you have to do with the data that is created within the organization’s perimeter. That perimeter is sometimes very large. I mean, it includes data distributed across multiple locations. So many organizations now have a never-delete kind of policy. I mean, store anything because we don’t know if it’s necessary or not. So just to be sure, store everything. I don’t know if you see this in the field, Vicki.

Victoria: Oh, yes. And, you know, we talk about this a lot with our clients and with prospects about what their policies are for keeping data. And, you know, the best practices are that data is understood and they have policies around retention. But a lot of organizations struggle with that because they don’t have insight into their data. And certainly, the IT department doesn’t want to be responsible for deleting something that could potentially be critical. So this is an area that we find that organizations really struggle with. We’ll get into this, but we strive to make a tool that enables them to more easily manage whatever their policy is, whether it is to get rid of data over time or, in fact, keep all of their data forever.

Enrico: Yes. And I already clicked on the next point, demanding regulations. I always think about GDPR. But actually, regulations are of several types: there are finance regulations, insurance regulations, healthcare regulations. The world is crazy about privacy now. So you need to be compliant with a lot of things. And these rules change all the time. They are not permanent. They do not remain the same. So this becomes another problem. So without understanding what you have in your systems, it becomes almost impossible to be compliant. And I’m sure that you have something to add on this also.

Victoria: Well, one quick point. It’s not just about Europe anymore, right? California has instituted the California Consumer Privacy Act, which goes into effect in January. And other states and the federal government have taken notice. And there are a number of initiatives around the U.S. to adopt very similar regulations. So it’s critical for organizations to be able to know how they’re going to comply, because it looks pretty certain that this is the trend.

Enrico: Well, generally, it’s around the corner, actually. In Europe, we started talking about this like two years before the actual promulgation of the law. So it’s a special time. And again, storage infrastructure. So the problem of building today a sustainable storage infrastructure. I mean, if you’re starting with the first few points that we mentioned and with the data growth of the previous slide, you know that it’s really difficult today; think about dollar per gigabyte, think about how you will sustain the scalability of this infrastructure. So scale out, yes. Cloud, I don’t know.

Cloud is another thing. I put another point about cloud just because yes, theoretically, the dollar per gigabyte of cloud storage could be very, very low. But actually, that holds only until you need the information back. Then you start paying egress fees in some cases. You pay for the transactions and so on, so that if you store something, especially if you put something away because you think you will never access it again, and then for any reason you need it back, you start paying money. And if you need it frequently, you pay a lot. And so these are, I think, two important aspects of the infrastructure in general and of the infrastructure’s cost sustainability. What do you think, Vicki?
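The trade-off described here can be sketched as a small cost model in Python. All prices below are hypothetical placeholders, not quotes from any vendor, and the tier names are invented for the example. The model also ignores per-request transaction fees, which the speaker mentions and which would push the cloud cost higher still.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    storage_usd_per_gb_month: float  # hypothetical price
    egress_usd_per_gb: float         # hypothetical price


def yearly_cost(tier: Tier, stored_gb: float, retrieved_gb_per_month: float) -> float:
    """Yearly storage cost plus yearly egress cost for one tier."""
    storage = tier.storage_usd_per_gb_month * stored_gb * 12
    egress = tier.egress_usd_per_gb * retrieved_gb_per_month * 12
    return storage + egress


def break_even_retrieval_gb(cheap: Tier, costly: Tier, stored_gb: float) -> float:
    """Monthly retrieval volume at which both tiers cost the same."""
    storage_gap = (costly.storage_usd_per_gb_month
                   - cheap.storage_usd_per_gb_month) * stored_gb
    egress_gap = cheap.egress_usd_per_gb - costly.egress_usd_per_gb
    if egress_gap <= 0:
        raise ValueError("the cheap tier must charge more for egress")
    return storage_gap / egress_gap


# Hypothetical tiers: numbers chosen only to make the arithmetic visible.
COLD = Tier("cold cloud archive", 0.004, 0.09)
LOCAL = Tier("on-premises disk", 0.02, 0.0)
STORED_GB = 100_000

for retrieved in (0, 5_000, 20_000):
    cold = yearly_cost(COLD, STORED_GB, retrieved)
    local = yearly_cost(LOCAL, STORED_GB, retrieved)
    print(f"{retrieved:>6} GB/month retrieved: cold ${cold:,.0f}, local ${local:,.0f}")

limit = break_even_retrieval_gb(COLD, LOCAL, STORED_GB)
print(f"break-even retrieval: {limit:,.0f} GB/month")
```

Under these made-up prices the cold tier wins easily when nothing is read back and loses once monthly retrieval passes the break-even volume, which is the pattern the speaker is warning about.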

Victoria: Well, I think this is something that organizations will identify right away as something that they grapple with all the time. And it’s so hard to predict what tomorrow’s storage cost is gonna be. It tends to go down. But new architectures come in that change the economics. And I think what’s important is for organizations to be able to maintain nimbleness so that they can move their data where it’s most economically beneficial to be, right? And so that’s a different way to think about it, rather than, “Let me just get another box or just fire up more storage up in AWS and try and forget about it.”

Enrico: Yes, and consider, thinking back to that first question we had at the beginning, that if your organization is storing 40% to 80% more data than last year, your storage infrastructure cost is not shrinking that much. So that’s a problem. And, of course, there are traditional processes that don’t work anymore. Traditional backup in large repositories is difficult. Again, traditional archiving without knowing what you are storing in these archives, dumb archives, as opposed to the intelligent archives that we mention often here. So these are no longer the right way to think about these things.
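To make the growth rates from the first poll concrete, the sketch below compounds them year over year. The 40% and 80% rates are the poll brackets mentioned above; the 100 TB starting capacity is a hypothetical figure used only for illustration.

```python
import math


def capacity_after(start_tb: float, growth: float, years: int) -> float:
    """Capacity after compounding a constant yearly growth rate."""
    return start_tb * (1 + growth) ** years


def doubling_time_years(growth: float) -> float:
    """Years needed for stored data to double at a constant growth rate."""
    return math.log(2) / math.log(1 + growth)


START_TB = 100  # hypothetical starting capacity

for growth in (0.40, 0.80):
    in_three = capacity_after(START_TB, growth, 3)
    doubling = doubling_time_years(growth)
    print(f"{growth:.0%} growth: {in_three:.0f} TB after 3 years, "
          f"doubles every {doubling:.1f} years")
```

Unless the cost per terabyte falls at least as fast as the data grows, total spend rises, which is the point being made about infrastructure cost not shrinking.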

And traditional archive, just because we mentioned the dumb archive, is no longer enough. There are too many constraints and limitations on these techniques, and they are not helping. There are a few things. So if we look at all of this together, you can also still go on tape. But the moment you need that information, you will need it immediately. When I talk about this, I think about services like, for example, Amazon Glacier. Every public cloud vendor today has a similar service. With Glacier, you pay as little as possible for the information you store there. But actually, when you need it, you get it after hours. And for some of the workloads that we have or some of the information that we need, that is too much. And maybe there is something I’m missing here. I don’t know. I’m asking you if you see something different in the field, if your customers have different views about data today?

Victoria: Well, I think that with backup and archive, the legacy systems were designed with the idea that you’d run the processes and hopefully never have to touch it again. It was an insurance policy, not a method to help you actively manage your data. And that’s what’s no longer working, right? That’s the part that needs to change.

Enrico: Yes, indeed. So let’s start with a new poll. And what is the major pain point regarding data storage in your organization? So we started with this slide. And we talked about what we see. What’s your position about that? So demanding organization and business policies, regulation and compliance, infrastructure cost, lack of visibility in the files and content, or maybe it’s something else? So let’s start with the poll. And while we are waiting to get the results, I can ask you guys if you have any additional insight regarding this.

Victoria: Well, and actually, Jon and I were just talking about some clients yesterday that are struggling with some of those. So Jon, not to tap your voice too heavily, can you share some of the stories that we were talking about yesterday?

Jonathan: Yeah, sure. Hopefully, you all can hear me okay. But yeah, with the clients that we’re seeing, the ones we were talking about yesterday, really it’s this combination of regulation and compliance, along with the organization and business policies. So they’re looking at their data, especially unstructured data, and really it is this opaque mass that they don’t have great visibility into. So they’ve tapped into Aparavi to help not only look at what data they have, where it fits, and what type of compliance applies, whether that’s external compliance and regulations or internal business policies (where does that fit and where should it go?), but also the infrastructure costs. So really the top three items there seem to be where most of the organizations that are finding us organically are falling.

Victoria: And let me just add real quickly, Enrico, because it’s interesting: I had somebody throw a term at me the other day because we talk about this opaque hole of data that you don’t know what is in there and clients tell us that all the time. But somebody said to me, “Oh, you know what the new term is for that? It’s dark data, because this organization has tons and tons of it, but they just have no clue what’s actually there.”

Enrico: Yes indeed. And so we got answers from almost everybody. So maybe it’s time to share the result. And just a couple of seconds. Here we go. So it’s very, very varied. The most important points are infrastructure cost, regulation and compliance. These two take 60% in total. But actually, lack of visibility, as you said, comes immediately after. And demanding organization is just 14%, but actually, it’s still something. So the most important thing is infrastructure costs, followed by regulations and lack of visibility.

Data Management: Transforming Data from a Liability to an Asset

Enrico: Let’s go on with the slides and see what we can do to help our listeners solve this kind of problem. So how can you transform data from a liability to an asset? How can you use it? There is a process that you can put in place, using tools in the market. The first thing that you have to do is data discovery and collection. I mean, depending on the tools that you will adopt, you have to understand what you have. We have talked about it since the beginning: it’s very, very important to go into your servers, crawling your own storage, or use agents or backup systems or whatever else, to collect everything that you can. Catalog and index everything and do analysis and actions on top of it.

This will add, for example, something that is very, very simple but tremendously usable. Search. The moment you catalog and index your data, you can start searching it. And, of course, if you can do an analysis, the first actions that you can do is information lifecycle management. You can really understand what is old, what is not, if you are still storing data of people that left the company, for example, or any other things like that. And, of course, by applying the rules—actions also means applying rules that you define, meaning you can be compliant with regulations and so on. And then, again, you can start the process again and again and again.
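The discover, catalog, index, search, and lifecycle steps described above can be sketched in a few lines of Python. This is a minimal illustration under strong assumptions, not how Aparavi or any other product works: it only reads files as plain text, keeps everything in memory, and every function name here is invented for the example.

```python
import os
import re
import time
from collections import defaultdict

WORD = re.compile(r"[a-z0-9]+")


def build_catalog(root):
    """Discovery and catalog: record size and modification time per file."""
    catalog = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            info = os.stat(path)
            catalog[path] = {"size": info.st_size, "mtime": info.st_mtime}
    return catalog


def build_index(catalog):
    """Index: map each lowercase word to the set of files containing it."""
    index = defaultdict(set)
    for path in catalog:
        try:
            with open(path, encoding="utf-8", errors="ignore") as handle:
                text = handle.read()
        except OSError:
            continue  # unreadable file: keep it in the catalog, skip indexing
        for word in set(WORD.findall(text.lower())):
            index[word].add(path)
    return index


def search(index, query):
    """Search: return the files that contain every word in the query."""
    words = WORD.findall(query.lower())
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


def older_than(catalog, days, now=None):
    """Lifecycle: list files not modified within the last `days` days."""
    now = time.time() if now is None else now
    cutoff = now - days * 86400
    return sorted(p for p, meta in catalog.items() if meta["mtime"] < cutoff)
```

A typical use would be `catalog = build_catalog("/some/share")`, then `index = build_index(catalog)`, then `search(index, "contract renewal")` for search and `older_than(catalog, 365)` as input to a retention rule. The path and query are placeholders.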

But, of course, when I talk about actions, there are many more things that could happen. For example, you can, with a specific search of patterns, of trends, you can understand a lot of things about security. Just a small thing: if you start to notice that you have storage procedures and files that are changing very quickly in a matter of days, maybe there is something wrong. Or if somebody is downloading tons of files and usually they don’t do it, then there is somebody that’s stealing your data. So you can do something about it. Or even more, once you understand what’s in your data, you start to augment it. I mean, you can add tags on top of it. And adding tags on your files, for example, opens another realm of possibilities. You can do advanced security management. But also, you can prepare data for your big data analytics because you have tags, you can create views of specific files with specific characteristics or even more.
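The security example above, someone suddenly downloading far more files than usual, can be sketched as a simple baseline check. This is an assumed approach chosen for illustration (compare today’s count with each user’s own history), not a description of any product; the threshold, the minimum history length, and the function name are all hypothetical.

```python
from statistics import mean, pstdev


def unusual_activity(history, today, threshold=3.0, min_days=5):
    """Flag users whose count today is far above their own past average.

    history: user -> list of past daily download counts
    today:   user -> download count for the current day
    """
    flagged = []
    for user, count in today.items():
        past = history.get(user, [])
        if len(past) < min_days:
            continue  # not enough history to define "usual"
        spread = max(pstdev(past), 1.0)  # floor avoids a zero-width band
        if count > mean(past) + threshold * spread:
            flagged.append(user)
    return sorted(flagged)


# Made-up sample data for the demonstration.
history = {"ann": [10, 12, 9, 11, 10], "bob": [5, 6, 5, 4, 5]}
today = {"ann": 11, "bob": 400}
print(unusual_activity(history, today))
```

The same shape of check applies to the other signal mentioned, files changing unusually quickly, by counting modifications per day instead of downloads.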

The next step would be being ready. I’m not telling you that you need to do it today. But actually, machine learning and AI are easier and easier to adopt. You can understand much more of your data and use new techniques to take advantage of it. And Victoria, I was mentioning this to you yesterday. Just to give an example to our audience: I was with an IT manager of a very large law firm. And they have offices all around the world. We are talking about 8,000 lawyers. And at the end of the day, it comes out that they have a lot of branches, of course. They do contracts. They do a lot of stuff. And many times, they replicate similar documents.

For them, the traditional mechanism that they used in the past was, for each document, to prepare the document and write a description of it, because you want to search but you can’t fully index the entire document. And putting tags and so on is a very, very cumbersome kind of process. So it’s too complicated also to maintain the data. And at the end of the day, it’s very, very tough for them to search across all their knowledge base, across all of their contract base. And sometimes it would be easy to find all the documents in the U.S., you know, New York State for buildings, that are more than $20 million because the same legislation, the same kind of contract can be applied for other things. And they can do that. And this is just a search thing. But think about applying machine learning. It means that you potentially can do things like train a neural network and understand if your documents are perfectly formatted. So without risking losing it, without risking errors, mistakes and so on. What do you think about all these features, Vicki?

Victoria: Well, I think that, you know, we were talking in the previous poll about how people are more concerned about compliance than the visibility into their files. But, in fact, it’s the visibility into their files that can provide the compliance and then set an organization up for, as you talk about, potentially reusing the data. But these things are not mutually exclusive, right? It’s one step leads into the next, as you were talking about. So once you can have visibility and understand what’s in there, now you have the ability to create some real fundamental assets for your company in terms of adhering to compliance regulations and business policies and then potentially, longer-term, find a way to reuse that data for business transformation and augmentation.

The Value of Data Management: Understand and Reuse Your Data

Enrico: Right. Right. And, of course, there is even more to that. In fact, if we started to look at all the points, starting with organization-wide search and data reuse is probably the first step; it’s easier to implement for the end user with the right tools. I mean, it’s not easy per se, but actually, once you have these kind of tools in place, giving your users the ability to search everything in the organization is powerful. We are talking about a Google-like experience through your data. So I think that, Victoria, this is one of the first features that your customers love, right?

Victoria: Yeah. We launched this ability earlier this year. And we’ll, of course, talk more about it when we get to talking about the Aparavi architecture. But it’s a foundational capability for really managing your data. You’ve got to be able to understand what’s there.

Enrico: And, of course, as I mentioned on the previous slides, security and risk management. You understand your data. You understand what’s in the files. So you can understand if you are doing it correctly. Then risk management: just think about being able to analyze your files and understand whether you are storing credit card numbers in the clear, not encrypted. This is powerful, especially now. You don’t want anybody to steal this kind of information. So you want an alarm set somewhere that says, “Oh, we have an Excel file with 20,000 of our customers’ names and credit card information.” So you want to know this information. And I’m not even talking about ransomware. I’m not even talking about all the rest.
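The credit card alarm described here is commonly built from two pieces: a pattern that finds digit runs of plausible length, and the Luhn checksum that payment card numbers satisfy. The Python sketch below shows that combination as one possible approach; it is not taken from the webinar or from any product, and the function names are invented for the example. A match means "worth a look", not proof that a real card number is present.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return digit strings in the text that look like card numbers."""
    hits = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_ok(digits):
            hits.append(digits)
    return hits


# 4111 1111 1111 1111 is a widely used test number, not a real card.
sample = "Customer: Jane Doe, card 4111-1111-1111-1111, order 20000"
print(find_card_numbers(sample))
```

Run over the extracted text of each indexed file, a check like this could feed the kind of alarm the speaker describes, for example by raising an alert when one file yields many hits.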

E-discovery. Everybody for any legal reason, you can…sometimes you need to go across all your written storage repositories, find information that are necessary and you need to get them, you need to hold them for a long time. You need a tool that can manage all of these things. And, of course, compliance and data governance. But I didn’t put compliance. I put adaptable compliance because as we talked at the beginning, generally, in the U.S., you will have something that is really similar to GDPR. So you have to run and set up policies to be compliant. But actually, this is version one. In two, three, four years, for example, here in Europe, they are already talking about GDPR two. What will happen? So you need a mechanism that is very, very flexible and very, very efficient to modify the rules where it’s necessary.

And, of course, all of this, knowing the files, knowing if you really need them, their value and so on, means that you can build tiers. You can build something that is more sustainable than what you are doing today. And I think there is much more on that. But actually, I just touched on those four or five points. I don’t know, Vicki, if you have more thoughts on this.

Victoria: Well, I think we’re gonna be adding a little bit more in the coming discussion, tapping back to your previous slide about setting up your data ultimately for reuse in a way that is meaningful to the company. So the first step is to just get a handle on it so you can meet business policies and regulations, both from inside and outside the organization. But long term, wouldn’t it be beneficial to be able to actually tap important data to learn more about your business and to help to transform it?

How Data Management Can Support Digital Transformation

Enrico: Yes. And we can also take a different perspective on data management. And there are several ways to think about it. Data management is very broad now. Sometimes it’s also difficult because many companies, many vendors, are shifting their messaging and there are more data management companies. In the past, they were a data storage company or they were a data protection company. But actually, all of these things are now included in data management, in a broader concept.

So there are some solutions that are really, really infrastructure-focused. And they are very beneficial. Maybe the infrastructure issue is the goal, infrastructure total cost of ownership. So these solutions usually work more on the container. I mean, not on what’s inside the file, but on the file itself, if we are talking of unstructured data, of course. So you can think about automatic tiering, information lifecycle, or data copy management, and analytics. The moment we shift to features that are more high level, we need to know what’s inside the content. So to understand security risk, e-discovery, you need to know what you’re looking for. You need to know what’s inside any single file. And these are usually much more business-focused kinds of applications, where you really need the information and are not just moving files around, for example. I don’t know, Victoria, but as far as I can tell, Aparavi is more on the right-hand side of this spectrum, right?

Victoria: We are. And we touch some of the areas on the left as far as they pertain to helping you to better manage your storage infrastructure. But ultimately, yes, we’re focused on the data content and how to maximize it.

How Data Management Can Enable Digital Transformation

Enrico: Very good. So let’s talk about digital transformation finally. This was the very topic. How data management can enable digital transformation. Well, we already touched a lot of these points. And when we give this kind of benefit, we are already starting to tackle the digital transformation initiatives to empower that. So organization-wide search and data reuse, security and risk management, e-discovery, we already covered all of it. Adaptable compliance and helping to build a sustainable storage infrastructure. But here, we count something that is really interesting: giving APIs and access interfaces to everything in your organization. So the right information is always available. No matter if you have a legacy application, no matter if you have been looking at something new, you have all of the information available.

And then there is this concept of making data accessible wherever it was created. Maybe it was created on a file server, but you want to access it from a phone, you want to access it remotely, and so on. So you are giving access to new applications that can reuse this information. You can, as I said, create data sets for machine learning and AI projects, which is a totally new thing. We talked a lot, we read a lot about machine learning. There are still not many companies adopting it in the real world. We are still at the very beginning. But everybody’s interested because the potential is huge. So in the next few years, in the next couple of years, I would say, more and more medium enterprises will adopt more and more of these kinds of techniques.

And you can enable new processes. So just the fact that you are understanding what’s in your data, if you’re storing the wrong file, if you’re storing the wrong information, you can set up alarms to understand what’s wrong and start a new process. Maybe related to GDPR, maybe related to whatever is needed. And, of course, everything is smoother. The quality of the service that you can give on your data, of course, from the applications is better. I don’t know. I mentioned so many things in the slide and maybe, Vicki, you still have something to add. I remember that some of your customers are doing great things around APIs, around new application with data management.

Victoria: Yeah. And, in fact, that was the company we were talking just recently about, as well. And Jon, you were telling me the story. I think it was Multi-Comm. Have I got that correct?

Jonathan: Yeah, exactly. That’s the service provider who’s implementing it for the end user, but they actually use our platform at the command-line level to completely automate the entire Aparavi process across their thousands of users. So they use a combination of our API calls to integrate with their existing tools, as well as the fact that our entire platform can be driven from the command line, to really offer a transparent service to their end users.

Enrico: Which is great. I mean, if they are using the command-line interface and API, it is probably not even difficult, not only for them but for any organization, to build applications and take advantage of these kinds of tools. So let’s go with another poll. What are the most interesting aspects of unstructured data management for you? Indexing and search? Compliance and legal applications? Infrastructure TCO reduction? Reuse of data for new purposes? Other? Okay. For this one, you can select all the answers that apply.

Maybe after this presentation, your answer will be a little bit different than at the beginning because at the beginning, we asked you, “What are your problems, pains with data management, with data storage?” So maybe knowing the potential that you can have with data management tools, like those that we are talking about in this session, you will have a different opinion now. So let’s start collecting. A few answers are already trickling in. I don’t know. Vicki, you already mentioned that one of your customers is using data for new purposes, for new applications. But I know that indexing and search is very popular, for example. Are any others of these usually interesting for your customers?

Victoria: Yes. And, in fact, it parallels some of what people were responding in the polls here. We’ve had a number of organizations adopt us to help with compliance. And what we find interesting in working with them is that they’ll bring Aparavi in to help with whatever compliance regulations they are working on and have commitments around. But then, when we introduce the indexing and search and classification, they’re coming back to us to say, “Well, that’s a whole new level of opportunity that we have with the data that we hadn’t thought of.” So, you know, we’re finding that our clients are taking this in a multiple-step process, that they might initially deploy the platform for purposes of compliance or business regulations, but then find that because of its capabilities, it brings them into whole new considerations about what they might be able to do in the future.

Enrico: Oh, yes. I totally agree with you. So most of the end-users (and I forgot to mention it earlier, so thank you, by the way) adopt these data management tools for one use case. And then they understand what they can do, and they start adopting them for more and more. And it’s really interesting to see the evolution. When I meet them the first time and we talk about these kinds of issues, especially before the initial adoption, and then I meet them again two or three times over a couple of years, it’s incredible. Their perspective changes drastically.

So let’s see the results from the audience. But as we said, probably we’ll see results all across the board. In fact, we have indexing and search that is very, very important for 50% of our… Of course, in this case, we will have more than 100% in total because it was multiple-answer. But actually, the reuse of data for new purposes is almost 70%. That’s almost unexpected for me. I mean, usually, I find indexing and search and compliance on top. But I’m happy to see this change, with more enterprises, more organizations, looking to reuse their data. So it’s fantastic. And so let’s go on with a short introduction about Aparavi. So I will hand over the microphone to you, Victoria. And let’s go.

SaaS Delivered Data Analysis, Archive, and Access

Victoria: Great. Okay. Let’s go. Let’s just jump into the next slide so we can talk a little bit about Aparavi’s architecture. First, let me comment that this is a software-only solution. That’s one of the questions that we get frequently because you see that software appliance there in the middle, but we deliver it as software. There is no hardware from Aparavi. While we do the data management function for all your unstructured data and enable you to move data to the cloud, one or more clouds, or on-premises to any path or S3-compatible storage, we do not actually sell storage.

So what Aparavi does is it will go through all of your unstructured data and organize it, create an index, and enable you to classify it with any sort of classification. Think legal, think confidential, personal information, etc. And then we will archive that data to the destination of your choice, be it any one of the clouds represented there or, as I said, on-premises. Certainly, there are still organizations that are cloud-wary, although I will say most organizations tend to have initiatives to move data to the cloud. And we can very much help with that. And Aparavi was built fundamentally to take advantage of the cloud. So it’s not a bolt-on for us.

Overall, data is managed through a single pane of glass with the management console that enables you to set policies for your data for movement and to query the index. And finally, we make data accessible either through virtual CIFS or NFS mounts. We have a REST API that allows you to tap the data for reuse via analytics or other kinds of applications. And also our data is stored in a format that is open and published. So no matter what happens anytime in the future, even if Aparavi weren’t around, because of that published data format, organizations will always be able to get back to their data. And a fundamental premise from Aparavi is that the data is yours. It’s not ours. It’s yours. And so we’ll always enable you to get to it in any way.

So within this whole arena here, we have that capability of classification that I had mentioned. And it’s super easy to add new classifications at a later point. And then we have full-content search. So Aparavi enables you to easily and quickly search all of your data through the index. And it’s both metadata and full-content. So, for example, when you think about GDPR or the California Data Privacy Act and requirements around Right to Be Forgotten, you need a way to be able to identify what data is reflecting personal information.

So Aparavi allows for search based on pattern matching. If you were looking for all Social Security numbers, you can just use a search like threedigits.twodigits.fourdigits. And Aparavi will return to you a list of all the files that contain that content. And, very importantly, it will do that without bringing the data back down from the cloud. So just finding data is quick, easy, and it has no egress fees. It isn’t until you find what you need and specifically choose to retrieve it that it would be brought back down on premises. So it’s highly efficient and cost-effective on top of its ability to deliver that full-content search. Enrico, any comments or questions for me on this?
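To make the pattern-matching search described above concrete, here is a minimal Python sketch. It is not Aparavi’s query syntax or API: the `index` dictionary is a hypothetical stand-in for an extracted-text index, and the three-digit, two-digit, four-digit pattern is written as an ordinary regular expression with hyphen separators.

```python
import re

# SSN-like pattern: three digits, two digits, four digits, hyphen-separated.
# \b keeps the match from starting or ending inside a longer run of digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical stand-in for a full-content index: file path -> extracted text.
index = {
    "hr/onboarding.txt": "Employee SSN: 123-45-6789, start date 2019-03-01",
    "sales/q1.txt": "Revenue grew 12% quarter over quarter",
    "legal/nda.txt": "Contact 555-1234 for questions",
}


def find_files_matching(index, pattern):
    """Return the sorted paths whose indexed text contains a match."""
    return sorted(path for path, text in index.items() if pattern.search(text))


matches = find_files_matching(index, SSN_PATTERN)
print(matches)
```

Because the search runs against the index rather than the archived files, only the list of matching paths is produced; nothing has to be retrieved until a specific file is requested.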

Enrico: Well, no. I know that you support many more cloud providers than you mentioned in the slide.

Victoria: We do, yeah. We’ve mentioned Oracle Cloud. We’ve got IBM Cloud. We’ve got a number of private cloud vendors like Scality, Cloudian, Caringo. Some of the other newer and upstart public cloud vendors that are trying to take market share away from AWS and Azure would be Wasabi, Backblaze. And, in fact, for anybody who’s out there, if there is a cloud vendor of your choice that you don’t see here that you would want Aparavi to support, we have a quick and easy certification process that we go through. And we’re constantly adding new cloud destinations.

Enrico: That’s very good. This is nice to know.

Jon: Yeah. And actually, I want to jump in there. It’s also important to note that we support object storage out of the box. So even if you don’t see a provider on the list of those who are certified, we still support the creation of generic S3 object stores. So any of the cloud providers using the S3 API, Aparavi will support, even if it’s not on the certified list.

Build an Organized, Intelligent, Accessible Archive Today

Victoria: Good point, Jon. Okay, let’s go onto the next slide. So this is what we enable organizations to do. They can build an organized, intelligent, and accessible archive of their data so it’s not just this static, “I hope I never have to touch it again.” This will help organizations to create a battle-tested plan for adhering to not only existing, but any new requirements that come down the road that we’re not even aware of yet. As we have talked about, there’s GDPR, there’s the California Data Privacy Act. More is coming. What are you gonna do? This will help organizations to understand how they can quickly and easily meet those needs.

Not all data is created equal. There’s a lot of ROT out there: redundant, outdated, and trivial data that needs to be handled appropriately. Why fill up your expensive storage with ROT when you can handle it more appropriately and meanwhile find the data that is of real critical importance for an organization, for potential reuse, for historical reference, for meeting those regulatory requirements, so you can have intelligence about what’s in there?

And then finally, the really critical piece for the future, which forward-thinking organizations are starting to consider, is tapping data for analysis and future use. And so we can help you not just organize your data but make it accessible, so that machine learning, AI, or even just understanding history to know your future is enabled. Jon, do you want to add any comments on this?

Jon: No, I think you nailed that. Good job.

Victoria: Enrico?

Awards and Recognition Since Launch 2018

Enrico: Okay. Very good. So let’s skip to the last slide. Meaning that you are very well recognized in the industry.

Victoria: Yep. We launched in May of 2018. And we have enjoyed a lot of attention. It’s been a terrific year. And there’s just a short quote down there from one of our clients, who came in really looking at Aparavi from the client’s perspective and then found that the search and organization was really helpful. And he just says, “With Aparavi, you can more easily organize and manage vast amounts of data and more easily find specific data as needed. It’s a real game-changer.” This is what we find with our clients.


Enrico: So let’s go to the Q&A section. So we have plenty of questions but actually not a lot of time. So some of them will be answered online. I will prioritize the first that came in. So let’s start with, “You didn’t talk about security access or auditing for this data management solution.” Yes, my fault. “Any role-based access mechanism in place? Any way to see who is doing what?” Okay. So in general terms, and then I will hand over the microphone again to Victoria and Jon. In general terms, all the solutions are really, really focused on security. So yes, having role-based access is fundamental.

So you don’t want to share your data with people who shouldn’t be able to see, edit, or copy it. And also, most of the time, these solutions are integrated with Active Directory and these kinds of tools, so they reflect the organization’s roles. And logging for all these solutions is pretty straightforward. And you can have plenty of information on who is looking at a document, when, how many times, and so on. So yes, my mistake not mentioning it. It was not actually the focus of this webinar, but usually all of them have this. And I don’t know how Aparavi implements this.

Victoria: We agree with you that it’s important to be able to integrate with the likes of Active Directory and LDAP. And it’s one of the things that we’re actually working on right now. So it is a critical direction for us, and we recognize and support that.

Jon: Aparavi does have role-based access built into the platform. You actually can create user roles and responsibilities with limited access. Read-only is a possibility in that, as well. So all of the data that’s being ingested by Aparavi is controlled through more or less a super admin that can distribute roles and responsibilities. And Aparavi also encrypts all data in transit and at rest. So security is paramount to us, as well.
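The role-based access Jon describes can be pictured with a small Python sketch. This is an illustration of the general technique only, not Aparavi’s implementation: the role names, the `Permission` flags, and the `is_allowed` helper are all hypothetical.

```python
from enum import Flag, auto


class Permission(Flag):
    """Hypothetical permissions; a real platform defines its own set."""
    READ = auto()
    WRITE = auto()
    MANAGE_ROLES = auto()


# Hypothetical role table: a super admin holds every permission and
# hands out narrower roles, including a read-only one.
ROLES = {
    "super_admin": Permission.READ | Permission.WRITE | Permission.MANAGE_ROLES,
    "editor": Permission.READ | Permission.WRITE,
    "read_only": Permission.READ,
}


def is_allowed(role, needed):
    """True only if the role grants every permission in `needed`."""
    granted = ROLES.get(role, Permission(0))  # unknown roles get nothing
    return (granted & needed) == needed


print(is_allowed("read_only", Permission.READ))   # read-only users can read
print(is_allowed("read_only", Permission.WRITE))  # but cannot write
```

Defaulting unknown roles to no permissions keeps the check fail-closed, which is the usual choice for access control.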

Enrico: Very good. So this is a question I can’t answer. How is Aparavi licensed?

Victoria: Ah, okay. Very good. So we’ve got a couple of different options. The primary one that most of our clients are using is software as a service, which is how Aparavi was designed to be delivered. And so that makes it very convenient for licensing the product month-to-month or year-to-year, depending on how you want to structure it. But we also offer it on-premises for those organizations that would prefer to have essentially enterprise licensing. And so we have multiple options for how organizations can do this.

Enrico: Which kind of files are supported by this type of solution? Of course, in general terms, again, you can think about any type of file. Depending on the solution and how it treats data, there are several features that can be enabled or not. Like, if you are storing movie files, it’s hard to do full-text search on that, of course. But, of course, the tagging and all the other features are always available and maybe there are other options in this tool that can help you to manage these files. Do you have an answer for this, Victoria?

Victoria: You’re exactly right, Enrico. So we can handle any kind of file, unstructured data. So anything not in a database. But how much intelligence you can have varies depending on the type of file, and we’re continuing to add support and insight into more file types as we go along. Like, of course, a real fundamental one is all office documents, right, so not just Microsoft Office but all of your typical office documents of any format like PDFs and on and on. Those are fully supported and supported with full-content search and everything. As you noted, with a movie file, it’s hard to do a full-content search on that. But we can still have all the metadata around it and the management around that. So this is one of those things where, if somebody has some particular kind of files that they’re looking at, they should just keep checking back with us on whether or not we support it because we’re continuing to add support for new file types.

Enrico: Okay. And another question is, “Are there preset classifications included? And if so, what are they?” I mean, probably they are asking if there are rules and policies already in place to classify your documents.

Victoria: Yeah. And well, one of the things we’ve done is we’ve made it super easy for clients to add classification rules. So it’s just a couple of clicks to add a new tag that you can then apply to your files. And that’s a super simple process. So it allows for the greatest amount of flexibility when they’re deploying Aparavi for organizations to implement business policies that are unique to their situation. So we focused on doing that.
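The idea of user-defined classification rules that attach tags to files can be sketched in a few lines of Python. This is an assumption-laden illustration rather than Aparavi’s rule format: `FileInfo`, `Rule`, and the two example rules are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FileInfo:
    """Hypothetical view of an indexed file: its path and extracted text."""
    path: str
    text: str


@dataclass(frozen=True)
class Rule:
    """A tag plus a predicate deciding whether the tag applies to a file."""
    tag: str
    matches: Callable[[FileInfo], bool]


# Example rules; adding a new classification is just appending another Rule.
RULES = [
    Rule("legal", lambda f: f.path.startswith("legal/")),
    Rule("confidential", lambda f: "confidential" in f.text.lower()),
]


def classify(file, rules):
    """Return the set of tags whose rules match the file."""
    return {rule.tag for rule in rules if rule.matches(file)}


print(classify(FileInfo("legal/nda.txt", "CONFIDENTIAL agreement"), RULES))
```

Keeping each rule as data (a tag and a predicate) is what makes it cheap to add classifications later without touching the classification logic itself.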

Enrico: Okay. We’re already past the hour, but there are so many other questions. To our listeners, I want to assure you that Vicki and I will pass all the questions to the Aparavi team, and they will answer them offline. Some of them are really interesting actually. And to wrap up this webinar, let’s just give our listeners a couple of links. So you can find me on the GigaOm website. This is the easiest way to find me. And from there, you can find my Twitter handle and everything else. I suggest this way because my Twitter handle is my surname. And sometimes for people, especially in the U.S., it’s difficult to spell it correctly because it is an Italian name. So check the GigaOm website for the recording of this webinar, as well as my writeups, including one specifically around data management where I talk extensively about data management techniques and advantages of data management. About Aparavi: Victoria, can you give us your website, Twitter handles, and where we can find other channels, if any?

Victoria: Yeah. Please visit www.aparavi.com. And follow us on social media down at the bottom of the website, you can just click on it and follow. But if you’re on Twitter right now and you want to check us out, go to @aparavisoftware. We’re also on Instagram, by the way.

Enrico: That’s great. I’m on Instagram too but mostly for gelato pictures. I don’t usually share that for professional and for other things. Okay. So thank you very much again for your time, guys. And thank you to all our listeners. And stay with us for our other webinars, and other communication from GigaOm. Bye-bye.

Victoria: Bye. Thanks, Enrico.

Jon: Bye, everybody.
