Understanding unstructured data

I’ve just returned from Europe, where I met with many fellow business leaders. Before that, my team and I met dozens of North American execs at the Midsize Enterprise Summit in Orlando, and before that dozens more at XChange in Las Vegas. Wherever we go, we have begun to see a pattern, and it’s one that concerns me.

Most business leaders have no idea how much unstructured data their company manages or what that means for the business.

Although we’ve mentioned unstructured data a time or two on this blog (here, for example), those posts were generally addressed to IT pros, who already understand what it is. But do they recognize the strategic impact? Almost all C-level executives I speak to are in the dark about both what unstructured data is and why it’s important.

I want to change this! It is about the future of many firms, but it still gets neglected.

Why is it even necessary to understand unstructured data?

“We have better things to do. We trust our teams to perform these functions. We endeavor to find the best people, and we motivate those people to continue being the best. If we’ve built the proper team, we think, then we don’t need to understand this.”

This assumption from many business leaders couldn’t be more wrong, because data is the gold of any firm. Data is still highly underutilized in most organizations, even though everybody knows by now that Google, Facebook and many others are so wealthy and powerful because they control data. Do you want to leave that power in their hands forever? Do you want to be the firm that merely serves as a back end, living on whatever others leave on the plate?

This is a strategic matter, and you cannot expect anybody else to initiate this critical shift in mindset. You need the right plan, the best people and the best tools, resources, and training. It is a business leader’s responsibility to plan the company’s future, especially its financial future. Failing to understand this very core of information systems impedes your ability to plan. It is not your IT team’s job to work out the strategic picture for you, and in times of tight IT budgets you should not expect them to look beyond their short-term objectives.

Structured vs. unstructured data

Structured data is typically data confined to a database, data that is quite locked into its application. Structured data is rarely ever separated from, or accessed without, the application that uses it. A database file does you little good if you don’t have the appropriate database software to go with it.

Semi-structured data is a curveball – the best example of which is your email system. The application (likely Outlook) organizes the data in defined and hierarchical ways, but the data itself, the messages, attachments, and contacts, can be pulled out of the application and accessed without much trouble. If you ever save an email message as a text file, it’s no longer semi-structured; it’s now unstructured data.

Unstructured data is the majority of your office files: your documents, spreadsheets, images, videos, audio files, PDFs, and just about everything else. Increasingly, due to IoT and other initiatives, unstructured data is made up of more than just office documents: satellite imagery, scientific data, digital surveillance, and sensor data, to name a few.

Unstructured data volume

Your total data volume, of course, is all data combined. Occasionally I ask a CEO how much data they have in total, and I give credit to those who can answer accurately. But with few exceptions, they do not know how much of it is unstructured. Do you?

What shocks me is that even among IT specialists, many do not know how to deal with unstructured data or what value it represents. There is so much savings potential AND business intelligence in it. Why is so much unstructured data sitting mostly untouched and unused on very expensive primary storage? Why is it replicated for failover and backed up when it could easily be sitting on cheaper secondary storage (onsite or in the cloud), indexed and well prepared for any future needs?

Here’s a hint…

If your business is like most, unstructured data will be the majority of your data in the future. What’s most important is that the unstructured data load is growing, year over year, at a rate of 60 percent or more, far faster than structured data. You need to start controlling this now and prepare data for future use. Remember when Google started and we all thought it was just about search? Not working to tap the value of data was understandable 15 years ago, but if you are neglecting it today you are missing a huge opportunity.
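To make the 60 percent figure concrete, here is a rough Python sketch of what compounding at that rate looks like. The 100 TB starting volume and the five-year horizon are hypothetical numbers chosen for illustration, and the sketch assumes the growth rate stays constant, which real environments rarely do.

```python
def project_volume(start_tb: float, annual_growth: float, years: int) -> list[float]:
    """Return the projected volume (in TB) at the end of each year.

    Assumes a constant growth rate that compounds once per year.
    """
    volumes = []
    current = start_tb
    for _ in range(years):
        current *= 1 + annual_growth  # e.g. 0.60 means 60 percent growth
        volumes.append(round(current, 1))
    return volumes


if __name__ == "__main__":
    # Hypothetical starting point: 100 TB of unstructured data today.
    for year, tb in enumerate(project_volume(100.0, 0.60, 5), start=1):
        print(f"Year {year}: {tb} TB")
```

At a steady 60 percent, the volume grows roughly tenfold in five years (1.6 to the fifth power is about 10.5), which is why waiting a few years to get control of it becomes expensive.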

Many industries are becoming dependent on Internet of Things (IoT) technology. There is an exploding landscape of IoT devices, smart devices, mobile devices, industrial machines, satellites, and cameras, all collecting, generating, and transmitting unstructured data, and the result is…more data. And more value. But you need to manage it right, you must understand the value, and you must start implementing the right solution now.

Plus there are brand-new classes of business analytics tools, artificial intelligence/machine learning applications, and other ways data is processed, evaluated, or audited so we can derive actionable insight from it and tap it for value. To take advantage of these tools, we need to keep even our inactive files for long periods, like we do for regulatory or legal reasons. We’re throwing almost nothing away.

What will this cost?

It costs you very little to get together with your head of IT and ask questions. Here are a few suggestions:

  • How much of our data is structured and how much is unstructured?
  • How much is the unstructured data load growing?
  • What kind of data is responsible for the growth, and/or which applications are responsible?
  • How much of the unstructured data is active, needed for day-to-day operations, versus inactive, only needed for reference or historical purposes?
  • Are we adequately equipped for next year’s growth, and the year after?
  • Are we prepared for growth given the resources–human and otherwise–we currently have?
  • Are there ways we could be managing unstructured data better?
  • Are there ways we could be tapping into unstructured data for trends, forecasting, or other purposes as benefits the company?
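For a first rough answer to a couple of the questions above (how much unstructured data sits on a file share, how much of it is active, and which file types take up the space), your IT team could start with something like the Python sketch below. It is only an illustration under stated assumptions: the function name `summarize` is made up for this example, "inactive" is assumed to mean "not modified in 180 days," and modification time is only a proxy for whether anyone still uses a file.

```python
import time
from collections import defaultdict
from pathlib import Path

# Assumption: a file untouched for this many days counts as "inactive".
INACTIVE_AFTER_DAYS = 180


def summarize(root, inactive_after_days=INACTIVE_AFTER_DAYS, now=None):
    """Walk a directory tree and total file sizes by activity and extension."""
    now = time.time() if now is None else now
    cutoff = now - inactive_after_days * 86400  # seconds per day
    totals = {"files": 0, "active_bytes": 0, "inactive_bytes": 0}
    by_extension = defaultdict(int)

    for path in Path(root).rglob("*"):
        try:
            if not path.is_file():
                continue
            info = path.stat()
        except OSError:
            continue  # skip files we cannot read instead of aborting the scan

        key = "inactive_bytes" if info.st_mtime < cutoff else "active_bytes"
        totals[key] += info.st_size
        totals["files"] += 1
        by_extension[path.suffix.lower() or "(none)"] += info.st_size

    totals["by_extension"] = dict(by_extension)
    return totals
```

Pointing `summarize` at a shared folder returns the file count, the active and inactive byte totals, and a per-extension breakdown. Running it again a few months later gives a crude growth figure. A dedicated data management tool will do this far more thoroughly, but even a sketch like this turns the conversation with your head of IT from guesses into numbers.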

If you’re a caring, forward-looking leader, I hope this has been informative, and if you’re an IT pro whose CEO needs to see this, there are some handy share buttons below. If you want to discuss further, I welcome the conversation at adrian.knapp@aparavi.com.