Understanding The Data Flood: Why Ontologies Are Critical

Enabling humans and AI to quickly derive genuinely useful meaning from data requires injecting context and relational information by linking the data to knowledge models.

06-08-2021
Forrest Hare
DATA ANALYTICS

“In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients.” –Herbert Simon

Amazing advances in technology and the move toward open standards have enabled us to make great progress in helping government agencies achieve their data integration goals. While data wrangling and cleaning are still necessary steps before integration and analysis, automation is aiding these processes. We can now bring in all sorts of data from many types of sensors, transported via diverse protocols and conforming with many data standards, to a single and integrated system for storage, analysis, and display.

However, this integration of data into a centralized location does not necessarily result in better understanding of it. Data dominance does not necessarily equal decision superiority. In fact, there is now a greater risk of encountering the situation that American economist and scientist Herbert Simon described in his insightful quote several decades ago.

We now have simple terms for this situation: “drowning in data,” or even “data paralysis.” This is a real and dangerous condition, but it is not an inevitable destination if we follow a smart path to data integration.

The limitations of traditional data modeling

Once we start to drown in our data, the first response is figuring out better visualization of it all. "Give me dashboards so I can visualize my data," our customers may say.

Often, dashboard creation requires additional data wrangling and reformatting, which can be done fairly quickly. But even with a collection of dashboards, you may just be presented with a bunch of numbers transformed into lines, colors, and polygons. You can certainly derive knowledge from visual representations in some cases, but you are confronted with more data in more formats in all cases.

Once we determine that more graphs, bars, and colors are helpful in some cases but not sufficient to fully understand what we really need to know for decision-making, the next step is calling in a data scientist to wrangle with the data some more, do some analysis, and provide answers to specific questions that simple visualization did not deliver. But this requires you to call the data scientist over and over again whenever you have another question.

What often slows down the data scientist’s responsiveness is the need to relate the data to real-world concepts. Data scientists oftentimes are not domain experts, so they must work with subject matter experts to understand each data element and how the elements relate to each other before they can develop the right set of analytics. With traditional modeling techniques, this level of understanding cannot be obtained by simply looking at the model.

The next attempt to achieve more rapid sense-making of data is using artificial intelligence (AI) and machine learning (ML) tools. The hope is that the computer can help us make sense of everything and do it at the “speed of decision-making.” But in taking this step toward AI/ML sense-making and decision support, it is important to give the machine clues regarding the true meaning of all the data. From a data model alone, the machine will understand the meaning of the data no better than the data scientist does. Most importantly, meaning should be derived from human understanding, from our shared mental world, with that knowledge passed on to the machine.

Without attaching a mission-based knowledge model to a data label, such as "tiger," a computer only knows how to label tiger pictures from data and not much else.

Giving meaning to incoming data

The best way to do this is through the deployment of ontologies, which are developed to represent the shared understanding of the domain of interest in a logical way. This formalized, machine-readable ontology is what we call a knowledge model. AI can be employed to try to make sense of data, but without being grounded in such a knowledge model, it is learning on its own, merely memorizing what you tell it to memorize.

For example, you can train a neural network-based algorithm to detect representations of cats in images. However, all you are really doing is telling the computer to find a similar pattern of pixels in an image, look for images in which that pattern appears, and then label them with a random string of three letters starting with “c.”

The computer does not understand what the three-letter string really signifies, so it can infer nothing whenever it finds a cat in a picture. But by introducing the label “cat” and attaching an ontology to it (e.g., it is an animal, it has fur, it is feline), you can now ask the machine questions such as, “Did you find any pictures of animals?” and it can show you all the cat pictures.
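To make the idea concrete, here is a minimal sketch of that kind of subclass reasoning. It is a toy illustration, not a real ontology framework: the class names, image IDs, and the simple parent-pointer structure are all invented for this example, and a production system would use a standard such as RDF/OWL instead.

```python
# Toy ontology: each concept maps to its parent concept (None = root).
# A real knowledge model would use a standard like RDFS/OWL; this dict
# just illustrates the subclass chain "cat -> feline -> animal".
ontology = {
    "cat":    "feline",
    "tiger":  "feline",
    "feline": "animal",
    "animal": None,
}

def is_a(label, concept):
    """Walk up the subclass chain: does `label` fall under `concept`?"""
    while label is not None:
        if label == concept:
            return True
        label = ontology.get(label)
    return False

# Images tagged with the labels a classifier produced (hypothetical IDs).
images = {"img001": "cat", "img002": "toaster"}

# "Did you find any pictures of animals?" -- the cat image matches even
# though nothing was ever explicitly labeled "animal".
animal_pics = [img for img, tag in images.items() if is_a(tag, "animal")]
print(animal_pics)  # ['img001']
```

The point of the sketch is that the question is asked in terms of a concept (“animal”) the classifier never emitted; the ontology, not the labels, supplies the connection.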

More important, you can program the machine with questions like, “Alert me when you find any threats in the pictures,” and it can tell you each time it finds a tiger in an image. This is because it will know what a tiger is, based on the ontology, which provides sufficient information on the dangerous features of tigers (e.g., large, sharp claws and teeth, and far more power to harm than a house cat) and equates those features with potential threats. Now we are using the computer for decision support.
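The same toy approach can sketch the threat-alerting step. Again, this is an illustration under invented assumptions (the property names and image IDs are made up, and a real system would express such rules in an ontology language rather than Python): a concept is flagged as a threat not because anyone labeled it “threat,” but because the properties the knowledge model asserts about it satisfy a threat rule.

```python
# Each concept carries the properties the (toy) ontology asserts about it.
ontology = {
    "cat":   {"sharp_claws": True, "powerful": False},
    "tiger": {"sharp_claws": True, "powerful": True},
}

def is_threat(label):
    """Threat rule: armed (sharp claws/teeth) AND powerful enough to harm.
    The rule never mentions "tiger"; the inference comes from properties."""
    props = ontology.get(label, {})
    return bool(props.get("sharp_claws") and props.get("powerful"))

# Stream of detections from the image classifier (hypothetical).
for image, label in [("img007", "cat"), ("img008", "tiger")]:
    if is_threat(label):
        print(f"ALERT: {label} detected in {image}")
# prints: ALERT: tiger detected in img008
```

Adding a newly modeled animal to the ontology with the right properties would make it trigger the alert automatically, with no change to the rule or retraining of the classifier's labels.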

At this point, we can keep the wealth of incoming information from creating a poverty of attention or an overload of fluffy kitten pictures. Instead, the wealth of information will lead directly to a rapidly created wealth of actionable knowledge, thereby allowing us to quickly achieve decision superiority.

Posted by: Forrest Hare

Cyber Operations / Solutions Developer

Forrest Hare works in the cyber practice within SAIC’s Strategy, Growth, and Innovation group, developing and implementing solutions for both cybersecurity and knowledge modeling for federal government customers. One of his primary focuses is on developing machine-readable, semantically computable knowledge models that integrate operations in all defense domains, including air, land, sea, space, and all components of cyberspace, such as the electromagnetic spectrum. He develops ontology-based knowledge models for defense intelligence to improve intelligence information for all-source analysis.

Hare joined SAIC after retiring as a colonel in the U.S. Air Force. His last assignment was deputy center chief at the Defense Intelligence Agency’s Asia/Pacific Intelligence Center. Over his 28-year career in the Air Force, Hare had assignments in targeting, signals intelligence, information operations, and cybersecurity policy. While assigned to the Air Force headquarters staff, he was a member of the Air Force Chief of Staff’s cyberspace task force, which defined the service’s role in the cyberspace warfighting domain.

Hare, a Ph.D., is an adjunct professor at George Mason University and Georgetown University, where he instructs on security and technology, intelligence operations, and national security policy for cyberspace. He is also a member of the Open Cybersecurity Alliance’s Project Governing Board, which promotes open standards for cybersecurity products. Hare is a Certified Information Systems Security Professional.

Hare earned his bachelor’s degree in geography and economics from the U.S. Air Force Academy, his master’s degree in geography from the University of Illinois Urbana-Champaign, and his doctorate in public policy from George Mason University. He and his wife and dog split their time between northern Virginia and “ski-country” Colorado. He practices and instructs aikido and enjoys triathlons when there’s no snow.
