“In an information-rich world, the wealth of information means a dearth of something else: a scarcity of whatever it is that information consumes. What information consumes is rather obvious: it consumes the attention of its recipients.” –Herbert Simon
Amazing advances in technology and the move toward open standards have enabled us to make great progress in helping government agencies achieve their data integration goals. While data wrangling and cleaning are still necessary steps before integration and analysis, automation is aiding these processes. We can now bring in all sorts of data from many types of sensors, transported via diverse protocols and conforming to many data standards, into a single, integrated system for storage, analysis, and display.
However, integrating data into a centralized location does not necessarily produce a better understanding of it. Data dominance does not automatically translate into decision superiority. In fact, there is now a greater risk of encountering the situation that American economist and scientist Herbert Simon described in the quote above, written several decades ago.
We now have simple terms for this situation: “drowning in data” or even “data paralysis.” It is a real and dangerous condition, but not an inevitable destination if we follow a smart path to data integration.
The limitations of traditional data modeling
Once we start to drown in our data, the first response is to seek better visualization of it all. “Give me dashboards so I can visualize my data,” our customers may say.
Often, dashboard creation requires additional data wrangling and reformatting, which can be done fairly quickly. But even with a collection of dashboards, you may just be presented with a bunch of numbers transformed into lines, colors, and polygons. In some cases you can certainly derive knowledge from these visual representations, but in all cases you are confronted with more data in more formats.
Once we determine that more graphs, bars, and colors help in some cases but are not sufficient to fully understand what we need to know for decision-making, the next step is calling in a data scientist to wrangle the data some more, perform some analysis, and answer the specific questions that simple visualization did not. But this approach requires calling the data scientist back again and again whenever a new question arises.
What often slows down a data scientist’s responsiveness is the need to relate the data to real-world concepts. Data scientists are often not domain experts, so they must work with subject matter experts to understand each data element and how the elements relate to one another before they can develop the right set of analytics. With traditional modeling techniques, this level of understanding cannot be obtained simply by looking at the model.
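To make the gap concrete, consider a minimal Python sketch (all field names, identifiers, and vocabulary terms here are hypothetical, invented for illustration). A traditional table row carries cryptic column names whose meaning lives only in a subject matter expert's head, while a semantic, triple-style representation states that meaning explicitly, in a form both people and machines can read:

```python
# A typical table row: the real-world meaning of "plt_id" or "stat_cd"
# is nowhere in the data itself -- only a domain expert can decode it.
raw_row = {"plt_id": "P-1138", "stat_cd": 3, "lat": 38.89, "lon": -77.03}

# The same facts as subject-predicate-object statements, where the
# relationships themselves name real-world concepts.
triples = [
    ("Platform/P-1138", "isA", "SurveillanceAircraft"),
    ("Platform/P-1138", "hasOperationalStatus", "Degraded"),  # was stat_cd 3
    ("Platform/P-1138", "locatedAt", "38.89,-77.03"),
]

def describe(subject, facts):
    """Render every statement about a subject as a readable sentence."""
    return [f"{s} {pred} {obj}." for s, pred, obj in facts if s == subject]

for sentence in describe("Platform/P-1138", triples):
    print(sentence)
```

The point is not the particular syntax: once the model itself says what each element means and how elements relate, a newly arrived analyst (or a machine) can answer that question by looking at the model rather than by scheduling time with a subject matter expert.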
The next attempt at more rapid sense-making is to apply artificial intelligence (AI) and machine learning (ML) tools. The hope is that the computer can help us make sense of everything and do it at the “speed of decision-making.” But in taking this step toward AI/ML sense-making and decision support, it is important to give the machine clues about the true meaning of all the data. From a traditional data model alone, the machine will understand the meaning of the data no better than the data scientist did. Most importantly, meaning should be derived from human understanding, from our shared mental world, with that knowledge then passed on to the machine.