Breaking Barriers to Entry to Advanced Analytics

08-11-2022

 

SAIC’s Integrated Data Science Suite, featuring Tenjin and Koverse Data Platform, addresses critical data, risk, people and tech problems in accelerating agencies’ paths to achieving AI and machine-learning data analysis

 

Advanced analytics has quickly gone from being seen as an opportunity to being a much-needed capability for government agencies. Agency leaders realize they need automation, artificial intelligence and machine learning to assist their human workforces in extracting mission insights from the petabytes to exabytes of data being generated. However, challenges ranging from implementation complexity and cost, to risk and security, to rapid technology changes are hindering them from making AI transformations.

SAIC is responding with a modular set of barrier-breaking solutions called the SAIC Integrated Data Science Suite to help government customers overcome these obstacles. From data collection and management to AI operational deployment, each innovative solution in our suite tackles the biggest data science problems agencies are facing, and we are already rolling out the solutions to civilian agency and Department of Defense customers. We can also deploy the suite as a fully integrated package covering the entire analytics lifecycle for customers.

Wrangling the data herd

Ingesting disparate types of data from a plethora of sources and managing them in one place, even before data analysis can begin, has perplexed government organizations. Particularly for military, defense and intelligence agencies, the need to compile and index a mix of unclassified, secret and top secret data complicates advanced analytics endeavors. Meanwhile, the proliferation of unstructured data is heightening complexity problems for everyone.

In April 2021, SAIC acquired Koverse and added its groundbreaking, commercial-off-the-shelf (COTS) data management platform to our Integrated Data Science Suite. The Koverse Data Platform (KDP) ingests, indexes, stores and secures any type of structured or unstructured data file in its native format, regardless of security classification, in one repository. KDP, which originated at the National Security Agency, does this at scale and tags all ingested data with attribute information that determines the preconditions for accessing it.

KDP then applies access control in a zero-trust way across users of this single data repository who hold different clearance levels, so that they can only see and touch the data that matches their individual clearance level and permission attributes. This attribute-based, data-centric approach to segmentation and security can parse information within a data file, such as a PowerPoint deck of slides with different security classifications, making "who can see what" much more granular and precise than approaches based on roles and policies.
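To make the attribute-based idea concrete, here is a minimal Python sketch of filtering the parts of one file by clearance level and permission attributes. It is an illustration only and does not represent KDP's API or internals: the `LEVELS` ordering, the `Element` and `User` types, the `can_access` rule and the `project-x` attribute are all hypothetical names assumed for this example.

```python
from dataclasses import dataclass

# Hypothetical classification ordering, assumed for illustration only.
LEVELS = {"UNCLASSIFIED": 0, "SECRET": 1, "TOP SECRET": 2}


@dataclass(frozen=True)
class Element:
    """One part of a file (for example, a single slide) carrying its own tags."""
    content: str
    classification: str
    required_attributes: frozenset = frozenset()


@dataclass(frozen=True)
class User:
    name: str
    clearance: str
    attributes: frozenset = frozenset()


def can_access(user: User, element: Element) -> bool:
    """Allow access only when clearance is sufficient AND every required attribute is held."""
    cleared = LEVELS[user.clearance] >= LEVELS[element.classification]
    return cleared and element.required_attributes <= user.attributes


def visible_elements(user: User, elements: list) -> list:
    """Return only the elements of a file that this user may see."""
    return [e for e in elements if can_access(user, e)]


# A single deck whose slides carry different tags.
deck = [
    Element("Slide 1: overview", "UNCLASSIFIED"),
    Element("Slide 2: logistics", "SECRET", frozenset({"project-x"})),
    Element("Slide 3: sources", "TOP SECRET", frozenset({"project-x"})),
]

analyst = User("analyst", "SECRET", frozenset({"project-x"}))
contractor = User("contractor", "SECRET")

for person in (analyst, contractor):
    print(person.name, [e.content for e in visible_elements(person, deck)])
```

In this sketch both users hold the same clearance, yet they see different slides because the decision depends on each element's tags rather than on a role assigned to the whole file.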

KDP’s ability to logically separate and protect mixed sensitivities allows safe commingling of data within a single information domain and broadens the number of data sources an organization can leverage. When agencies can physically co-locate all of their data, they not only break down data silos and eliminate multiple databases and the need for cross-domain solutions, but also create a single source of truth from which they can build their analytic models, assuring integrity and governance.

“KDP is a data management solution that crosses compartments, enclaves and sensitivities, which the government has been trying to solve for years,” said Jay Meil, SAIC’s chief data scientist and AI strategy and solutions lead. “You can’t get to AI until you solve this problem of bringing your data together first. That’s what makes KDP incredibly important.”

Attacking resource, operational bottlenecks

Government agencies are facing a shortage of data science professionals who can code algorithms and build, train and operationalize analytic models for field use. At the same time, the blistering pace of incoming data needing to be analyzed is severely taxing human abilities and underscoring the need for AI and machine help.

Seeing these problems afflicting our customers, we responded by launching Tenjin, a low/no-code AI and machine-learning development and orchestration tool, as the other core solution in the SAIC Integrated Data Science Suite. In Tenjin, analytics professionals and those without data science or AI engineering backgrounds alike can operate and interact with analytic models via a drag-and-drop, point-and-click environment. The solution comes with ready-made, reusable data preparation components and recipes that let non-experts engage in analytic model development, while seasoned data scientists and engineers have a full-code environment for custom development of models and applications.

By empowering “citizen data scientists,” Tenjin democratizes advanced analytics for agencies experiencing people and resource constraints. A mission expert who doesn't have data science proficiency but sees an analytics need can go into Tenjin and develop models, perform exploratory analysis, and build visualizations to rapidly gain decision-enabling insights. Agencies benefit from better organizational agility and alignment, with technical and non-technical practitioners working together within a unified platform and sharing a common operating picture.

With the skills shortage, “it’s extremely hard to find data scientists and machine learning engineers,” said Meil, adding that for DOD organizations, “it’s even harder to find cleared data scientists and ML engineers. And they’re not going to be afloat on a Navy ship, for example, to operate models. What Tenjin does is take all the advantages of a data science suite and turns it into an orchestration tool that anyone can operate.”

Tenjin is based on the commercial Dataiku platform. Partnering with its developer, we added security features authorizing government use, as well as AI algorithms geared toward government missions in the areas of computer vision, natural language processing and data fusion. We call these ready-to-use algorithms Mission Accelerators, and they come prepackaged in Tenjin so agencies can operationalize them quickly.

For example, we can deliver Tenjin with computer vision Mission Accelerators to do facial recognition supporting identity verification. Users can tailor these to their needs or deploy the AI models as-is into agency biometrics applications. We design the models as loosely coupled collections of individual functional components so users can mix and reuse them in a plug-and-play style for other analytic applications.
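The plug-and-play idea can be sketched in a few lines of Python. This is a generic illustration of composing small, independent components into different pipelines, not the way Mission Accelerators or Tenjin are actually built. The `compose` helper and the three toy steps are hypothetical stand-ins for real model components.

```python
from typing import Callable, Iterable

# A component is any function that takes a record and returns a new record.
Step = Callable[[dict], dict]


def compose(steps: Iterable[Step]) -> Step:
    """Chain independent components into one pipeline, applied in order."""
    ordered = list(steps)

    def pipeline(record: dict) -> dict:
        for step in ordered:
            record = step(record)
        return record

    return pipeline


# Hypothetical stand-in components; real ones would wrap models or data prep.
def normalize_name(record: dict) -> dict:
    return {**record, "name": record["name"].strip().lower()}


def add_name_length(record: dict) -> dict:
    return {**record, "name_length": len(record["name"])}


def flag_long_name(record: dict) -> dict:
    return {**record, "long_name": record.get("name_length", 0) > 10}


# The same components are reused in two different applications.
intake = compose([normalize_name, add_name_length])
screening = compose([normalize_name, add_name_length, flag_long_name])

print(intake({"name": "  Ada Lovelace "}))
print(screening({"name": "  Ada Lovelace "}))
```

Because each step only reads and returns a record, a component written for one application can be dropped into another without modification.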

While speed in the AI-building process is crucial, agencies still need to know that their models are trustworthy. Very few AI models can go into operational use because they lack "explainability." With concerns like AI bias, model explainability is crucial to the success and progress of advanced analytics in government.

When agencies understand how their models are behaving and can prevent AI from making potentially harmful decisions, they can manage AI risk better. Equipped with documentation and activity-monitoring tools that trace a project’s steps, and visualization tools that show how AI models are drawing inferences from data, Tenjin allows agency stakeholders in governance and compliance, science and engineering, and quality to audit the AI development chain of events from the gathering of data to the final model outputs.

“In model explainability, they can see a visual representation of what data was ingested and from where, and how the model was trained and tested with the training dataset,” said Meil, adding that Tenjin lets users drill down and interact with each piece of that chain if they want to know more.

Solving the systems integration puzzle

SAIC makes Tenjin, the Mission Accelerators and Koverse Data Platform available as individual, open architecture COTS offerings, allowing agencies to integrate them into their existing systems and data lakes and avoid the vendor lock-in caused by proprietary solutions.

“If a customer has already spent millions of dollars on their infrastructure, SAIC won’t tear everything out and start over,” said Meil. “Our open architecture solutions run on APIs and connectors, meaning they can interoperate with existing or additional applications. We work with customers to phase them into their systems.”

For a customer that is ready for our complete Integrated Data Science Suite, with all of our solutions integrated into a turnkey advanced analytics capability, we containerize and deliver it as a full analytics lifecycle environment with connectors and plugins to run within the customer’s infrastructure on premises or in the cloud.

After evaluating a customer’s current technology state and understanding its analytics problem sets, we can quickly configure a tailored solution. Not only does our Integrated Data Science Suite offer customers a rapid and scalable path to AI-driven analytics, but its open architecture also makes it future-proof by allowing connections with new solutions as technologies evolve.