High Performance Data Analytics: 'Ultimate' Q&As

By Rajiv Bendale, Technical Director, HPC Programs

In Douglas Adams' The Hitchhiker's Guide to the Galaxy, a supercomputer takes several million years to determine the answer to the "Ultimate Question of Life, the Universe, and Everything." The answer, in the novel, is the number 42.  

Around the turn of this century, scientists began exploring genomics (the study of genetic material) and biological computation. Traditional datasets suddenly exploded, and the term “Big Data” was born. Fast forward to today, and analyzing that much data to extract insight and knowledge remains a very real problem. This challenge led to an entirely new discipline: high performance data analytics (HPDA).

To be fair, the CERN folks working on the Large Hadron Collider were churning out 30 petabytes of data per year even earlier. Their techniques for handling and analyzing that much data (e.g., data staging and refining data analytics) have considerably influenced numerous disciplines. In the case of data generated by genomic sequencers, the scale caught practitioners by surprise and pushed them to adopt new methods for genome assembly, gene expression analysis, protein structure determination, and protein-protein interaction studies. This meant building the high performance computing (HPC) infrastructure to address each stage of the data explosion. Research labs and academic institutions, as well as private companies, quickly assembled teams of specialists to merge traditional techniques with newer computational approaches.

Bioinformatics (an interdisciplinary field that uses supercomputers, HPC, and other computational methods to manage biological data) sprang from the need to make sense of all this information. Genomics and a huge variety of other scientific fields still struggle with extracting information from data and, ultimately, knowledge from that information. The discipline has broadened significantly with the rise of social media and smart devices. Social data mining has become an enormous business, fed by information mining and data fusion across disparate sources to build up profiles.

FURTHER READING: High performance computing greatly expands and accelerates the search for novel materials

 

We're going to need a bigger computer

Data analytics practitioners, when faced with extreme data size or complexity, now look to HPC, reusing well-established techniques that exploit parallelism and scalability to conquer problems at the scales associated with grand challenges. These fundamental problems in science and engineering can’t yet be solved, even with state-of-the-art computing tools, because they require enormous quantities of CPU time. Applying HPC techniques to such problems will have considerable economic and scientific impact.
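
To make the parallelism idea concrete, here is a minimal Python sketch of the divide-process-combine pattern that HPC analytics scales up across many nodes. It is purely illustrative: the dataset is a random stand-in and the statistic computed (a global mean) is a placeholder for a real analysis.

# Minimal sketch of the data-parallel pattern HPC analytics relies on:
# split a large dataset into chunks, process the chunks independently in
# parallel, then combine (reduce) the partial results.
from multiprocessing import Pool
import numpy as np

def partial_stats(chunk):
    """Return per-chunk sum and count so a global mean can be combined later."""
    return chunk.sum(), chunk.size

if __name__ == "__main__":
    # Stand-in for a dataset too large to analyze serially.
    data = np.random.rand(10_000_000)
    chunks = np.array_split(data, 8)          # one chunk per worker

    with Pool(processes=8) as pool:
        partials = pool.map(partial_stats, chunks)

    total, count = map(sum, zip(*partials))   # reduce step
    print("global mean:", total / count)

On a real HPC system the same pattern is typically expressed with MPI or a distributed framework so that the chunks live on different nodes, but the split, independent compute, and reduce steps are the same.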

Many of SAIC’s technically savvy customers are looking to us for a clearer understanding of how to apply data analytics, machine learning, and artificial intelligence techniques to data of interest. SAIC is helping to address these requirements by enabling the creation of data analytics platforms that bring together databases, parallel file systems, AI/ML techniques, statistical programming languages, and related tools on an HPC platform behind easy-to-use graphical user interfaces (GUIs). SAIC also trains new users in HPDA awareness and applications, helping them break into this new field.
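
As a hedged illustration of the kind of job such a platform might run on an HPC node, the Python sketch below reads tabular data staged on a parallel file system and fits a simple machine learning model. The file path, column names, and model choice are assumptions made for this example, not a description of any specific SAIC deliverable.

# Hypothetical HPDA job: read staged tabular data, fit a model, report accuracy.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical path on a parallel file system and hypothetical "label" column.
df = pd.read_parquet("/lustre/project/staged/experiment.parquet")
X, y = df.drop(columns=["label"]), df["label"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
model = RandomForestClassifier(n_estimators=200, n_jobs=-1)  # use all cores on the node
model.fit(X_train, y_train)

print("held-out accuracy:", model.score(X_test, y_test))

In practice a GUI front end would generate and submit jobs like this one to the cluster scheduler, so that analysts never have to touch the underlying scripts.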

Now what?

The crystal ball is slightly cloudy at this stage, and the HPC industry is still exploring how best to address HPDA. SAIC is studying AI-based approaches that apply known techniques to scientific data analysis. These could automatically examine scientific data and contextually extract information from the results of intense computations, ultimately synthesizing knowledge from that information and guiding researchers and practitioners toward better experiments or significantly enhanced products. SAIC’s Synthetic Analyst platform, when tied into HPC resources, could provide human-grade analytic workflows while dramatically lowering training, integration, and other costs. The application potential for these tools is exciting, and it opens up a new world of possibilities.
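
As one generic example of a "known technique" applied to computation results (not a description of Synthetic Analyst itself), the sketch below clusters summary features from many simulation runs so a researcher can inspect the distinct regimes that emerge. The data here is synthetic and the features are hypothetical.

# Illustrative-only sketch: unsupervised clustering to group simulation
# outputs into regimes a researcher can then inspect.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Stand-in for summary features extracted from many simulation runs
# (e.g., peak values, gradients, energy totals) -- synthetic data.
features = np.random.rand(500, 6)

scaled = StandardScaler().fit_transform(features)
labels = KMeans(n_clusters=4, n_init=10).fit_predict(scaled)

for regime in range(4):
    print(f"regime {regime}: {np.sum(labels == regime)} runs")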

FURTHER READING: Doing difficult data prep work pays off with analytics efficiency

 

About the author: Dr. Rajiv Bendale is the technical director for the company's HPC customer programs. He has served in leadership and strategy roles on SAIC programs with DoD, NOAA, and FDA. He is currently focused on performance enhancement, code optimization, extreme-scale computing, and multi-physics simulation. He has also led requirements identification tasks for NOAA, which resulted in major technology refresh efforts.