Robotic Process Automation Can Lead to Data Analytics of PowerPoint Files

We used RPA to help us analyze a batch of these ubiquitous documents. Here is what happened.

Charles Onstott

Microsoft PowerPoint is used by all kinds of organizations to share information about plans, projects, strategies, and status updates, among many kinds of content. Organizations often create standard PowerPoint templates for their people to use in creating the content. PowerPoint is convenient in that someone can input data and it's immediately available for presentation purposes.

As a result, vast amounts of corporate knowledge lie in PowerPoint documents. While in some cases the documents can be searched easily, comparing data across documents in meaningful ways is a very tedious and time-intensive task. If a company has 50 project plans in PowerPoint documents created from the same template, it is not easy to determine, say, the average duration of a project or how many projects are over budget.

As chief technology officer of SAIC, I wanted to analyze data in PowerPoint documents that laid out business growth plans over multiple time horizons. The documents came from a common PowerPoint template and included text descriptions as well as numerical information, such as market size, growth rates, expected revenues and margins, and so forth. The numerical information was conveyed in text, tables, and bar charts.

Enter robotic process automation

Seeking to avoid spending a weekend tediously copying and pasting data across the documents, I wanted to find out whether robotic process automation (RPA) could be used to effectively automate this process. If so, perhaps RPA could be used to help unlock the knowledge contained in other kinds of PowerPoint documents. The proof-of-concept was to use RPA to extract the data to a Microsoft Excel spreadsheet and use that to perform data analytics.

An engineer at SAIC created an RPA bot in UiPath, an RPA software platform. We discovered that people in our company were pretty consistent in populating our template-based documents correctly and did not modify the templates or add unexpected charts. So, the RPA bot worked well.
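For readers without an RPA platform, the same extraction idea can be sketched in plain Python. This is not the UiPath bot described above, only a minimal illustration that assumes standard .pptx files, which are zip archives with one XML part per slide. The function names and the CSV layout are hypothetical, and the sketch reads only text runs (including table cell text); chart data is stored in separate parts and is not handled here.

```python
import csv
import re
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespace used for text runs inside slide XML.
A_NS = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml$")


def extract_slide_text(pptx_path):
    """Return {slide_number: [text runs]} for one .pptx file."""
    slides = {}
    with zipfile.ZipFile(pptx_path) as archive:
        for name in archive.namelist():
            match = SLIDE_RE.match(name)
            if not match:
                continue  # skip relationship files, layouts, media, etc.
            root = ET.fromstring(archive.read(name))
            runs = [t.text for t in root.iter(A_NS + "t") if t.text]
            slides[int(match.group(1))] = runs
    return slides


def write_rows(pptx_paths, csv_path):
    """Write one CSV row per text run: file, slide number, text."""
    with open(csv_path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["file", "slide", "text"])
        for path in pptx_paths:
            for number, runs in sorted(extract_slide_text(path).items()):
                for text in runs:
                    writer.writerow([path, number, text])
```

Because every document comes from the same template, the slide number and the position of a text run are usually enough to tell which field a value belongs to. That only holds while people leave the template unmodified.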

The RPA engineer then sent the extracted data to one of our data scientists. Looking through it, the data scientist immediately ran into style inconsistencies between the documents. For example, people wrote “$3,000,000,” “$3M,” or “$3,000 (in thousands).” This was not unexpected: PowerPoint doesn’t have data validation, so without guidelines any group of people is likely to be inconsistent with inputs when using a standard form.
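A minimal sketch of how the three styles quoted above could be brought to a single unit (plain dollars). The function name, the suffix table, and the handling of the "in thousands" note are assumptions for illustration, not the data scientist's actual code.

```python
import re

# Multipliers for common suffix styles; extend as needed.
SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}
# A number with optional commas and decimals, then an optional one-letter suffix.
AMOUNT_RE = re.compile(r"\$?\s*(\d[\d,]*(?:\.\d+)?)\s*([KMB])?\b", re.IGNORECASE)


def parse_dollars(raw):
    """Convert a free-text dollar amount to a float number of dollars."""
    match = AMOUNT_RE.search(raw)
    if match is None:
        raise ValueError(f"no amount found in {raw!r}")
    value = float(match.group(1).replace(",", ""))
    suffix = match.group(2)
    if suffix:
        value *= SUFFIXES[suffix.upper()]
    # Scale notes such as "(in thousands)" apply to the whole figure.
    if "in thousands" in raw.lower():
        value *= 1_000
    return value
```

All three example strings resolve to the same value under these rules. Entries the pattern cannot read raise an error instead of being silently guessed at, which keeps the manual review list explicit.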

The style inconsistencies between documents had to be rectified in the data wrangling process. And as is the case in many data analytics projects, data wrangling was the most time-consuming activity, taking longer than the bot development and execution. Once the data scientist standardized all the data and used Jupyter Notebook for the analysis, he was able to generate different views of the data, including Monte Carlo analysis to predict ranges of expected financial outcomes for the business projects. This proved very useful to me in my decision-making.
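To give a feel for the Monte Carlo step, here is a small sketch using only the Python standard library. It assumes each project can be described by a low, most likely, and high revenue estimate and draws from a triangular distribution. Both the distribution choice and the sample figures are hypothetical and are not taken from our analysis.

```python
import random
import statistics


def simulate_total_revenue(projects, trials=10_000, seed=0):
    """Monte Carlo estimate of total revenue across projects.

    Each project is a (low, likely, high) tuple in dollars; each trial
    draws one revenue per project from a triangular distribution.
    """
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    totals = []
    for _ in range(trials):
        totals.append(sum(rng.triangular(low, high, likely)
                          for low, likely, high in projects))
    return totals


def summarize(totals):
    """Return the 5th, 50th, and 95th percentiles of simulated totals."""
    cuts = statistics.quantiles(totals, n=20)  # 19 cut points at 5% steps
    return {"p5": cuts[0], "p50": cuts[9], "p95": cuts[18]}


# Hypothetical inputs, not figures from the proof-of-concept.
example_projects = [
    (2_000_000, 3_000_000, 5_000_000),
    (500_000, 1_000_000, 1_200_000),
]
print(summarize(simulate_total_revenue(example_projects)))
```

The spread between the 5th and 95th percentiles is the kind of range of expected outcomes that can inform a decision, as opposed to a single point estimate.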


Word challenge

The data scientist ran into even larger problems with the text data. Given that PowerPoint is a presentation medium, people inputted data in ways that made the most sense to them for presentation. This led to variances in the text data, and the data scientist would have had to interview the people who made the inputs in order to develop a taxonomy or an ontology that standardized the data. This was not done, since it would have made the proof-of-concept a more expensive and time-consuming activity.

Nevertheless, the data scientist was able to perform basic word frequency analysis on the narrative inputs and construct word clouds to identify themes. Moreover, he used latent Dirichlet allocation (LDA) to segment the data into topics with associated words. I found this helpful in identifying general trends and patterns.
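Word frequency analysis of this kind needs very little code. The sketch below is an illustration only: the stop-word list is deliberately tiny and the function name is hypothetical. A word cloud then simply sizes each word by its count.

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "for", "on",
              "with", "is", "our", "we"}


def word_frequencies(texts, top=10):
    """Count non-stop-words across a list of narrative strings."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top)
```

Note that simple counting treats "grow" and "growth" as different words, one small example of why free-text inputs need more standardization than numbers do.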

The proof-of-concept team got together for a hot wash (an after-action review) and determined that RPA could be used to effectively mine knowledge from PowerPoint documents created from the same template, so long as the document creators were consistent in how they populated them. Any organization that wants to attempt this should expect a significant data wrangling effort in order to maximize the value of the extracted data.



The team also concluded that data collection would have been made more efficient by using a web or mobile app instead of PowerPoint templates. The app can perform data validation on inputs by users, and RPA or other coding approaches can then populate the PowerPoint documents, ensuring greater consistency and the ability to analyze data across documents while still meeting the goal of presenting complicated information in an easy way.
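As a rough sketch of what input validation in such an app might look like, the snippet below checks a submitted form before it is stored. The field names and rules are hypothetical; the point is that a value like "$3M" is rejected at entry time instead of being cleaned up later.

```python
from dataclasses import dataclass


@dataclass
class ProjectEntry:
    """One validated submission; field names are hypothetical."""
    name: str
    expected_revenue_usd: float
    growth_rate_pct: float


def validate_entry(form):
    """Return a ProjectEntry, or raise ValueError listing every problem."""
    errors = []
    name = str(form.get("name", "")).strip()
    if not name:
        errors.append("name is required")
    numbers = {}
    for field in ("expected_revenue_usd", "growth_rate_pct"):
        try:
            numbers[field] = float(form[field])
        except (KeyError, TypeError, ValueError):
            errors.append(f"{field} must be a plain number")
    if numbers.get("expected_revenue_usd", 0) < 0:
        errors.append("expected_revenue_usd cannot be negative")
    if errors:
        raise ValueError("; ".join(errors))
    return ProjectEntry(name, **numbers)
```

Collecting all errors before raising lets the app show the user every problem at once, which matters when many people are filling in the same form.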

Front-end work

We suspect that many enterprises are not doing this because the skills required to create PowerPoint templates are simpler than those for writing a web or mobile app. Nevertheless, in cases where a lot of data is being collected over a long period of time, investing in developing an app would yield significant returns in the ability to mine data and see trends, find well- and poorly performing projects, identify risk, and so forth.

Our proof-of-concept demonstrates that RPA can be used to effectively extract corporate knowledge from PowerPoint charts on a wide variety of topics. However, it will almost certainly require a lot of data wrangling work to make data comparable and computable for analysis. Text inputs may require even more standardization work due to different ways people interpret instructions and different ways in which they write.

We recommend that an enterprise capturing a large amount of data from a large number of people use a web- or mobile-app-based data collection approach instead of PowerPoint, to greatly lower the data-analysis cost and increase data usability for decision-making. RPA can then be used to generate PowerPoint documents from data sets for presentation purposes.

With Sergio Rego, data scientist, and Sami Mikhail, RPA engineer




Posted by: Charles Onstott

Senior VP and Chief Technology Officer

Charles Onstott is senior vice president and chief technology officer of SAIC. In this role, he is responsible for driving the company’s long-term technology strategy, which emphasizes fostering a culture of innovation across SAIC, developing and enhancing innovation capabilities, partnerships with startup accelerators and universities, and research and development.

He also oversees a broad portfolio of technology capabilities, including information technology, training and simulation, advanced analytics, hardware and platform integration, and logistics and supply chain.

Throughout his career, Onstott has been a successful innovator and served in many capacities, including technical leadership, general management, and executive leadership roles. Most recently, he launched SAIC’s Innovation Factory, which enables rapid prototyping, evaluation of emerging technologies, and incremental delivery of SAIC’s services. Innovation Factory marries a U.S.-based national network of innovation centers, startup accelerators, and service delivery centers. Prior to this role, Onstott was senior vice president and service line manager for SAIC’s services in key emerging technologies: cloud, cyber and data science. Onstott led the charge on developing SAIC’s cyber practice as well as improving its cloud computing business across an array of customer markets and led the development of SAIC’s capabilities in data science and analytics. As a result of his leadership, SAIC significantly grew its capabilities and revenues to new markets and customers.

Onstott joined SAIC in 1994 and has held many roles of increasing authority. Early in his career, he was the chief technologist for SAIC’s Integrated Services Management Center (ISMC), which is SAIC’s most successful IT shared service delivery organization. He was responsible for overall technology leadership as well as program portfolio management. He has developed IT infrastructure solutions that support more than 400,000 users worldwide in multiple industries with a focus on Federal Civilian agencies, the Department of Defense, and health care. He has managed many IT service modernization projects, some of which produced significant cost savings in systems development and maintenance while also improving overall customer satisfaction.

In 2020, Onstott was selected as Public Company Chief Technology Officer of the Year by Washington Executive.

Onstott holds a master’s degree in philosophy of religion from The University of Chicago and a bachelor’s degree in philosophy from Oklahoma State University. Onstott has long been fascinated by the American experience of living at the intersection of advanced technology, rapid innovation, and a generally religious culture. He has a wonderful wife and two teenagers who challenge his video gaming skills. In his spare time, he is a mountain bike enthusiast who has participated in endurance races and completed the Shenandoah 100-mile mountain bike race in 2019.





