August 11, 2020

AI Series, Part 1: Enabling Your Data for AI – Start With Data Quality and Avoid the Bad Data Trap

Aditya Sriram

Topic:   Data

Last updated: February 2nd, 2021

Before You Get to KPIs, Clean up Your Data Act

A common pitfall when planning an artificial intelligence (AI) initiative is failing to assess data quality and completeness. As AI delivers more value and forward visibility to organizations across industry verticals, the quality of data grows in importance for producing credible insights. Poor-quality or incomplete data will often complicate or derail AI initiatives, leading to implausible insights that hinder return on investment (ROI).

But what does it mean to have “bad” or “dirty” data in the context of AI? Bad data for AI can mean missing fields or records, duplicate records, outdated data, or non-standardized data points. As organizations move towards data-driven decisions, it becomes essential to invest in tools that can assist in cleansing and harmonizing data.
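As a rough illustration, the four kinds of bad data listed above can be counted with a few lines of Python. This is a minimal sketch, not a production profiler: the sample records, the field names (`id`, `email`, `country`, `updated`), the allowed country codes, and the one-year freshness threshold are all assumptions made up for this example.

```python
from datetime import date

# Hypothetical sample records; the field names are illustrative assumptions.
RECORDS = [
    {"id": 1, "email": "ana@example.com", "country": "CA", "updated": date(2020, 7, 1)},
    {"id": 2, "email": None, "country": "Canada", "updated": date(2020, 6, 15)},
    {"id": 1, "email": "ana@example.com", "country": "CA", "updated": date(2020, 7, 1)},
    {"id": 3, "email": "li@example.com", "country": "US", "updated": date(2017, 1, 9)},
]

ALLOWED_COUNTRIES = {"CA", "US"}  # assumed standard: two-letter codes
MAX_AGE_DAYS = 365                # assumed freshness threshold


def profile(records, today):
    """Count records that are missing values, duplicated, outdated, or non-standard."""
    report = {"missing": 0, "duplicate": 0, "outdated": 0, "non_standard": 0}
    seen = set()
    for rec in records:
        # Missing: any field that is None or an empty string.
        if any(value is None or value == "" for value in rec.values()):
            report["missing"] += 1
        # Duplicate: an exact repeat of a record already seen.
        key = tuple(sorted(rec.items(), key=lambda kv: kv[0]))
        if key in seen:
            report["duplicate"] += 1
        seen.add(key)
        # Outdated: last update is older than the threshold.
        if (today - rec["updated"]).days > MAX_AGE_DAYS:
            report["outdated"] += 1
        # Non-standardized: country is not one of the agreed codes.
        if rec["country"] not in ALLOWED_COUNTRIES:
            report["non_standard"] += 1
    return report


report = profile(RECORDS, today=date(2020, 8, 11))
print(report)
```

A report like this gives a quick sense of how much cleansing a data set needs before it is used for an AI use case.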

To keep data healthy over time, organizations often adopt data standardization tools that monitor data at the point of entry and centralize control over incoming data. Validating the credibility of data before an AI engagement can substantially improve the overall accuracy and adoption of an AI initiative across the organization.
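To make the idea of monitoring at the point of entry concrete, here is a small Python sketch of a validation step that standardizes and checks each incoming record before it is stored. It does not represent any particular standardization tool: the function name `validate_on_entry`, the required fields, and the country lookup table are hypothetical and chosen only for illustration.

```python
# Hypothetical lookup from free-text country names to standard codes.
COUNTRY_CODES = {"canada": "CA", "ca": "CA", "united states": "US", "usa": "US", "us": "US"}
REQUIRED_FIELDS = ("id", "email", "country")  # assumed required fields


def validate_on_entry(record):
    """Return (clean_record, errors); a record with errors should be rejected."""
    errors = []
    clean = dict(record)  # work on a copy so the caller's record is untouched

    # Completeness: every required field must be present and non-empty.
    for field in REQUIRED_FIELDS:
        if clean.get(field) in (None, ""):
            errors.append(f"missing field: {field}")

    # Standardization: map free-text country names to a single code.
    country = clean.get("country")
    if isinstance(country, str) and country.strip():
        code = COUNTRY_CODES.get(country.strip().lower())
        if code is None:
            errors.append(f"unknown country: {country}")
        else:
            clean["country"] = code

    # Normalization plus a deliberately simple sanity check on the email.
    email = clean.get("email")
    if isinstance(email, str) and email:
        clean["email"] = email.strip().lower()
        if "@" not in clean["email"]:
            errors.append(f"invalid email: {email}")

    return clean, errors


accepted, rejected = [], []
for incoming in [
    {"id": 10, "email": " Sam@Example.com ", "country": "Canada"},
    {"id": 11, "email": "", "country": "Atlantis"},
]:
    clean, errors = validate_on_entry(incoming)
    (rejected if errors else accepted).append((clean, errors))

print("accepted:", accepted)
print("rejected:", rejected)
```

Routing every source through one function like this is one way to centralize control over incoming data, since all sources then pass through the same rules.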

In addition, it is important to have enough data available to support the underlying AI use case. Organizations often prioritize an AI use case without having the data readily available or in the correct format. If this situation sounds familiar, either invest in an integration tool to unify the required data sources and ensure the quality of data, or revisit and re-prioritize the underlying use case. Alternatively, it may be beneficial to acquire external data or to use a simple AI algorithm.

Aditya Sriram is the WW AI Portfolio Lead & CoE Program Manager at ibi. A PhD candidate in the Faculty of Engineering at the University of Waterloo, Aditya Sriram is a member of KIMIA Lab (Laboratory for Knowledge Inference in Medical Image Analysis). He brings an extensive understanding of how artificial intelligence moves research and industries forward.


Since 2011, his research activities have encompassed content-based retrieval of medical images using machine learning, deep learning, and computer vision approaches. He has developed learning schemes and descriptors for medical imaging and published his work in top-tier journals and conferences. Aditya Sriram has extensive industry experience and has worked with several companies. Presently, he is a senior AI strategist at ibi in Toronto, Ontario, Canada.