March 15, 2021

Metadata and AI: merge data from multiple sources efficiently

Robert Wojciechowski

Topic:   Data

Data integration is incredibly complicated. The sheer volume of data is staggering, but the core challenge of data integration is to effectively and accurately match similar data that comes from disparate sources or is labeled slightly differently. This mismatch makes categorization and data analysis time-consuming, inefficient, and unwieldy. A key aspect of improving your data quality is finding a way to make metadata work harder for you. Here’s what you need to know about metadata and why it matters to your business.

What is metadata?

Metadata is data that describes data. It’s like a shorthand classification of bigger groups of information. Think of metadata as a reference point. Typical metadata elements could include things such as the title or descriptor for the data. Or, it could include who created the data along with when it was created. Tagging, which is a common feature in many types of databases, is a type of metadata.

Given the many benefits that can be gained from efficiently integrating metadata in a company’s operations, it is no surprise that a recent report projected that the metadata management market sector would grow at an annual rate of 13.7 percent over the next five years.

Here’s an example of how you may use metadata in your business every day. Suppose you have a data sheet with column headers. When you upload information to the sheet, you can use metadata to ensure that the information you merge goes to the correct category on the current data sheet. It’s in this way that metadata can improve data integration and analysis.
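To make the data sheet example concrete, here is a minimal Python sketch of a metadata-driven merge. The column names, the aliases, and the `COLUMN_METADATA` structure are all hypothetical and are not taken from any particular product; the point is only that a small amount of descriptive metadata lets incoming headers land in the correct category.

```python
# Hypothetical metadata: each canonical column on the existing sheet,
# plus the alternative labels an incoming file might use for it.
COLUMN_METADATA = {
    "customer_name": {"aliases": {"name", "customer", "client name"}},
    "email": {"aliases": {"e-mail", "email address"}},
    "signup_date": {"aliases": {"date joined", "created"}},
}


def normalize(label):
    """Lower-case a header and collapse underscores and whitespace."""
    return " ".join(label.replace("_", " ").lower().split())


def build_lookup(metadata):
    """Map every normalized canonical name and alias to its canonical column."""
    lookup = {}
    for canonical, info in metadata.items():
        lookup[normalize(canonical)] = canonical
        for alias in info["aliases"]:
            lookup[normalize(alias)] = canonical
    return lookup


def merge_rows(incoming_rows, metadata):
    """Rename incoming columns to canonical names; collect headers that match nothing."""
    lookup = build_lookup(metadata)
    merged, unmatched = [], set()
    for row in incoming_rows:
        out = {}
        for header, value in row.items():
            canonical = lookup.get(normalize(header))
            if canonical is None:
                unmatched.add(header)  # left for a person to review
            else:
                out[canonical] = value
        merged.append(out)
    return merged, unmatched


incoming = [{"Client Name": "Ada Lovelace", "E-mail": "ada@example.com", "Fax": "n/a"}]
rows, unmatched = merge_rows(incoming, COLUMN_METADATA)
print(rows)
print(unmatched)
```

Headers that match no alias are reported rather than silently dropped, so a person can decide whether the metadata needs a new entry.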

While this is a highly visible use for metadata, behind the scenes, metadata impacts software platforms at a functional level. Whether it’s data analysis for business intelligence or finding a better way to convert unstructured data into actionable insight, metadata matters to your business in ways you may not even recognize.

Three steps in the evolution of data intelligence

As the amount of information organizations generate has skyrocketed due to digitalization of many business functions, taking steps to benefit from this data by collecting, sorting, and analyzing it has become a vital function. In doing so, organizations typically progress through the following three steps on the road to maximizing the value of their data collection efforts:

  1. Business rules: Using business rules enables companies to structure and route their data in useful ways. When an exception to a rule occurs, human input is required to resolve the situation, often by creating a new rule. However, as the volume of data ingested rises, exceptions requiring human input typically also increase, adding inefficiencies to the process.
  2. Artificial intelligence (AI): Computer algorithms can handle massive volumes of data more efficiently than humans processing exceptions, enabling companies to effectively deal with much larger data sets. However, AI metadata classification can’t solve all use cases.
  3. Data intelligence: AI and business rules converge to encompass categories such as data governance, data quality, metadata management, and data profiling using active metadata.
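The first step above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of any vendor's rule engine: the `Rule` class, the record fields, and the destinations are all invented for the example. It shows how records that match no rule become exceptions, and how a person typically resolves them by adding another rule.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    matches: Callable[[dict], bool]  # predicate over a record
    route: str                       # destination when the predicate holds


# Hypothetical rules; a real rule set would be far larger.
rules = [
    Rule("invoice", lambda r: r.get("type") == "invoice", "finance"),
    Rule("support ticket", lambda r: r.get("type") == "ticket", "support"),
]


def route_record(record, rule_set):
    """Return the destination of the first matching rule, or None for an exception."""
    for rule in rule_set:
        if rule.matches(record):
            return rule.route
    return None


def process(records, rule_set):
    """Split records into routed pairs and exceptions that need human input."""
    routed, exceptions = [], []
    for record in records:
        destination = route_record(record, rule_set)
        if destination is None:
            exceptions.append(record)
        else:
            routed.append((destination, record))
    return routed, exceptions


records = [
    {"type": "invoice", "id": 1},
    {"type": "ticket", "id": 2},
    {"type": "contract", "id": 3},
]
routed, exceptions = process(records, rules)
print([destination for destination, _ in routed])
print(exceptions)

# A person resolves the exception by writing one more rule.
rules.append(Rule("contract", lambda r: r.get("type") == "contract", "legal"))
routed_after, exceptions_after = process(records, rules)
print(exceptions_after)
```

Every new kind of record needs its own hand-written rule, which is why the exception queue tends to grow along with the volume of data.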

In the data intelligence phase, companies effectively harness business rules and AI to get the most out of their data. You can reach this stage by using a metadata platform that enables you to efficiently access and manage the totality of data sources in your organization, breaking down silos and cutting through the complexity associated with multiple data sources.

Advanced platforms of this type help implement data intelligence by making it easy to use the data and share its insights widely throughout the organization. They also enhance the data’s quality and thereby its trustworthiness, avoiding the “garbage in, garbage out” syndrome that can generate unreliable data in systems that don’t actively manage and integrate metadata.

[Image: metadata displayed on a screen. Caption: Metadata can leverage AI to help even non-technical end users.]

Benefits of AI for metadata 

Consider how long it would take a data analyst to manually review 200 columns of data and correctly label each one. An AI-driven platform can perform this function in seconds, often with fewer errors than its human counterpart.

One example is the problem of “name” and “first name” fields, which can lead to data errors in systems that can’t distinguish between first and last names. AI looks at the underlying data to inform the classification, and it can do so across thousands of columns.

Another example is a field for an important piece of data such as a Social Security number. This field could be labeled SSN, Socials, Social, or even 123-34-5678. Any of these column headers could appear, and AI would categorize each of them accurately.
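The idea of classifying a column by its underlying data, and not only by its header, can be illustrated with a deliberately simple Python sketch. It uses a regular expression as a stand-in for a trained model, so it is not how an AI-driven platform actually works. The header hints, the 0.8 threshold, and the function name are assumptions made for the example, as is the reading that a header like 123-34-5678 is a data value that ended up in the header position.

```python
import re

# Simplified pattern for the NNN-NN-NNNN layout; real validation is stricter.
SSN_PATTERN = re.compile(r"\d{3}-\d{2}-\d{4}")
HEADER_HINTS = {"ssn", "social", "socials"}  # hypothetical known labels


def looks_like_ssn_column(header, values, threshold=0.8):
    """Classify a column as SSN if its header or most of its values look like one."""
    header = header.strip()
    if header.lower() in HEADER_HINTS:
        return True
    samples = [v.strip() for v in values if v and v.strip()]
    # Assumption: a header shaped like an SSN is really a data value, so count it.
    if SSN_PATTERN.fullmatch(header):
        samples.append(header)
    if not samples:
        return False
    hits = sum(1 for v in samples if SSN_PATTERN.fullmatch(v))
    return hits / len(samples) >= threshold


columns = {
    "SSN": ["123-45-6789"],
    "Socials": ["987-65-4321"],
    "123-34-5678": ["987-65-4321", "111-22-3333"],
    "Column7": ["123-45-6789", "987-65-4321"],
    "Phone": ["555-867-5309", "555-123-4567"],
}
results = {header: looks_like_ssn_column(header, values) for header, values in columns.items()}
print(results)
```

A learned classifier would replace the fixed pattern with features derived from the values themselves, but the inputs (the header plus a sample of column contents) would be much the same.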

An AI-driven platform provides you with some distinct benefits:

  • It can save the time and money your data analysts spend cleaning and categorizing data
  • It can reduce errors that are a natural part of manual data management techniques
  • The combination of business rules and AI is more scalable than business rules alone because the models learn and adapt iteratively as the data changes. With solutions driven only by business rules, you have to keep adding new rules or logic to handle exceptions over time. That becomes hard to manage and doesn’t scale the way that AI can
  • When human intervention is needed, the machine learning models can make recommendations to the end user to make their decision easier
  • The models learn from past classifications and human inputs and therefore improve classification over time, requiring fewer and fewer human interventions and decisions
  • It quickly creates highly secure documents, all while increasing efficiency
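The recommendation and learning loop described in the list above can be sketched as a toy Python class. This is only an assumption-laden illustration: it counts past human decisions per header where a real platform would train a machine learning model, and the class name, the 0.9 auto-accept threshold, and the labels are all hypothetical.

```python
from collections import Counter, defaultdict


class FeedbackClassifier:
    """Toy label suggester that learns from confirmed header-to-label decisions."""

    def __init__(self, auto_accept=0.9):
        self.auto_accept = auto_accept
        self.history = defaultdict(Counter)  # normalized header -> label counts

    @staticmethod
    def _key(header):
        return header.strip().lower()

    def suggest(self, header):
        """Return (label, confidence), or (None, 0.0) for an unseen header."""
        counts = self.history.get(self._key(header))
        if not counts:
            return None, 0.0
        label, votes = counts.most_common(1)[0]
        return label, votes / sum(counts.values())

    def classify(self, header):
        """Auto-label when confident; otherwise hand a recommendation to a person."""
        label, confidence = self.suggest(header)
        needs_review = label is None or confidence < self.auto_accept
        return {"label": label, "confidence": confidence, "needs_review": needs_review}

    def record_decision(self, header, label):
        """Store a human decision so later suggestions improve."""
        self.history[self._key(header)][label] += 1


clf = FeedbackClassifier()
first = clf.classify("Socials")        # unseen header: a person must decide
clf.record_decision("Socials", "ssn")  # the person's answer is recorded
second = clf.classify(" socials")      # same header later: labeled automatically

clf.record_decision("Name", "last_name")
clf.record_decision("Name", "full_name")
ambiguous = clf.classify("Name")       # past decisions disagree: back to review
print(first, second, ambiguous, sep="\n")
```

Each recorded decision reduces the number of headers that need review, while headers with conflicting history are still routed to a person along with a suggested label.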

The key to better business intelligence lies in metadata. ibi, a TIBCO company, offers customers the opportunity to first see our next-level metadata classification capabilities in WebFOCUS, as we continue to weave AI into all aspects of data prep, analysis, and visualization. Request a demo to see our data and analytics platform in action today.

 

Robert Wojciechowski is the principal product manager on the data science team at ibi. His work involves building data science capabilities into a wide spectrum of the data and analytics products that make up the ibi platform. He began his career as an engineer, holding a computer science degree from Princeton University. He has worked on complex data science and engineering projects inside large organizations such as Microsoft and at several successful startups.