Today’s business leaders are confronted with a persistent data dilemma. As the volume of data in businesses continues to surge, traditional data ingestion methods struggle to keep pace. Manual processes don’t scale, and conventional ETL tools consume more engineering time than they save. This leaves businesses with the challenging task of finding a way to ease data ingestion bottlenecks without adding complexity or resources.
When I led AI product partnerships at Google, we saw this time and again. The first step to building analytics and AI is ingesting and wrangling the data into a format that is easily understood by machine learning (ML) models. An inordinate amount of time went into the ingestion and wrangling part. More often than not, it relied on the most qualified and expensive data science and data engineering resources.
With data volume, variety, and velocity growing exponentially, streamlining and accelerating data ingestion while maintaining data quality remains a constant challenge for data leaders.
The key to overcoming these challenges lies in AI-powered processes that enable true automation and continuous improvement. Data mapping is one of the most common data ingestion challenges, and it manifests in two scenarios:
- Mapping incoming source data to the required target schema
- Mapping categories or lists of values
We could go on for days about why these two seemingly simple tasks are so tedious. Instead, we’ve put together a list of some of the most common data mapping challenges people grapple with when trying to ingest both third-party and internal data.
- Inconsistent field names: The source data may have non-standard or inconsistent field names that don't directly match the target schema, requiring manual mapping.
- Incomplete or missing data: Source data might be incomplete or missing required fields, making it challenging to map correctly, even for a human, let alone programmatically.
- Different data formats: Data from different sources may have varying formats (e.g., date formats, numeric vs. text values), which require normalization before mapping.
- Ambiguous mappings: Some fields may have ambiguous or unclear purposes, requiring additional clarification or assumptions.
- List of values mismatch: Categories or enumerated values from the source may not exactly match those in the target. This can make the mapping process extremely tedious, often involving back and forth with customers.
- Typos and data entry errors: Typos or inconsistencies in field names, values, or documentation can cause mismatches and lead to failed mappings.
- Frequent changes in schemas: Both source and target schemas can evolve over time, requiring continuous updates to mapping logic.
- Human errors: Manual mapping processes can introduce mistakes, leading to inaccuracies in data ingestion.
Many of these problems require a deep semantic understanding of the data, often necessitating human intervention to resolve. Programmatically solving these issues can be challenging, if not impossible, especially in cases like value mapping.
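To make these challenges concrete, here is a small, hypothetical example: a single source record and the target schema it needs to land in. The field names, formats, and values are invented for illustration, but each one exhibits a problem from the list above.

```python
# Hypothetical source record from a third-party feed (all names and values invented).
source_record = {
    "prod_nm": "Acetaminophen 500mg",   # inconsistent field name; target expects "product_name"
    "drug_cd": "0363-0160-01",          # non-explanatory name; this is actually the NDC
    "ship dt": "03/07/2024",            # MM/DD/YYYY text; target expects an ISO 8601 date
    "dept": "Pharm.",                   # abbreviation not in the target's list of values
    "qty": "12",                        # numeric value delivered as text
}

# The target ("golden") schema the record must be mapped into.
target_schema = {
    "product_name": "string",
    "ndc": "string",                    # National Drug Code
    "ship_date": "date (ISO 8601)",
    "department": 'one of ["Pharmacy", "Grocery", "Electronics"]',
    "quantity": "integer",
}
```

Nothing here can be resolved by exact string matching; every mapping depends on understanding what a field means, not just what it is called.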
Osmos leverages the power of generative AI to automate the often tedious tasks of column mapping and value mapping. Here's how it works:
AI Column Mapping (AutoMap)
Mapping incoming data to the standardized fields of the 'golden schema' can be challenging and usually requires assistance from the data team. Osmos empowers the teams receiving the data (business and analyst teams) to validate, clean, and map it to the golden schema on their own, streamlining the data ingestion process.
We've leveraged large language models (LLMs) to map the source schema to the destination schema. Osmos's automated mapping feature understands the semantics of not just the field names but also the specific values within the fields. Non-explanatory field names, typos, and inconsistencies are no match for Osmos's AutoMap AI.
Here’s a great example of how challenging these tasks can be.
A semantic understanding of the data allows Osmos to determine that the Drug Code field maps to NDC (National Drug Code). To achieve this, Osmos uses a purpose-built LLM that weighs several factors, including field names, data types, the values within a field, and the data in surrounding fields, to determine the best possible mapping. A task that previously required industry knowledge or a round of back-and-forth with the customer is now automated.
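As a rough illustration of the idea (not Osmos's actual implementation), the sketch below shows how an LLM can be prompted with source column names, sample values, and the target schema, then asked to propose a column mapping. The `call_llm` function is a placeholder for whatever LLM client you use; here it returns a canned response so the sketch runs end to end.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM client call. Returns a canned response here
    so the sketch runs; swap in your provider's chat-completion call."""
    return '{"drug_cd": "ndc", "prod_nm": "product_name"}'

def propose_column_mapping(source_samples: dict, target_fields: dict) -> dict:
    """Ask an LLM to map each source column to a target field (or null).

    source_samples: source column name -> a few example values from that column
    target_fields:  target field name  -> short description of that field
    """
    prompt = (
        "Map each source column to the best-matching target field.\n"
        "Use the column names AND the sample values to infer meaning.\n"
        'Return JSON of the form {"source_column": "target_field_or_null"}.\n\n'
        f"Source columns and sample values:\n{json.dumps(source_samples, indent=2)}\n\n"
        f"Target fields and descriptions:\n{json.dumps(target_fields, indent=2)}\n"
    )
    return json.loads(call_llm(prompt))

# Sample values let the model infer that "drug_cd" holds National Drug Codes,
# even though the column name alone is ambiguous.
mapping = propose_column_mapping(
    source_samples={
        "drug_cd": ["0363-0160-01", "0071-0155-23"],
        "prod_nm": ["Acetaminophen 500mg", "Ibuprofen 200mg"],
    },
    target_fields={"ndc": "National Drug Code", "product_name": "Product display name"},
)
print(mapping)  # {'drug_cd': 'ndc', 'prod_nm': 'product_name'}
```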
AI Value Mapping
Now, with Osmos, you can quickly and easily standardize your data across sources. This generative AI agent automatically maps category values to a predefined list, eliminating the need for manual data mapping. That means no more error-filled spreadsheets.
Osmos protects users from AI mishaps by keeping humans in the loop. Easily verify and adjust any output based on what you see, shaving hours off data cleanup tasks.
Get the consistency you expect and the data mapping accuracy you need
As shown in the example below, users are presented with the output of the AI's attempt at mapping values, with a clear indication whenever a match couldn't be found. Users can quickly spot problems and manually override the system if anything is miscategorized. The AI adapts and learns from these human overrides.
For example, our Value Mapping AI can accurately map store department types from the source data to the list of acceptable department types in the destination schema.
When multiple values from the source data need to be mapped to one category, Osmos intelligently groups them for users to review and make any necessary changes.
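Here is a simplified sketch of the same idea (again illustrative, not Osmos's internals): an AI pass proposes a canonical value for each distinct source value, anything without a confident match is flagged for human review, and reviewer overrides are stored so they take precedence the next time the same value appears. The `ai_suggest` function is a trivial stand-in for the model call so the sketch runs on its own.

```python
# Hypothetical value-mapping pass with a human-in-the-loop override store.
ALLOWED_DEPARTMENTS = ["Pharmacy", "Grocery", "Electronics"]

# Overrides captured from earlier human reviews; these always win.
learned_overrides: dict[str, str] = {"Pharm.": "Pharmacy"}

def ai_suggest(value: str, allowed: list[str]) -> str | None:
    """Stand-in for the AI suggestion step: a naive case-insensitive prefix
    match, just so the sketch runs without a model."""
    for target in allowed:
        if target.lower().startswith(value.lower().rstrip(".")):
            return target
    return None

def map_values(values: list[str]) -> dict[str, str | None]:
    """Return a proposed mapping; None marks values that need human review."""
    mapping = {}
    for value in set(values):            # group duplicates so each is reviewed once
        if value in learned_overrides:
            mapping[value] = learned_overrides[value]
        else:
            mapping[value] = ai_suggest(value, ALLOWED_DEPARTMENTS)
    return mapping

proposed = map_values(["Pharm.", "groceries", "Elec", "Bakery"])
# {'Pharm.': 'Pharmacy', 'groceries': None, 'Elec': 'Electronics', 'Bakery': None}

# A reviewer corrects an unmatched value; the override is remembered next time.
learned_overrides["groceries"] = "Grocery"
```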
Accelerating Data Ingestion with Osmos
The future of data ingestion is here. It's AI-driven, adaptive, and continuous. Osmos solutions work alongside your existing data infrastructure, allowing you to streamline data ingestion workflows with AI without dismantling your current setup.
Explore our full suite of AI-powered data ingestion solutions and discover how you can accelerate customer onboarding today.
Should You Build or Buy a Data Importer?
But before you jump headfirst into building your own solution, make sure you consider these eleven often overlooked and underestimated variables.
View the guide