June 16, 2026

Where to start with AI when your data is a mess

Perfect data is the most common excuse for never starting. Here is how to run a useful first project with the good-enough data you already have.

AI for business · Adoption · Data

"We need to organize the data first." I hear that sentence in almost every initial conversation. It sounds responsible, but it is usually the most polite way to postpone forever. The uncomfortable truth is that you almost never need perfect data to start. You need sufficient data and a well-chosen problem.

Perfect data does not exist

Every company thinks its data mess is unique. It is not. Stray spreadsheets, information scattered across systems that do not talk to each other, incomplete history: that is the norm, not the exception. If you wait for the database to become spotless, you will wait forever, because it never gets there. Data gets organized much faster when a concrete goal is driving the cleanup, not before.

Start from the process, not the data lake

Instead of trying to organize everything, pick a process. Just one. Preferably one that is repetitive, tedious, and already leaves some kind of record, even a messy one: support, document triage, answering internal questions, classifying orders. These processes usually have enough data hidden inside emails, tickets and spreadsheets.

The question is not "Is my data ready?" It is "Does this process generate enough examples for the AI to learn the pattern?" Usually the answer is yes, and you find that out by looking at the process, not by auditing the entire database.

A four-step path

  1. Pick a process that is repetitive and low risk.
  2. Gather real examples as they are, without cleaning everything first. One or two hundred cases already tell you a lot.
  3. Run a small pilot with human review on the exceptions.
  4. Measure against the baseline and decide: scale, adjust or stop.

This cycle teaches you more about your data than months of theoretical organizing. You discover which information is actually missing, because the pilot points to the gaps that matter instead of leaving you to guess.
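For readers who want to see steps 3 and 4 in concrete terms, here is a minimal sketch in Python. It is an illustration under assumptions, not a prescribed method. It assumes each pilot case records the model's proposed label, a confidence score, and the label a human confirmed. Cases below a confidence threshold count as exceptions sent to human review, and the model's accuracy on the rest is compared with a baseline, such as the accuracy of the current manual process. The names `PilotCase`, `evaluate_pilot` and `decide`, as well as the thresholds, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PilotCase:
    predicted: str      # label the model proposed (hypothetical field)
    confidence: float   # model confidence from 0.0 to 1.0 (assumed to be available)
    actual: str         # label confirmed by the human reviewer


def evaluate_pilot(cases, baseline_accuracy, threshold=0.8):
    """Summarize a pilot: how much the model handled alone, and how well."""
    # Cases at or above the threshold are handled automatically;
    # everything else is an exception routed to a human.
    auto = [c for c in cases if c.confidence >= threshold]
    sent_to_review = len(cases) - len(auto)
    correct = sum(1 for c in auto if c.predicted == c.actual)
    accuracy = correct / len(auto) if auto else 0.0
    coverage = len(auto) / len(cases) if cases else 0.0
    return {
        "coverage": coverage,            # share of cases the model handled alone
        "accuracy": accuracy,            # accuracy on those cases
        "sent_to_review": sent_to_review,
        "beats_baseline": accuracy >= baseline_accuracy,
    }


def decide(summary, min_coverage=0.5):
    """Hypothetical decision rule for step 4: scale, adjust or stop."""
    if summary["beats_baseline"] and summary["coverage"] >= min_coverage:
        return "scale"
    if summary["beats_baseline"]:
        return "adjust"  # accurate, but too many cases still need a human
    return "stop"        # worse than the baseline: rethink before investing more


cases = [
    PilotCase("refund", 0.95, "refund"),
    PilotCase("shipping", 0.90, "shipping"),
    PilotCase("refund", 0.60, "billing"),   # low confidence: goes to review
    PilotCase("billing", 0.85, "refund"),   # confident, but wrong
]
summary = evaluate_pilot(cases, baseline_accuracy=0.6)
print(summary, decide(summary))
```

With these four made-up cases, three are handled automatically and two of those are correct, so the sketch suggests scaling against a 60% baseline and stopping against a 70% one. Real thresholds depend on the process and on the cost of an error.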

Cleanup follows the goal

The most useful side effect of a pilot is that it gives the data cleanup a direction. Instead of "let's organize everything", it becomes "we need to standardize this field, because the model gets it wrong when the field arrives empty". That is cleanup with a purpose, and it happens fast because there is a concrete reason behind it and a visible gain on the other side.

What to avoid

Avoid two extremes. One is perfectionism: freezing everything until the database is clean. The other is the opposite: throwing terrible data at a model and expecting magic. The balance point is sufficient data, in the right process, with a human in the loop handling the exceptions while you learn.

Starting small is not a lack of ambition. It is the fastest way to reach something big without burning the budget on the way.

If this kind of conversation fits your operation, this is exactly what I work on. Reach me through the contact button on the home page or follow along on Instagram.