The most common mistakes in AI projects

Karol Woźniak
AI Manufacturing

Industrial AI doesn’t fail because of algorithms. It fails because of people and processes. 

If an AI project in a factory “didn’t deliver value,” it is almost certainly not because the model was too weak. It failed earlier – at the stage of defining the goal, accountability, data, and operational decisions. 

In manufacturing, AI is not a research experiment. It is a tool – typically a machine learning model – that affects cost, safety, and continuity of operations. And yet many organizations treat it like a technology demo or a presentation for the board. Below are ten reasons why AI projects in industry end in silence after the pilot – and why it’s not AI’s fault. 

1) “We are implementing AI” is not an operational goal

The project starts with the declaration “we are implementing AI.” But that is not a goal – it’s a slogan. Without identifying a specific business problem – such as reducing unplanned downtime through predictive maintenance or cutting scrap with real-time quality analytics – the project will almost always deliver something: a dashboard, a model, a chart. But it will not deliver decisions. AI without a clearly assigned operational objective is art for art’s sake. 

What to do:

  • Select one type of costly event to reduce: critical downtime, scrap, power-limit exceedances, energy/utility losses, safety incidents. 
  • Define 2–3 success metrics: MTTD (Mean Time to Detect) / MTTR (Mean Time to Repair), number of unplanned stoppages, energy cost per batch, number of interventions triggered “on signal.” 
  • Add an adoption metric: who uses the output, when, and in which decision-making process. 
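
As a rough illustration, metrics such as MTTD and MTTR can be computed from a plain incident log. The sketch below assumes a hypothetical record with three timestamps (fault start, detection, return to service); the field names are illustrative, and measuring MTTR from detection to repair is only one of several conventions in use.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record layout; adapt it to your own maintenance log export.
    started: datetime    # when the fault actually began
    detected: datetime   # when someone (or something) noticed it
    repaired: datetime   # when the asset was back in service


def mean_delta(deltas: list[timedelta]) -> timedelta:
    """Average a list of durations; an empty list yields zero."""
    if not deltas:
        return timedelta(0)
    return sum(deltas, timedelta(0)) / len(deltas)


def mttd(incidents: list[Incident]) -> timedelta:
    return mean_delta([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    # Assumption: repair time runs from detection to return to service.
    return mean_delta([i.repaired - i.detected for i in incidents])


incidents = [
    Incident(datetime(2024, 1, 1, 8, 0), datetime(2024, 1, 1, 8, 30), datetime(2024, 1, 1, 10, 30)),
    Incident(datetime(2024, 1, 5, 14, 0), datetime(2024, 1, 5, 14, 10), datetime(2024, 1, 5, 15, 10)),
]
print("MTTD:", mttd(incidents))
print("MTTR:", mttr(incidents))
```

Computing the same numbers before and after the project, on the same definition, is what makes the metric usable as a success criterion.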

2) The result exists, but no one is responsible for it

The model detects an anomaly, the system generates an alert, the dashboard lights up red. And nothing happens. Unlike machines, AI outputs don’t have a natural owner. If they are not embedded in an operational procedure, they sit alongside the process rather than inside it. In a factory, everything has an owner. If AI doesn’t have one – it doesn’t exist. 

What to do:

  • Appoint a Product Owner in operations (production/maintenance/energy/quality), not in IT. 
  • Define a procedure: signal → assessment → action → confirmation of effect. 
  • Introduce a feedback process as part of the workflow (signal accuracy, cause, result). 
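
The procedure signal → assessment → action → confirmation of effect can be sketched as a small state machine. This is a minimal sketch under stated assumptions: the class, field, and asset names are hypothetical, and a real implementation would normally live in the maintenance or ticketing system rather than in standalone code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    SIGNAL = 1
    ASSESSMENT = 2
    ACTION = 3
    CONFIRMATION = 4


@dataclass
class Alert:
    asset: str
    owner: str                      # a named role in operations, not "IT"
    stage: Stage = Stage.SIGNAL
    notes: list[str] = field(default_factory=list)

    def advance(self, note: str) -> None:
        """Move to the next stage; a note is mandatory so feedback is captured."""
        if not note.strip():
            raise ValueError("each step needs a note (accuracy, cause, or result)")
        if self.stage is Stage.CONFIRMATION:
            raise ValueError("alert is already closed")
        next_stage = Stage(self.stage.value + 1)
        self.notes.append(f"{next_stage.name}: {note}")
        self.stage = next_stage


# Hypothetical walk-through of one alert.
alert = Alert(asset="Conveyor C-3", owner="Maintenance shift lead")
alert.advance("signal accurate, bearing temperature trend confirmed")
alert.advance("bearing replaced during planned stop")
alert.advance("temperature back to baseline after 24 h")
print(alert.stage.name, len(alert.notes))
```

Requiring a note at every transition is one way to make the feedback loop part of the workflow rather than an optional extra.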

3) OT data is treated as the unquestionable source of truth

“We have data in the Historian, so we can train a model.” This approach is wrong. You have signals, not data. In a production environment, operating modes change, tags disappear or change meaning, and startups and changeovers often look like failures. Without proper data contextualization, a model cannot distinguish normal from abnormal. A model trained on an undescribed context is not intelligent – it is random. 

What to do:

  • Approve a “data preparation standard”: list of signals, definitions, time windows, description of operating modes (RUN/STOP, startups, changeovers), and data quality rules. Identify the data pipeline from source (SCADA/PLC/historian) to model input – gaps in this pipeline are where most projects break. 
  • Identify the owner of process data (who is responsible for the meaning of signals and tag changes). 
  • Plan data quality monitoring as part of the project scope. 
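
To illustrate why operating modes matter, the sketch below compares a baseline computed over all samples with one computed only over RUN samples. The data, the mode labels, and the z-score limit are invented for illustration; a real historian export would need the mode tag joined in from the control system.

```python
from statistics import mean, stdev

# Hypothetical historian export: (minute, motor_current_A, operating_mode).
samples = [
    (0, 0.0, "STOP"), (1, 0.0, "STOP"),
    (2, 48.0, "STARTUP"), (3, 35.0, "STARTUP"),
    (4, 20.1, "RUN"), (5, 19.8, "RUN"), (6, 20.3, "RUN"),
    (7, 19.9, "RUN"), (8, 20.0, "RUN"),
]


def baseline(rows, modes):
    """Mean and sample standard deviation of values recorded in the given modes."""
    values = [value for _, value, mode in rows if mode in modes]
    return mean(values), stdev(values)


def is_anomalous(value, mean_, std_, z_limit=3.0):
    """Simple z-score check; the limit of 3.0 is an assumed default."""
    return abs(value - mean_) / std_ > z_limit


naive_mean, naive_std = baseline(samples, {"STOP", "STARTUP", "RUN"})
run_mean, run_std = baseline(samples, {"RUN"})

reading = 23.0  # a RUN-mode reading about 3 A above normal
print("without context:", is_anomalous(reading, naive_mean, naive_std))
print("with context:   ", is_anomalous(reading, run_mean, run_std))
```

Without the mode label, the stop and startup samples inflate the spread so much that a clear deviation during normal running goes unnoticed.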

4) The pilot works because everyone rescues it manually

A pilot often “works” because someone fixes the data, someone interprets the result, someone remembers that “this needs to be checked.” Scaling from proof of concept to production requires standards: an object model, KPI definitions, change management, and operational safety. If the solution is not repeatable, standardized, and maintainable – it is not a solution, only a presentation. 

What to do:

  • Plan “what happens after the pilot”: next assets, next lines, subsequent use cases. 
  • Enforce repeatability: KPI, alarm, and report templates instead of designing from scratch every time. 
  • Define who maintains the solution after the pilot (team, budget, SLA/support mode). 

5) The organization tries to do everything at once

Implementing OT/IT integration, a data platform, machine learning models, and a user application simultaneously often overwhelms the organization and increases the risk of delays. 

What to do:

  • Introduce phasing: monitoring/KPIs → alarms and deviations → scenarios → models → automation. 
  • Define a “value moment” per phase (e.g., reduce diagnosis time by X% in 6–8 weeks). 
  • Allow integration and data to stabilize before intensive modeling. 

6) The model works… until the first process change

Production processes evolve: recipes, raw materials, setpoints, upgrades. An unmaintained model drifts: it loses accuracy as conditions change, and users’ trust drops with it. 

What to do:

  • Prepare an MLOps maintenance plan: monitoring of prediction quality, periodic reviews, and retraining triggers. 
  • Define responsibility for approving new model versions and a rollback procedure. 
  • Treat models as part of production configuration: version control, audit trail, and change management – just like any other production asset. 
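
A retraining trigger can be as simple as comparing recent prediction error with the error measured at validation time. The following is a minimal sketch, assuming a regression-style model and hypothetical window and tolerance values; it is not a substitute for a full monitoring stack.

```python
from collections import deque
from typing import Optional


class DriftMonitor:
    """Flags a model for review when its rolling error exceeds a tolerance."""

    def __init__(self, baseline_mae: float, window: int = 50, tolerance: float = 1.5):
        self.baseline_mae = baseline_mae      # error measured when the model was approved
        self.tolerance = tolerance            # assumed: review at 1.5x the baseline error
        self.errors = deque(maxlen=window)    # keeps only the most recent errors

    def record(self, predicted: float, actual: float) -> None:
        self.errors.append(abs(predicted - actual))

    def rolling_mae(self) -> Optional[float]:
        # Wait for a full window so a single outlier cannot trigger a review.
        if len(self.errors) < self.errors.maxlen:
            return None
        return sum(self.errors) / len(self.errors)

    def needs_review(self) -> bool:
        mae = self.rolling_mae()
        return mae is not None and mae > self.tolerance * self.baseline_mae


monitor = DriftMonitor(baseline_mae=1.0, window=5)
for _ in range(5):
    monitor.record(predicted=101.0, actual=100.0)   # errors as seen at validation
print("before process change:", monitor.needs_review())
for _ in range(5):
    monitor.record(predicted=103.0, actual=100.0)   # errors after, say, a recipe change
print("after process change: ", monitor.needs_review())
```

The flag should open a review by the person responsible for approving model versions; whether to retrain or roll back remains an operational decision.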

7) The operator learns about AI during training

In shift-based environments, simplicity, unambiguous messages, and a predictable response matter. Without them, the solution may be perceived as an additional burden. If frontline operators see AI for the first time on a training slide, they don’t trust its thresholds, alarms, or recommendations. And in shift operations, lack of trust means one thing: ignored signals. AI that doesn’t fit the rhythm of the shift won’t make it into day-to-day work. 

What to do:

  • Invite operators and maintenance from the start to define symptoms, thresholds, and priorities. 
  • Create a standard operational message: asset, urgency, justification, recommended steps, escalation rules. 
  • Plan training based on real cases and incidents, not general presentations. 
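
One way to keep operational messages consistent is to give them a fixed structure. The sketch below mirrors the fields listed above (asset, urgency, justification, recommended steps, escalation rule); the class name, the urgency scale, and the example content are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OperatorMessage:
    asset: str
    urgency: str                 # assumed scale: "LOW", "MEDIUM", "HIGH"
    justification: str
    steps: tuple[str, ...]
    escalation: str

    def render(self) -> str:
        """Render the message in a fixed order so every alert reads the same way."""
        lines = [
            f"[{self.urgency}] {self.asset}",
            f"Why: {self.justification}",
            "Do:",
        ]
        lines += [f"  {n}. {step}" for n, step in enumerate(self.steps, start=1)]
        lines.append(f"Escalate: {self.escalation}")
        return "\n".join(lines)


msg = OperatorMessage(
    asset="Pump P-101",
    urgency="HIGH",
    justification="Bearing temperature rising steadily for the last 6 h",
    steps=("Check lubrication", "Schedule a vibration measurement this shift"),
    escalation="Maintenance lead if the trend continues into the next shift",
)
print(msg.render())
```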

8) The “black box” kills accountability

When a system influences cost and safety decisions, the organization needs not only the outcome but also the justification and history: what the signal was based on, what was done, and what the effect was. Without this, it is hard to sustain a standard of work and hard to defend decisions in an audit. 

What to do:

  • Require a minimum level of explainability: which symptoms and trends contributed to the signal. 
  • Enforce a decision trail: signal time, actions, result, conclusion. 
  • For critical areas, use a hybrid approach: rules + condition-based maintenance (CBM) / reliability-centered maintenance (RCM) thresholds + machine learning. 
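
The hybrid approach and the decision trail can be sketched together: a hard threshold is checked first, the model score second, and every outcome carries a human-readable reason. The threshold values, function names, and trail fields below are placeholders chosen for illustration, not recommended settings.

```python
from datetime import datetime, timezone

VIBRATION_TRIP_MM_S = 7.1   # hypothetical CBM threshold
ANOMALY_WARN = 0.8          # hypothetical model score threshold


def evaluate(vibration_mm_s: float, anomaly_score: float) -> dict:
    """Rule first, model second; the reason states what contributed to the signal."""
    if vibration_mm_s >= VIBRATION_TRIP_MM_S:
        return {"level": "ALARM", "source": "rule",
                "reason": f"vibration {vibration_mm_s} mm/s >= {VIBRATION_TRIP_MM_S} mm/s"}
    if anomaly_score >= ANOMALY_WARN:
        return {"level": "WARNING", "source": "model",
                "reason": f"anomaly score {anomaly_score} >= {ANOMALY_WARN}"}
    return {"level": "OK", "source": "none", "reason": "within thresholds"}


trail: list[dict] = []


def log_decision(signal: dict, action: str, result: str) -> dict:
    """Append one auditable entry: signal time, signal, action, result."""
    entry = {
        "time": datetime.now(timezone.utc).isoformat(),
        "signal": signal,
        "action": action,
        "result": result,
    }
    trail.append(entry)
    return entry


signal = evaluate(vibration_mm_s=3.2, anomaly_score=0.91)
log_decision(signal, action="inspection scheduled", result="loose coupling found")
print(signal["level"], signal["source"], len(trail))
```

Checking the rule before the model means a critical threshold can never be overridden by a low model score.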

9) Cybersecurity is treated as a formality

AI in a factory creates new data flows and OT/IT integrations – and with them, new attack surfaces. Without access rules, auditing, and operational readiness, either the security team blocks the project or the project itself becomes a risk. 

What to do:

  • Approve a zone architecture and controlled data-exchange points. 
  • Require RBAC/SSO, separation of privileges, and audit of changes to KPIs/rules/models. 
  • Ensure logs, backups and recovery, and an update process as part of the solution. 

10) There is no codified knowledge, so the signal changes nothing

A signal without a standardized response does not organize work. The greatest value comes from connecting data with domain expertise codified in procedures: checklists, standard operating procedures, HSE requirements, escalation standards, and lessons learned from past incidents. 

What to do:

  • Build a knowledge base linked to assets and events. 
  • Introduce a continuous improvement loop: symptom → root cause analysis → action → effect → lessons learned. 
  • Where it makes sense, enable fast, permission-controlled access to knowledge drawn from internal resources.