MLOps Practices

MLOps operationalises machine learning. It turns the research-and-notebook world of data science into reliable, governed, production-grade ML systems.


The ML model lifecycle

The ML lifecycle has eight phases. MLOps provides the practices, tools, and automation to move through them reliably:

Problem framing → Data engineering → Experimentation → Model training
    → Evaluation → Deployment → Monitoring → Retraining

| Phase | What happens | MLOps practice |
|---|---|---|
| Problem framing | Define the ML task, success metrics, and data requirements | ML canvas, feasibility assessment |
| Data engineering | Collect, clean, validate, and version training data | Data pipelines, feature stores, data contracts |
| Experimentation and model training | Train candidate models, tune hyperparameters, compare runs | Experiment tracking (MLflow, W&B) |
| Evaluation | Assess model performance, fairness, and risk | Automated evaluation gates, model cards |
| Deployment | Release the model to a serving environment | CI/CD pipeline, canary/shadow deployments |
| Monitoring | Track model performance, data drift, and concept drift in production | Model observability, dashboards, alerts |
| Retraining | Retrain the model when performance degrades or the data distribution shifts | Triggered retraining pipelines |

Core MLOps practices

1. Experiment tracking

Every model training run should be tracked in enough detail to reproduce it exactly:

  • What to track: hyperparameters, dataset version, code commit, training metrics (loss, accuracy, F1), evaluation metrics, runtime
  • Why it matters: without tracking, you cannot reproduce a model, compare experiments, or audit which model was deployed when
  • Tools: MLflow Tracking, Weights & Biases, Neptune, Azure ML Experiments, SageMaker Experiments
Experiment → Run (hyperparams + metrics + artifacts) → Compare → Promote best run
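The "experiment → run → compare → promote" flow can be sketched as a minimal in-memory tracker. The class names, fields, and run values below are hypothetical illustrations, not the API of MLflow or Weights & Biases (which persist runs to a tracking server); the point is what a run records and how the best run is chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Run:
    """One training run, with enough metadata to reproduce and audit it."""
    run_id: str
    params: Dict[str, float]   # hyperparameters
    dataset_version: str       # which data snapshot was used
    code_commit: str           # which code produced the model
    metrics: Dict[str, float] = field(default_factory=dict)


@dataclass
class Experiment:
    name: str
    runs: List[Run] = field(default_factory=list)

    def log_run(self, run: Run) -> None:
        self.runs.append(run)

    def best_run(self, metric: str, higher_is_better: bool = True) -> Run:
        """Compare runs on one metric and return the candidate to promote."""
        scored = [r for r in self.runs if metric in r.metrics]
        if not scored:
            raise ValueError(f"no run has logged metric {metric!r}")
        pick = max if higher_is_better else min
        return pick(scored, key=lambda r: r.metrics[metric])


exp = Experiment("churn-model")
exp.log_run(Run("run-1", {"lr": 0.1}, "data-v3", "a1b2c3d", {"f1": 0.81}))
exp.log_run(Run("run-2", {"lr": 0.01}, "data-v3", "a1b2c3d", {"f1": 0.84}))
print(exp.best_run("f1").run_id)  # run-2
```

Because every run carries its dataset version and code commit, the promoted run can later be traced back to exactly what produced it.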

2. Data versioning and feature stores

Models are only as good as their training data. MLOps requires data to be:

  • Versioned: each training run is linked to a specific snapshot of training data
  • Validated: data quality checks run before training (schema validation, null checks, distribution tests)
  • Governed: lineage is tracked from raw source to feature to model
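Versioning and validation can be sketched in a few lines: a content hash gives each data snapshot a stable version id that a run can record, and a pre-training check reports schema and null-rate failures. The function names, the 12-character id, and the null-rate threshold are assumptions for illustration; distribution tests are omitted here.

```python
import hashlib
import json
from typing import Dict, List

Row = Dict[str, object]


def dataset_version(rows: List[Row]) -> str:
    """Content hash of a snapshot: identical data always yields the same id."""
    canonical = json.dumps(rows, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def validate(rows: List[Row], schema: Dict[str, type],
             max_null_rate: float = 0.0) -> List[str]:
    """Schema and null checks to run before training; an empty list means pass."""
    errors = []
    for col, expected in schema.items():
        values = [r.get(col) for r in rows]
        nulls = sum(v is None for v in values)
        if rows and nulls / len(rows) > max_null_rate:
            errors.append(f"{col}: null rate {nulls / len(rows):.0%} "
                          f"exceeds {max_null_rate:.0%}")
        bad = [v for v in values if v is not None and not isinstance(v, expected)]
        if bad:
            errors.append(f"{col}: {len(bad)} value(s) not of type {expected.__name__}")
    return errors


rows = [{"age": 34, "income": 52000.0}, {"age": None, "income": 61000.0}]
schema = {"age": int, "income": float}
print(dataset_version(rows))   # 12-character hex id for this snapshot
print(validate(rows, schema))  # ['age: null rate 50% exceeds 0%']
```

A training pipeline would refuse to start when `validate` returns any errors, and would log `dataset_version(rows)` alongside the run.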

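A feature store centralises engineered features so they can be reused across models and teams. Because training and serving read the same feature definitions, it also keeps the two consistent, which prevents training-serving skew (the model receiving features at inference time that were computed differently from those it was trained on).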

| Layer | Role |
|---|---|
| Offline store | Historical features for training (batch) |
| Online store | Low-latency features for real-time inference |
| Feature registry | Metadata, ownership, and documentation for each feature |

Tools: Feast, Tecton, Vertex AI Feature Store, SageMaker Feature Store, Databricks Feature Engineering.
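The offline/online split can be illustrated with a toy store. This is a hypothetical sketch, not the API of any tool listed above: one feature definition (`spend_ratio`) feeds both layers, the offline layer keeps history for point-in-time lookups during training, and the online layer keeps only the latest value for serving.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, List, Tuple


def spend_ratio(total_spend: float, n_orders: int) -> float:
    """Single feature definition shared by training and serving."""
    return total_spend / n_orders if n_orders else 0.0


class ToyFeatureStore:
    """Hypothetical in-memory feature store with offline and online layers."""

    def __init__(self) -> None:
        # offline: entity -> time-sorted (timestamp, value) history for training
        self.offline: Dict[str, List[Tuple[int, float]]] = defaultdict(list)
        # online: entity -> latest value for low-latency inference
        self.online: Dict[str, float] = {}

    def ingest(self, entity: str, ts: int, total_spend: float, n_orders: int) -> None:
        history = self.offline[entity]
        history.append((ts, spend_ratio(total_spend, n_orders)))
        history.sort()
        self.online[entity] = history[-1][1]

    def get_historical(self, entity: str, as_of: int) -> float:
        """Point-in-time lookup: latest value at or before `as_of`."""
        history = self.offline[entity]
        idx = bisect_right(history, (as_of, float("inf")))
        if idx == 0:
            raise KeyError(f"no value for {entity!r} at or before {as_of}")
        return history[idx - 1][1]

    def get_online(self, entity: str) -> float:
        return self.online[entity]


store = ToyFeatureStore()
store.ingest("cust-1", ts=1, total_spend=100.0, n_orders=4)
store.ingest("cust-1", ts=5, total_spend=300.0, n_orders=6)
print(store.get_historical("cust-1", as_of=3))  # 25.0: training sees the past value
print(store.get_online("cust-1"))               # 50.0: serving sees the latest value
```

The point-in-time lookup matters for training: a label observed at time 3 must be joined with the feature value known at time 3, not a later one, or the model learns from information it will not have at inference time.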

3. CI/CD for ML

CI/CD in ML goes beyond deploying code — it includes:

  • CI (Continuous Integration): automated data validation, unit tests for feature engineering logic, model training on a subset, quality gate checks
  • CD (Continuous Delivery): build the model artifact, package it, publish to a model registry, deploy to staging
  • CT (Continuous Training): the ML-specific addition — retrain the model when data drifts or a schedule triggers, run evaluation, and promote automatically if quality gates pass
Code commit → Data validation → Train on sample → Evaluate → Gate check
    → Build artifact → Deploy to staging → Integration test → Deploy to production
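The pipeline above is an ordered chain in which any failing stage, especially the gate check, stops the model from reaching production. A minimal sketch, assuming each stage is a function over a shared context dict; the stage bodies are placeholders and the `min_gain` margin and metric names are hypothetical.

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[dict], bool]]


def run_pipeline(stages: List[Stage], ctx: dict) -> List[str]:
    """Run stages in order and stop at the first failure."""
    completed = []
    for name, step in stages:
        if not step(ctx):
            raise RuntimeError(f"pipeline stopped at stage {name!r}")
        completed.append(name)
    return completed


def gate_check(ctx: dict) -> bool:
    """Quality gate: the candidate must match or beat the production baseline."""
    return ctx["candidate_f1"] >= ctx["baseline_f1"] + ctx.get("min_gain", 0.0)


stages: List[Stage] = [
    ("data_validation", lambda c: c["null_rate"] <= 0.01),
    ("train_on_sample", lambda c: True),       # placeholder for a short training job
    ("evaluate", lambda c: "candidate_f1" in c),
    ("gate_check", gate_check),
    ("build_artifact", lambda c: True),        # placeholders for packaging/deploy steps
    ("deploy_to_staging", lambda c: True),
    ("integration_test", lambda c: True),
    ("deploy_to_production", lambda c: True),
]

ok = {"null_rate": 0.0, "candidate_f1": 0.85, "baseline_f1": 0.82}
print(run_pipeline(stages, ok)[-1])  # deploy_to_production
```

A candidate scoring below the baseline raises at `gate_check`, so no artifact is built or deployed.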

CT (Continuous Training) is often the most impactful MLOps practice for production ML: a model that is never retrained goes stale as the real world, and therefore its input data, changes.
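Continuous training needs a trigger. A minimal sketch of that decision, assuming monitoring already produces a drift score and the team allows a maximum model age; the function name, the 30-day age, and the 0.2 drift threshold are hypothetical values.

```python
from datetime import datetime, timedelta
from typing import Optional


def should_retrain(now: datetime, last_trained: datetime, drift_score: float,
                   max_age: timedelta = timedelta(days=30),
                   drift_threshold: float = 0.2) -> Optional[str]:
    """Return why retraining should start ("drift" or "schedule"), or None."""
    if drift_score >= drift_threshold:
        return "drift"      # input distribution has moved too far
    if now - last_trained >= max_age:
        return "schedule"   # model is older than the allowed age
    return None


last = datetime(2024, 1, 1)
print(should_retrain(datetime(2024, 1, 10), last, drift_score=0.05))  # None
print(should_retrain(datetime(2024, 1, 10), last, drift_score=0.31))  # drift
print(should_retrain(datetime(2024, 2, 5), last, drift_score=0.05))   # schedule
```

A non-`None` result would start the training pipeline; the retrained model still has to pass the evaluation gates before it is promoted.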

4. Model registry and versioning

A model registry is the central catalogue of trained model artifacts. It provides:

  • Versioning: every trained model is given a unique version with its metadata
  • Lifecycle stages: Staging → Production → Archived
  • Audit trail: who promoted which model, when, and based on which evaluation results
  • Rollback: revert to a previous model version if the new one underperforms

Tools: MLflow Model Registry, Vertex AI Model Registry, SageMaker Model Registry, Hugging Face Hub, Azure ML Model Registry.
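The four registry capabilities can be shown in one small, hypothetical in-memory class. This is not the API of MLflow or any other listed registry: it assigns versions, moves them through stages, writes an audit entry on every promotion, and rolls back by re-promoting an archived version.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, List, Optional


@dataclass
class ModelVersion:
    version: int
    metrics: Dict[str, float]
    stage: str = "Registered"   # then Staging -> Production -> Archived


class ModelRegistry:
    """Hypothetical registry: versions, stages, audit trail, rollback."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.versions: List[ModelVersion] = []
        self.audit: List[str] = []

    def register(self, metrics: Dict[str, float]) -> ModelVersion:
        """Give every trained model a unique, increasing version number."""
        mv = ModelVersion(version=len(self.versions) + 1, metrics=metrics)
        self.versions.append(mv)
        return mv

    def production(self) -> Optional[ModelVersion]:
        return next((v for v in self.versions if v.stage == "Production"), None)

    def promote(self, version: int, user: str, stage: str = "Production") -> None:
        target = self.versions[version - 1]
        current = self.production()
        if stage == "Production" and current is not None and current is not target:
            current.stage = "Archived"  # keep a single Production version
        target.stage = stage
        stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
        # audit trail: who, when, which version, and on which evaluation results
        self.audit.append(f"{stamp} {user} moved v{version} to {stage} {target.metrics}")

    def rollback(self, user: str) -> int:
        """Re-promote the newest archived version."""
        archived = [v for v in self.versions if v.stage == "Archived"]
        if not archived:
            raise RuntimeError("no archived version to roll back to")
        target = archived[-1].version
        self.promote(target, user)
        return target


reg = ModelRegistry("churn-model")
reg.register({"f1": 0.81})
reg.promote(1, user="alice")
reg.register({"f1": 0.84})
reg.promote(2, user="bob")
print(reg.production().version)    # 2
print(reg.rollback(user="alice"))  # 1
```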

5. Deployment patterns

| Pattern | Description | When to use |
|---|---|---|
| Online serving | Real-time REST/gRPC inference endpoint | Low-latency applications (fraud detection, recommendations) |
| Batch scoring | Run predictions on a dataset on a schedule | High-volume, latency-tolerant workflows (nightly reports) |
| Streaming inference | Predictions on a data stream (Kafka, Kinesis) | Event-driven real-time systems |
| Canary deployment | Route a small % of traffic to the new model | Validate safely in production before full rollout |
| Shadow deployment | New model receives a copy of live traffic, but its predictions are not returned to users | Production validation with no user-facing risk |
| A/B testing | Split traffic between model versions and compare business metrics | Optimise for business outcomes, not just ML metrics |
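Canary and shadow deployments differ mainly in what the user receives. A minimal sketch, assuming a "model" is any function from a feature dict to a score; the function names, the 5% canary share, and the hash-based bucketing are illustrative choices rather than any serving platform's API.

```python
import hashlib
from typing import Callable, List, Tuple

Model = Callable[[dict], float]


def in_canary(request_id: str, fraction: float) -> bool:
    """Hash the request id into one of 10,000 buckets so routing is stable per id."""
    bucket = int(hashlib.sha256(request_id.encode("utf-8")).hexdigest(), 16) % 10_000
    return bucket < fraction * 10_000


def canary_predict(request_id: str, features: dict, current: Model,
                   candidate: Model, fraction: float = 0.05) -> float:
    """Canary: a small, fixed share of requests is answered by the candidate."""
    model = candidate if in_canary(request_id, fraction) else current
    return model(features)


def shadow_predict(features: dict, current: Model, candidate: Model,
                   shadow_log: List[Tuple[float, float]]) -> float:
    """Shadow: both models score the request, but only the current one is served."""
    served = current(features)
    shadow_log.append((served, candidate(features)))  # compared offline later
    return served


def current_model(features: dict) -> float:
    return 0.2


def candidate_model(features: dict) -> float:
    return 0.9


log: List[Tuple[float, float]] = []
print(shadow_predict({"amount": 120.0}, current_model, candidate_model, log))  # 0.2
share = sum(in_canary(f"req-{i}", 0.05) for i in range(10_000)) / 10_000
print(f"canary share: {share:.3f}")  # close to 0.05
```

Hashing the request (or user) id, rather than picking randomly per call, keeps each caller on the same model for the duration of the canary.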

6. Model monitoring and drift detection

A deployed model's performance usually degrades over time as the data it sees in production moves away from its training data. MLOps requires continuous monitoring of:

  • Data drift: the statistical distribution of input features changes from what the model was trained on
  • Concept drift: the relationship between inputs and the target variable changes, so the patterns the model learned no longer hold
  • Model performance drift: accuracy, precision, recall, or business metrics degrade
  • Infrastructure metrics: latency, throughput, and error rate of the serving endpoint

Input data → Statistical comparison vs training baseline → Drift score
    → Alert threshold → Trigger retraining pipeline or human review

Tools: Evidently AI, Arize AI, WhyLabs, Fiddler, Azure ML Data Drift, SageMaker Model Monitor.
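The "statistical comparison vs training baseline → drift score" step can be made concrete with the Population Stability Index (PSI), one common drift score among several (KS tests, Jensen-Shannon divergence, and Wasserstein distance are alternatives). PSI sums (a − e)·ln(a / e) over bins, where e and a are the baseline and current shares per bin. The 10 equal-width bins, the epsilon floor, and the 0.2 alert threshold are assumptions; 0.2 to 0.25 is an often-cited rule of thumb for significant shift, not a value the tools above necessarily use.

```python
import math
from bisect import bisect_right
from typing import List, Sequence


def _bin_shares(values: Sequence[float], edges: List[float], eps: float) -> List[float]:
    """Share of values per bin; out-of-range values go to the first/last bin."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for v in values:
        i = min(max(bisect_right(edges, v) - 1, 0), n_bins - 1)
        counts[i] += 1
    # floor at eps so empty bins do not produce log(0)
    return [max(c / len(values), eps) for c in counts]


def psi(baseline: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-4) -> float:
    """Population Stability Index of `current` against the training `baseline`."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0      # guard against a constant baseline
    edges = [lo + k * width for k in range(bins + 1)]
    expected = _bin_shares(baseline, edges, eps)
    actual = _bin_shares(current, edges, eps)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


baseline = [i / 100 for i in range(100)]        # training-time feature values
similar = [i / 100 + 0.005 for i in range(100)]
shifted = [0.5 + i / 200 for i in range(100)]   # all mass in the upper half
DRIFT_THRESHOLD = 0.2                            # assumed alert threshold
print(psi(baseline, similar) < DRIFT_THRESHOLD)  # True
print(psi(baseline, shifted) > DRIFT_THRESHOLD)  # True
```

In a monitoring job this would run per feature on a schedule, and a score above the threshold would raise an alert or trigger the retraining pipeline.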


MLOps maturity levels

| Level | Description |
|---|---|
| 0 (Manual) | Models built in notebooks, deployed manually (if at all), no monitoring |
| 1 (Tracked) | Experiments tracked, models versioned, basic CI/CD, some monitoring |
| 2 (Automated) | Full CI/CD/CT pipelines, automated retraining, drift detection, model registry |
| 3 (Governed) | Full lineage, fairness checks, explainability, regulatory audit trail, enterprise governance |

Many organisations are still at Level 0 or 1. Level 2 is a sensible target for teams running models in production, and Level 3 is typically expected in regulated industries.


MLOps aligned to ITIL 5 PSLM

| PSLM activity | MLOps role |
|---|---|
| Discover | Problem framing, feasibility assessment, data availability review |
| Design | ML canvas, model architecture decisions, feature design, evaluation criteria |
| Acquire | Data acquisition, labelling contracts, compute procurement, platform licences |
| Build | Model training, experiment tracking, hyperparameter tuning, model evaluation |
| Transition | Model registry promotion, staging validation, canary/shadow deployment |
| Operate | Model serving infrastructure monitoring, latency and throughput tracking |
| Deliver | Batch scoring runs, API access for consuming applications |
| Support | Drift detection, incident response for model degradation, rollback |

Key metrics

| Metric | What it measures | Typical target |
|---|---|---|
| Model accuracy / F1 / AUC | Predictive performance on a held-out test set | At or above the agreed baseline, enforced by a regression test |
| Training pipeline success rate | % of training runs that complete without error | ≥ 99% |
| Deployment frequency | How often new model versions are released to production | Weekly, or on trigger |
| MTTR for model incidents | Time from drift detection until the model is restored or rolled back | < 4 hours |
| Data freshness | Age of the most recent training data snapshot | Depends on domain |
| Inference latency (p99) | 99th percentile response time for real-time endpoints | < 100 ms for real-time |
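Several of these metrics are simple arithmetic over logs. A minimal sketch with made-up sample data, assuming latencies are collected in milliseconds, each training run is recorded as success or failure, and each incident is a (detected, restored) timestamp pair; the p99 uses the nearest-rank definition, one of several percentile conventions.

```python
from datetime import datetime, timedelta
from typing import Sequence, Tuple


def p99(latencies_ms: Sequence[float]) -> float:
    """Nearest-rank 99th percentile: smallest value with >= 99% of samples at or below it."""
    ordered = sorted(latencies_ms)
    rank = (99 * len(ordered) + 99) // 100   # integer ceil(0.99 * n)
    return ordered[rank - 1]


def success_rate(outcomes: Sequence[bool]) -> float:
    """Percentage of training runs that completed without error."""
    return 100.0 * sum(outcomes) / len(outcomes)


def mttr(incidents: Sequence[Tuple[datetime, datetime]]) -> timedelta:
    """Mean time from drift detection to model restored or rolled back."""
    total = sum((restored - detected for detected, restored in incidents), timedelta())
    return total / len(incidents)


latencies = [20.0] * 95 + [80.0, 90.0, 95.0, 150.0, 300.0]
runs = [True] * 99 + [False]
incidents = [
    (datetime(2024, 3, 1, 9, 0), datetime(2024, 3, 1, 11, 0)),   # 2 h
    (datetime(2024, 3, 2, 14, 0), datetime(2024, 3, 2, 18, 0)),  # 4 h
]
print(p99(latencies), "ms")     # 150.0 ms: misses a < 100 ms target
print(success_rate(runs), "%")  # 99.0 %: meets the >= 99% target
print(mttr(incidents))          # 3:00:00: within a < 4 hour target
```

Note how a handful of slow requests dominates p99 even when 95% of requests take 20 ms, which is why tail latency rather than the mean is the usual serving target.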


Digital Kimya — MENA & Europe

contact@digitalkimya.net