Guide to building MLOps - definition, process, comparison, use case

A to Z guide to getting started with MLOps

Sangsun Moon

In this article, we summarize what MLOps is, how it is organized, how the process differs across maturity levels, and what it is used for.

Later, we'll introduce and compare the major MLOps platforms.

What is MLOps?


Machine Learning Operations (MLOps) is a paradigm that aims to deploy and maintain machine learning models reliably and efficiently. It refers to techniques to implement and automate continuous integration, continuous deployment, and continuous learning for machine learning systems.

Defining MLOps

MLOps refers to engineering practices that draw on the disciplines of machine learning, software engineering, and data engineering. MLOps bridges the gap between development and operations to produce machine learning systems. Essentially, MLOps aims to facilitate the creation of machine learning products through platforms that handle automation, workflow orchestration, and version control. It also performs continuous training and evaluation, metadata tracking and logging, and continuous monitoring with feedback loops to improve models and code.

In other words, MLOps is the collaboration and communication needed to accomplish the following goals:

  • Deployment and automation
  • Governance and compliance
  • Scalability
  • Collaboration
  • Monitoring and management
  • Reproducibility


MLOps can be broadly divided into two phases: the ML (learning) phase and the Ops (operations) phase.

The ML phase covers data collection, preprocessing, and model building, training, and evaluation. The Ops phase is responsible for deploying, monitoring, and testing the models completed in the ML phase. MLOps is a concept that encompasses machine learning engineering, data engineering, cloud, infrastructure, and more.

Building a good model is not the only skill needed to get a machine learning system up and running in production. While a machine learning model is the core of an MLOps system, it's important that all systems, including data and infrastructure, work together from a more macro perspective.

In dealing with MLOps, organizations may run into several challenges.

  • It can be difficult to reproduce the results of a machine learning experiment, even when using the same data and code. This is because machine learning models are sensitive to small changes in data or code.
  • Training and evaluating machine learning models requires large amounts of data. Teams often struggle with collecting, cleaning, and preparing this data.
  • Because they often require specialized hardware and software, machine learning models can be difficult to deploy in production.
  • Machine learning models require monitoring in a production environment to ensure that they are working as expected.
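The reproducibility challenge above is why pinning every source of randomness is a standard first step. A minimal sketch in Python, covering only the stdlib generator; real projects must also seed their ML framework (numpy, torch, etc.) and pin data snapshots and library versions:

```python
import random

def set_seed(seed: int = 42) -> None:
    """Fix the sources of randomness we control.

    A real project must also seed its ML framework (numpy, torch,
    tensorflow) and pin data and dependency versions to make two
    training runs comparable.
    """
    random.seed(seed)

# Two runs with the same seed produce the same "experiment".
set_seed(42)
run_a = [random.random() for _ in range(3)]
set_seed(42)
run_b = [random.random() for _ in range(3)]
assert run_a == run_b  # reproducible given identical code, data, and seed
```

This only removes the randomness you control; shifts in the underlying data still require versioned data snapshots.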

Despite these challenges, MLOps is an important area that can help organizations get the most out of their ML investments. By carefully planning and implementing MLOps, organizations can improve the reproducibility, reliability, and performance of their ML models.


MLOps in Azure with 5 steps

Therefore, every machine learning project should involve the following steps:

  1. Data Extraction: Extract relevant data from your data sources.
  2. Data Analysis: Perform exploratory data analysis (EDA) to understand your data and determine what data your model needs.
  3. Data Preparation: Split the data into training, validation, and test sets.
  4. Model Training: Build a trained model by implementing various algorithms and tuning hyperparameters.
  5. Model Evaluation: Evaluate the model on a holdout test set.
  6. Model Validation: Confirm that model performance exceeds the baseline and that the model is fit for deployment.
  7. Model Serving: Serve the validated model for prediction.
  8. Model Monitoring: Monitor the model's predictive performance in production.
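The steps above can be sketched end to end in a few lines. This is a toy illustration with synthetic data and a one-parameter model, not a production pipeline; all function names are illustrative:

```python
import random

def extract_data(n=100, seed=0):
    # Step 1: pull labeled examples from a (here: synthetic) source.
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.uniform(0, 10)
        rows.append((x, 2 * x + rng.gauss(0, 1)))  # y = 2x + noise
    return rows

def prepare_data(rows, train_frac=0.8):
    # Step 3: split into training and holdout test sets.
    cut = int(len(rows) * train_frac)
    return rows[:cut], rows[cut:]

def train_model(train):
    # Step 4: least-squares slope through the origin (toy "algorithm").
    return sum(x * y for x, y in train) / sum(x * x for x, _ in train)

def evaluate(model, test):
    # Step 5: mean absolute error on the holdout set.
    return sum(abs(model * x - y) for x, y in test) / len(test)

rows = extract_data()
train, test = prepare_data(rows)
model = train_model(train)
mae = evaluate(model, test)

# Step 6: validate against a trivial baseline (always predict 0)
# before the model moves on to serving and monitoring.
baseline_mae = evaluate(0.0, test)
assert mae < baseline_mae
```

The point of the sketch is the shape: each step consumes the previous step's output, which is exactly what an automated pipeline later orchestrates.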

If you look at the ML Lifecycle, you'll see that it covers the same things as above, just with different names.

Processes by MLOps Level

Google defines an MLOps maturity scale with three levels, ranging from fully manual processes to automated ML and CI/CD pipelines.

  • MLOps Level 0: Manual Process
  • MLOps Level 1: Automate your ML pipeline
  • MLOps Level 2: CI/CD Pipeline Automation

MLOps LV 0

In the MLOps Level 0 process, all steps are manual, including data analysis and data preparation, model training, and validation. Each step is executed manually, and the transition from one step to the next is also manual. At this stage, ML and operations are disconnected, which often leads to a gap between learning and service. Typically, new model versions are deployed only a couple times a year, a maturity that is not recommended for production environments.

MLOps Level 0 isn't all bad: it can be valid for developing and deploying ML models for small projects or internal use. However, it may not be suitable for production environments where reliability and scalability matter. If you realize you need better MLOps in production, there are many resources available to help you grow to a higher level of maturity.

MLOps LV 1

MLOps Level 1 means you're ready to automate your machine learning experiments, iterate quickly, and move your entire pipeline into production. This level has several characteristics:

  • Experimental-operational symmetry: Pipeline implementations used in development or experimental environments are also used in pre-production and production environments.
  • Modularized code for components and pipelines: The source code for components should be modularized and ideally containerized.
  • In production, the ML pipeline continuously delivers models trained on new data to the prediction service. The model deployment phase is automated.
  • Level 1 deploys the entire training pipeline, which runs automatically and repeatedly to deliver trained models as a prediction service.

In a nutshell, MLOps Level 1 means automating your machine learning experiments and pipelines to iterate on your experiments. It also means making it easier to train and deploy your models in production.
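A minimal sketch of what "modularized components" can look like: each step is an independent, testable callable, and the same pipeline code runs unchanged across dev and production environments. The names (`ingest`, `run_pipeline`, etc.) are illustrative, not from any particular framework:

```python
from typing import Callable

# Each component is an independent, testable unit; in production each
# would typically be containerized. The runner just wires them together.
Step = Callable[[dict], dict]

def ingest(ctx: dict) -> dict:
    ctx["data"] = [1.0, 2.0, 3.0, 4.0]
    return ctx

def train(ctx: dict) -> dict:
    data = ctx["data"]
    ctx["model"] = sum(data) / len(data)  # toy "model": the mean
    return ctx

def validate(ctx: dict) -> dict:
    ctx["approved"] = ctx["model"] > 0  # gate before deployment
    return ctx

def run_pipeline(steps: list) -> dict:
    ctx: dict = {}
    for step in steps:
        ctx = step(ctx)  # same pipeline code in every environment
    return ctx

result = run_pipeline([ingest, train, validate])
assert result["approved"] and result["model"] == 2.5
```

Because each component only talks to the shared context, any step can be swapped or tested in isolation, which is the experimental-operational symmetry the bullet list describes.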

MLOps LV 2

Description of MLOps Maturity Level LV2
Source: Google Cloud Architecture Center

To quickly and reliably update your pipeline in production, you need an automated CI/CD system. This automated system allows data scientists to quickly bring new ideas for feature extraction, model architecture, and hyperparameters to life.

The pipeline configuration steps for this MLOps level are summarized below.

  • Development and experimentation: Iteratively try out new ML algorithms and modeling techniques. The output of this phase is the source code for the ML pipeline phase, which is pushed to the source repository.
  • Pipeline continuous integration (CI): Build the source code and run various tests. The outputs are pipeline components (packages, executables, and artifacts) to be deployed in later phases.
  • Pipeline continuous delivery (CD): Deploys the artifacts created in the CI phase to the target environment, and outputs a pipeline with a new implementation of the model.
  • Auto-triggering: Automatically run pipelines in production on a schedule or in response to triggers.

Once you've deployed your trained model, you'll need to collect statistics on its performance based on live data. The results of this monitoring can trigger a pipeline run or start a new cycle of experiments.
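A monitoring result feeding a trigger can be as simple as a policy function that combines a schedule with a live-metric threshold. This is a hedged sketch; the seven-day cadence and 0.9 accuracy floor are arbitrary example values:

```python
from datetime import datetime, timedelta

def should_retrain(last_trained: datetime, now: datetime,
                   live_accuracy: float,
                   max_age: timedelta = timedelta(days=7),
                   min_accuracy: float = 0.9) -> bool:
    """Trigger the training pipeline on a schedule or on a monitoring alert."""
    stale = now - last_trained >= max_age    # scheduled trigger
    degraded = live_accuracy < min_accuracy  # monitoring trigger
    return stale or degraded

now = datetime(2024, 1, 15)
assert should_retrain(datetime(2024, 1, 1), now, live_accuracy=0.95)   # stale
assert should_retrain(datetime(2024, 1, 14), now, live_accuracy=0.80)  # degraded
assert not should_retrain(datetime(2024, 1, 14), now, live_accuracy=0.95)
```

In a real system, a scheduler or event bus would evaluate this policy and kick off the training pipeline, rather than a human doing it by hand.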

Implementing machine learning in production does not just mean deploying a model as a prediction API. It means deploying a machine learning pipeline that automates the retraining and deployment of new models. By setting up a continuous integration and continuous delivery system, you can automatically test and deploy new pipeline implementations. At MLOps Level 2, you'll be able to actively respond to rapid changes in your data and business environment.

Compare MLOps

MLOps vs. DevOps

DevOps, DataOps, MLOps Compare
Source: Datacamp

DevOps is a popular approach to developing and operating large-scale software systems. It is aimed at continuous integration and continuous delivery, with benefits such as shorter development cycles, faster deployments, and more reliable releases.

Since the principles of MLOps are themselves derived from DevOps, the two concepts are fundamentally similar. However, the execution is quite different.

  1. MLOps is more experimental than DevOps because machine learning models are constantly being updated and improved, and results need to be reproducible quickly and easily.
  2. MLOps requires a hybrid team. Machine learning projects typically bring together data scientists, who are experts in data analysis and modeling, and software engineers, who are experts in building and deploying production-grade software.
  3. Because machine learning systems are more complex, MLOps requires more testing than DevOps.
  4. Manually deploying and updating machine learning models every time a new version is released is next to impossible, which is why MLOps requires an automated deployment process.
  5. ML models in production can degrade in performance over time, because the data they were trained on may change or the model itself may become outdated.
  6. Machine learning models in production require constant monitoring to ensure that the model is performing as expected and to detect problems early.

MLOps vs. LLMOps


Large Language Model Operations (LLMOps) is an extension of MLOps for training or tuning, deploying, and monitoring large language models. LLMOps is somewhat different from the MLOps approach.

  • Computing resources: Training and tuning large language models typically requires far more computation over large datasets. To speed this up, you need specialized hardware capable of faster data-parallel computation.
  • Transfer learning: Unlike machine learning models that are built or trained from scratch, most large language models start from a base model and are fine-tuned. This process maximizes performance for specific applications with less data and fewer computing resources.
  • Human feedback: Since LLM tasks are very open-ended, feedback from application end users is a critical metric for evaluating LLM performance. Incorporating this feedback loop into your LLMOps pipeline can improve the performance of your trained large language models.
  • Parameter tuning: In traditional machine learning, tuning focuses on improving accuracy or other metrics. For LLMs, tuning also aims to reduce the cost and computing requirements of training and inference. Both traditional machine learning and LLMs can track and optimize the tuning process, but they target different areas.
  • Performance metrics: While traditional models have clear performance metrics that are fairly straightforward to calculate, additional considerations apply when evaluating LLMs. For example, there are entirely different standard metrics and scores, such as Bilingual Evaluation Understudy (BLEU) and Recall-Oriented Understudy for Gisting Evaluation (ROUGE).
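To make the metric difference concrete, here is clipped unigram precision, the simplest ingredient of BLEU. Full BLEU combines clipped n-gram precisions for n = 1..4 with a brevity penalty; this sketch shows only the n = 1 term:

```python
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    """Clipped unigram precision, the n=1 building block of BLEU.

    Each candidate token counts at most as many times as it appears
    in the reference (the "clipping"), which penalizes repetition.
    """
    cand = candidate.split()
    ref_counts = Counter(reference.split())
    clipped = sum(min(c, ref_counts[w]) for w, c in Counter(cand).items())
    return clipped / len(cand)

score = unigram_precision("the cat sat on the mat", "the cat is on the mat")
# 5 of the 6 candidate tokens appear in the reference; "sat" does not.
assert abs(score - 5 / 6) < 1e-9
```

Unlike accuracy on a labeled test set, this kind of overlap score only approximates quality, which is why LLMOps also leans on human feedback.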

MLOps Use cases

One of the challenges of adopting AI in the enterprise is training machine learning models. Machine learning models are sensitive to data changes and have demanding reproducibility requirements. Models that worked fine at research time can fail in real-world commercialization, for example through degraded performance. MLOps is intended to overcome these problems: it facilitates model experimentation and development, and connects everything from model development to deployment into a single pipeline.

In this article, we'll introduce use cases under the subheadings of three business values you can get from MLOps.

Purpose 1. Optimize prediction accuracy

Ride-sharing company Uber's business revolves around estimating a driver's arrival time or location, and determining the price of a trip based on user demand and driver supply. Uber built an internal machine-learning-as-a-service platform called "Michelangelo" to run its machine learning models. This helps Uber make data-driven decisions.

Uber MLOps Michelangelo Structure
Source: Uber Blog

  • Online/Offline Forecasting: This mode is implemented for models that need to serve real-time or non-real-time forecasts. The forecasting service accepts individual or batch prediction requests from clients for real-time inference. Deployed models can also generate offline/batch forecasts on client request or on a recurring schedule. This is useful for internal business needs as well as real-time services that must track continuous data flows.
  • Model performance monitoring: Publish metric characteristics and prediction distributions over time so that the team or system can spot anomalies. Join predictions with observations generated by the data pipeline to check whether the model's predictions are correct, and measure accuracy using the metrics appropriate to the model. Uber also uses its internal Data Quality Monitor (DQM) system to monitor data quality at scale.
  • Iteration and model lifecycle management: Uber managed the lifecycle of models in production and integrated their performance and metrics into the operations team's alerting and monitoring tools. They also included a workflow system to coordinate batch data pipelines, training jobs, batch forecasting jobs, and model deployments to both batch and online containers.
  • Model governance: Includes audit and traceability for data and model lineage. Michelangelo can see the path a model takes through experimentation, track which datasets a model has been trained on, and understand which models have been implemented into production for a particular business case. The platform also has the ability to see the lifetime of a particular model or the various users involved in managing a dataset.
  • It's not hard to find use cases for machine learning at Netflix. Most of what Netflix does revolves around personalization, driven by the business need to optimize the user experience: personalizing a member's homepage, recommending content they might like, and showing artwork relevant to each title. The key is to anticipate what the user wants to see before they see it.
  • Similar to Uber, Netflix deploys models in both offline and online modes: models serve online prediction services, and they are also used in cases that don't require real-time inference, which keeps the system responsive to customer requests while covering batch needs.
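One common way to "spot anomalies" in prediction or feature distributions, as the monitoring bullets above describe, is the Population Stability Index (PSI). A self-contained sketch; the 0.2 alert threshold is an industry rule of thumb, not something from the source:

```python
import math

def psi(expected: list, actual: list, bins: int = 4) -> float:
    """Population Stability Index between a baseline and a live sample.

    A rule of thumb often cited in industry: PSI > 0.2 signals a
    meaningful distribution shift worth investigating.
    """
    lo = min(expected)
    width = (max(expected) - lo) / bins

    def bucket_fracs(sample):
        counts = [0] * bins
        for v in sample:
            i = min(max(int((v - lo) / width), 0), bins - 1)
            counts[i] += 1
        # Smooth empty buckets so the logarithm stays finite.
        return [max(c / len(sample), 1e-4) for c in counts]

    e, a = bucket_fracs(expected), bucket_fracs(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

baseline = [i / 100 for i in range(100)]       # uniform on [0, 1)
shifted = [0.8 + i / 500 for i in range(100)]  # mass moved to the top bucket
assert psi(baseline, baseline) < 0.01
assert psi(baseline, shifted) > 0.2
```

In a pipeline like Michelangelo's or Netflix's, a check of this kind would run on a schedule over live predictions and raise an alert or retraining trigger when the index crosses the threshold.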

Runway - Model Lifecycle Management, MLOps at Netflix
Source: Runway - Model Lifecycle Management at Netflix

The Netflix team trains machine learning models and manages data effectively so that data scientists can experiment quickly. To do this, they built and use Metaflow, an open source, framework-agnostic machine learning library. The Metaflow API allows machine learning workloads to interact seamlessly with AWS cloud infrastructure services.

In addition, Netflix uses internal automated monitoring and alerting tools that report bad data quality to monitor and detect data drift. The "Runway" tool built by Netflix also monitors for stale models in production and alerts the machine learning team. The model monitoring timeline generated by Runway helps the team understand the cause of model issues and spot potential problems.

In addition, Runway includes a repository for tracking model-related information, including artifacts and model lineage. According to Liping Peng, a senior software engineer on Netflix's personalization infrastructure team, Runway gives machine learning teams the ability to search and visualize model structure and metadata; in other words, it provides a user interface that makes it easy to understand the models being deployed to production.

Purpose 2. Automate and streamline processes

With the increase in data generated per capita, the amount of data being collected has grown exponentially. However, applying machine learning to solve real-world problems brings a number of challenges. When machine learning engineers hand off finished models, differences in working environments and in understanding packages and structures can turn the deployment process into a new set of problems. MLOps can automate and streamline this collaboration.

Merck Research Labs implemented MLOps to accelerate vaccine research and discovery. They were working on multiple fronts to drive machine learning-driven innovation in pharma and healthcare, but were facing a number of challenges. During a long-running research project, the DevOps and machine learning teams were disconnected. This lack of collaboration led to skill mismatches and inefficiencies in the machine learning lifecycle, which increased both the time and cost of building models.

By deploying MLOps, Merck Research Labs was able to address these challenges. They implemented processing capabilities to automatically sift through composite images and streamlined their machine learning operations with precise automation. They also addressed technology limitations by allowing researchers to freely use the IDE of their choice.

The primary purpose of MLOps is to automate processes based on CI/CD. Continuous Integration requires testing and validation of not only code and components, but also data, data schemas, and models. Continuous Delivery is predicated on deploying the entire machine learning training pipeline, not just a single software package. MLOps aims to implement all of these processes into a single pipeline system.
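Testing "data and data schemas" in CI can start as simply as validating incoming records against an expected schema before the training pipeline ever runs. The field names below are hypothetical examples, not from the source:

```python
# A CI job can reject a data batch before training if records
# violate the schema the pipeline expects.
EXPECTED_SCHEMA = {"user_id": int, "amount": float, "country": str}

def validate_record(record: dict) -> list:
    """Return a list of schema violations; an empty list means it passes."""
    errors = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            errors.append(f"{field}: expected {expected_type.__name__}")
    return errors

assert validate_record({"user_id": 1, "amount": 9.9, "country": "KR"}) == []
assert validate_record({"user_id": "1", "amount": 9.9}) == [
    "user_id: expected int",
    "missing field: country",
]
```

Dedicated tools cover this more thoroughly, but even a check this small catches the class of silent data errors that unit tests on code alone never see.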

PadSquad, a mobile software company, wanted to improve the performance of the ads it serves to global customers. The team was struggling with the efficiency of its AI development, spending more time on operations than on core tasks, and implemented an MLOps platform to reduce media costs.

The automated machine learning pipeline implemented by PadSquad supports the entire machine learning lifecycle and provides advanced feature engineering capabilities. It abstracted away the DevOps concerns, and as a result, the MLOps implementation was crucial for streamlining their AI application development process, from research to deployment.

In particular, it helped automate the operational aspects of machine learning development and deployment, allowing PadSquad's data scientists to focus on business logic. PadSquad achieved a faster go-to-market (GTM) and saw improved ad performance with better experiences and engagement.

Purpose 3. Improve customer experience

Holiday Extras implementation architecture

Key uses of machine learning at Holiday Extras are customer experience personalization, service pricing automation, and call handling automation. Here's how Holiday Extras deployed and scaled its models:

  1. Code to develop a machine learning model is templated with Cookiecutter and pushed to the company's GitHub repository.
    The modeling library is scikit-learn.
    Data transformation uses custom transformers within the model code to configure the scikit-learn pipeline.
  2. Model code stored in GitHub is replicated to Google Cloud Storage (GCS), and Google AI Platform is used to train the model.
    Model files and metadata are returned to GCS.
    After model evaluation, AI Platform exposes a prediction service as an endpoint that can be called with the correct data schema.
  3. Client requests are interfaced through a machine learning proxy.
    The data schema expected by AI Platform is defined so that other services can query the endpoint with the expected schema.
  4. Data science teams collaborate to ensure that approval processes for lifecycle management are handled.
  5. All training and performance details and versions are managed so that what is collected in Google AI Platform can be tracked for governance.
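The "custom transformers" pattern above follows scikit-learn's fit/transform convention: statistics are learned once in `fit()` and reused identically at serving time, so training and prediction apply exactly the same transformation. A dependency-free stand-in that mimics (but does not import) `sklearn.preprocessing.StandardScaler`:

```python
class StandardScaler:
    """Minimal stand-in for sklearn.preprocessing.StandardScaler.

    fit() learns the mean and standard deviation from training data;
    transform() applies them unchanged, in training and in serving.
    """

    def fit(self, values: list) -> "StandardScaler":
        self.mean_ = sum(values) / len(values)
        var = sum((v - self.mean_) ** 2 for v in values) / len(values)
        self.scale_ = var ** 0.5 or 1.0  # avoid dividing by zero
        return self

    def transform(self, values: list) -> list:
        return [(v - self.mean_) / self.scale_ for v in values]

scaler = StandardScaler().fit([10.0, 20.0, 30.0])
assert scaler.transform([20.0]) == [0.0]  # the training mean maps to zero
```

Keeping learned statistics inside the transformer object is what lets the whole fitted pipeline be serialized to GCS and served behind an endpoint without any training/serving skew.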

Starbucks India boosted sales by applying a data-driven strategy. They used a unified data and analytics platform to collect and analyze loyalty data from various channels. Through marketing analytics on microsegments, Starbucks India segmented customers based on their spending patterns, which allowed them to identify repeat purchasers.

With a data-driven loyalty strategy program, Starbucks India was able to drive higher returns from targeted campaigns, reduce customer churn, and explore inactivity triggers to support proactive retention.

What should you think about before implementing an MLOps process?

To implement MLOps successfully, you need to think about why you are doing it, which means planning your implementation around your business goals.

Google has divided the stages into levels of MLOps maturity. The ultimate state of MLOps is at level 2, but this may not be the case for every organization. As mentioned above, level 0 can be valid if you're developing/deploying machine learning models for small projects or internal use. Level 1 is also a good fit for groups that are comfortable deploying models based on new data, rather than developing new machine learning models. As such, it's important to plan to optimize for the results you need, rather than just copying how someone else has implemented a solution.

Building MLOps just to collect and store data can be inefficient. Before designing your platform, think about data quality first: review your data quality assessment process and investigate where your input data changes.

We also recommend starting with something simple for a pilot solution before a full-scale MLOps implementation. While it would be great to build an end-to-end pipeline as soon as possible, it can be effective to save time, effort, and money with managed services first. Since MLOps is fundamentally about infrastructure, working with an expert vendor to get started with managed services will not only get your team comfortable with dealing with the system, but also allow them to focus on improving productivity.

We've been analyzing MLOps deployments for a while now, and we've been working on a number of different cases. In addition to tools and technology planning, we've been actively communicating with organizations to help them implement cost-effective MLOps. If you're looking for centralized workflows and traceable processes, we recommend talking to an MLOps expert.


  1. MLOps is the application of DevOps to machine learning systems to optimize productivity and reliability, without separating development and operations.
  2. MLOps can be divided into three phases, and for LV 2, we aim to automate CI/CD to perform model training, deployment, and monitoring in a single pipeline.
  3. To build MLOps, rather than blindly replicating them, we recommend creating a pilot solution that is specifically designed for your business environment, or partnering with a specialized vendor to launch a managed service.


  1. MLOps: Continuous delivery and automation pipelines in machine learning | Cloud Architecture Center | Google Cloud
  2. Netflix Research: Machine Learning Platform
  3. Scaling Machine Learning at Holiday Extras
