
CI/CD for Kubeflow (part 2)
by Tony Ruiz, GCP Customer Engineer at Google


This is the second part of a series 


The previous article discussed the complexities of the machine learning life cycle and how Vertex Pipelines, a managed version of Kubeflow on Google Cloud, can help operationalize a machine learning model. The next step in the journey is to automate the training and deployment of the system in a way that improves reproducibility and shortens time to market. In traditional software development, these capabilities are enabled by a system that provides:

  • Continuous Integration (CI) – the ability to test and validate checked in code that needs to be merged into a central repository
  • Continuous Deployment (CD) – Creating and deploying the system to an appropriate environment

Since machine learning models depend on data, the data must also be accounted for in the CI/CD process. In particular:

  • Data must be tested and validated for quality and training/serving skew
  • A model must be continuously trained on new data to account for changes in the world
  • The model must be validated, tested and deployed to a prediction service
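
As a rough illustration of the first point, the sketch below flags training/serving skew for a single numeric feature by measuring how far the serving mean drifts from the training mean. It is only a minimal sketch under stated assumptions: the function names, the example "age" values, and the threshold of three standard deviations are hypothetical and do not come from the pipeline in this series. A production workflow would more likely rely on a dedicated data validation tool.

```python
from statistics import mean, stdev


def mean_shift_in_std(training, serving):
    """Distance between the serving and training means, in training standard deviations."""
    spread = stdev(training)
    if spread == 0:
        raise ValueError("training values have zero variance")
    return abs(mean(serving) - mean(training)) / spread


def has_skew(training, serving, threshold=3.0):
    """Flag skew when the serving mean drifts more than `threshold` deviations (assumed cutoff)."""
    return mean_shift_in_std(training, serving) > threshold


# Hypothetical feature values, for illustration only.
training_ages = [31, 35, 29, 40, 33, 37]
serving_ages = [32, 36, 30, 38, 34, 35]
shifted_ages = [61, 65, 59, 70, 63, 67]

print(has_skew(training_ages, serving_ages))
print(has_skew(training_ages, shifted_ages))
```

A check like this could run as an early pipeline step so that a build or training run stops before a model is trained on data that no longer resembles what the prediction service sees.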

This also introduces the notion of continuous training (CT). Depending on the use case, the model needs to be retrained and updated at a regular frequency, for example weekly.


The goal of this article is to serve as a gentle introduction to CI/CD for the Vertex Pipelines created in the previous article. Addressing every component of the comprehensive CI/CD workflow in the image above would go beyond the scope of a short blog post. This article assumes the reader has foundational knowledge of git, Docker, and cloud technologies.

What is Cloud Build?

Cloud Build is a fully managed, serverless CI/CD platform. Other options include Jenkins, Azure DevOps and GitHub Actions. A CI/CD platform allows teams to develop and ship software to different environments effectively and iteratively. This tutorial uses Google Cloud Build alongside a GitHub repository to build and run containers using the code that was pushed to the main branch.

A Minimalistic CI/CD/CT Workflow


Recall from the previous post that the goal was to build an end-to-end ML workflow where:

  • A model is trained
  • The model is evaluated against a metric
  • The model is deployed to a compute endpoint where predictions can be made

Note that in some instances, the deployment workflow may be separate from the training workflow. For example, you may want your machine learning model to be trained on a weekly basis but go through an extensive quality assurance process before deploying.

Rewrite our pipeline workflow

The first thing we need to do is transform the Python notebook file (.ipynb) into a normal Python file (.py). This Python file contains all the executable code needed to recreate the compiled pipeline file that will actually be used to execute our machine learning workflow. The reason for the conversion is that version control tools such as git work much better with plain text files than with mixtures of text and binaries, which is what notebook files are. Refer to the GitHub repository for the full code.
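
To make the conversion concrete, here is a minimal sketch that pulls the code cells out of a notebook's JSON and joins them into a script. It assumes the standard nbformat 4 layout (a top-level "cells" list whose entries have "cell_type" and "source"), and the function name notebook_to_script is hypothetical. It drops notebook-only lines such as %%writefile cells and !gcloud shell escapes, since those do not run in a plain Python file. In practice the command jupyter nbconvert --to script does this job more thoroughly.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Concatenate the code cells of an nbformat 4 notebook into one script."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", [])
        text = source if isinstance(source, str) else "".join(source)
        if text.lstrip().startswith("%%"):
            continue  # cell magics (e.g. %%writefile) are not Python
        # Drop line magics and shell escapes, keep everything else.
        kept = [
            line for line in text.splitlines()
            if not line.lstrip().startswith(("%", "!"))
        ]
        if any(line.strip() for line in kept):
            chunks.append("\n".join(kept))
    return "\n\n".join(chunks) + "\n"


# A tiny in-memory notebook, for illustration only.
example = json.dumps({"cells": [
    {"cell_type": "markdown", "source": ["# Title"]},
    {"cell_type": "code", "source": ["import os\n", "!pip install kfp\n", "x = 1\n"]},
    {"cell_type": "code", "source": ["%%writefile kfp-cli/Dockerfile\n", "FROM base\n"]},
]})
script = notebook_to_script(example)
print(script)
```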

Create a Dockerfile

The next step is to create a Dockerfile that contains our Python file pipeline.py and our requirements.txt. Note that we are creating a separate folder (./kfp-cli) within our workspace to house the files for our Docker image.

%%writefile kfp-cli/Dockerfile

FROM gcr.io/deeplearning-platform-release/base-cpu
WORKDIR /kfp-cli
ADD pipeline.py ./pipeline.py
ADD requirements.txt ./requirements.txt
RUN pip install -r requirements.txt
CMD python ./pipeline.py

Build a local image and push it to Container Registry

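Now that a Dockerfile is defined and the dependencies exist within a directory, build the image and push it to Container Registry. The gcloud builds submit command below uploads the ./kfp-cli directory, builds the image with Cloud Build, and tags it with IMAGE_URI.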

IMAGE_URI='gcr.io/{}/{}:{}'.format(PROJECT_ID, IMAGE_NAME, TAG)

!gcloud builds submit --timeout 15m --tag {IMAGE_URI} ./kfp-cli

The Cloud Build page within the console will contain the information on the build.

Build a cloudbuild.yaml file

The next step is to create the cloudbuild.yaml file. This YAML file contains the steps to execute when code gets pushed to the remote repository. For more information on creating the build configuration, refer to the Cloud Build documentation.

In short, the build will do a few things:

  • Build the Docker image using the code in the repository. If there are errors in the code, this build step will fail
  • Push the most recent version of the Docker image back to Container Registry
  • Use the image that was created to run the Python script in the repository. The output of this execution will be the pipeline YAML file
  • Write the pipeline YAML file to a Google Cloud Storage bucket. This YAML file can then be picked up by a scheduler to trigger the pipeline

steps:
  # Step 1: Build the Docker image
  - name: 'gcr.io/cloud-builders/docker'
    args: ['build', '-t', 'gcr.io/$_PROJECT_ID/kfp-truiz-mlops-v2:latest', './kfp-cli']
  # Step 2: Push the image back to Container Registry
  - name: 'gcr.io/cloud-builders/docker'
    args: ['push', 'gcr.io/$_PROJECT_ID/kfp-truiz-mlops-v2:latest']
  # Step 3: Run the Python file within the container
  - name: 'gcr.io/$_PROJECT_ID/kfp-truiz-mlops-v2:latest'
    args: ['python', './kfp-cli/pipeline.py']
  # Step 4: Write results to Google Cloud Storage
  - name: 'gcr.io/cloud-builders/gsutil'
    args: ['cp', './xgb-pipeline.yaml', 'gs://$_PROJECT_ID-cloudbuild-pipelines']

# User-defined substitutions. _PROJECT_ID is referenced in the steps above
# and must also be defined here or supplied when the build is submitted.
substitutions:
  _REGION: 'us-central1'
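
The $_PROJECT_ID placeholders in the build configuration above are user-defined substitutions that get replaced with concrete values when the build runs. The sketch below only approximates that expansion with Python's string.Template, to show what the resolved image and bucket names look like. The project ID my-project and the helper names expand and expand_steps are hypothetical, and Cloud Build's own substitution rules are richer than this.

```python
from string import Template

# Two of the build steps from the configuration, written as plain dictionaries.
steps = [
    {"name": "gcr.io/cloud-builders/docker",
     "args": ["build", "-t", "gcr.io/$_PROJECT_ID/kfp-truiz-mlops-v2:latest", "./kfp-cli"]},
    {"name": "gcr.io/cloud-builders/gsutil",
     "args": ["cp", "./xgb-pipeline.yaml", "gs://$_PROJECT_ID-cloudbuild-pipelines"]},
]


def expand(value, substitutions):
    """Replace $_NAME placeholders; raises KeyError if a name is missing."""
    return Template(value).substitute(substitutions)


def expand_steps(steps, substitutions):
    """Return a copy of the steps with every name and argument expanded."""
    return [
        {"name": expand(step["name"], substitutions),
         "args": [expand(arg, substitutions) for arg in step["args"]]}
        for step in steps
    ]


# "my-project" is a made-up project ID used only for this example.
resolved = expand_steps(steps, {"_PROJECT_ID": "my-project", "_REGION": "us-central1"})
for step in resolved:
    print(step["name"], step["args"])
```

Printing the resolved steps before pushing can help catch a misspelled substitution name, which would otherwise only surface once the build runs.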

Submit a local job to the Cloud Build

Once the Dockerfile is defined and the cloudbuild.yaml file is created, a build can be submitted from the local environment using the gcloud command line interface:

!gcloud builds submit ./kfp-cli --config cloudbuild.yaml

Connect Google Cloud to our Github Repository

Now that the cloudbuild.yaml file is defined, the Cloud Build environment needs to be integrated with the git repository. Fortunately, this is straightforward to set up via the console.

The first step is to create a host connection to the git environment. This connection handles authentication to the repository. You will be redirected to GitHub to allow the connection. An authentication token from GitHub will be created and stored in this project as a Secret Manager secret. You may revoke access to Cloud Build through GitHub at any time.

The second step is to link the repository. You will need to install the Google Cloud Build application within GitHub to allow the integration with the selected repository.

After that step is complete, you will be able to link the repository to your Google Cloud Build host connection.

Create a Cloud Build Trigger


Now every time a push is made to the main branch, a build process will kick off. This is done through Cloud Build triggers, which allow the user to trigger specific actions on connected repositories. In the image below, a trigger is created to execute a build every time there is a push to a branch. Note that specific branches can be selected using a regular expression. The cloudbuild.yaml file is also specified as the configuration for the build.
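
To get a feel for how a branch pattern selects pushes, the sketch below tests a few branch names against candidate patterns with Python's re module. This is an approximation only: Cloud Build evaluates its own regular expression syntax, the function name trigger_fires is hypothetical, and the patterns and branch names are examples rather than values taken from the trigger in this article.

```python
import re


def trigger_fires(branch_pattern: str, pushed_branch: str) -> bool:
    """Return True when the pushed branch name matches the trigger's pattern."""
    return re.search(branch_pattern, pushed_branch) is not None


# Anchoring with ^ and $ restricts the trigger to exactly one branch.
print(trigger_fires("^main$", "main"))
print(trigger_fires("^main$", "main-hotfix"))
# A looser pattern covers a family of branches.
print(trigger_fires("^release/.*$", "release/1.2"))
```

Without the anchors, a pattern such as main would also match branches like main-hotfix, which is usually not what is wanted for a trigger that ships to production.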

Once the connection to the remote repository is established and a trigger is created, the next step is to push a commit to the remote repository. This will start a build in Cloud Build using the contents of the remote git repository. The changes that were pushed will be used to create a new image, push it to Container Registry, run the pipeline file, and copy the compiled pipeline specification to storage. All that is needed now is to execute the pipeline using a scheduler or an event.

Configure a schedule to execute the pipeline

The next step is to actually execute the pipeline file. How this is done can vary depending on the use case. 

  • We may want to execute a scheduled training pipeline weekly, and deploy our model only when a model artifact has made it up the proper approval channels.
  • We may want to retrain a model after a batch process is complete. For example, when data in our data warehouse gets updated, a trigger would occur that trains the model.

We can create a recurring schedule with the Scheduler API, or we can create an event-driven workflow using a messaging/queue service like Pub/Sub. For our use case, a weekly batch process will suffice.
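from google.cloud import aiplatform

# DISPLAY_NAME and TEMPLATE_PATH are placeholders for your own values.
pipeline_job = aiplatform.PipelineJob(
    display_name=DISPLAY_NAME, template_path=TEMPLATE_PATH)

pipeline_job_schedule = pipeline_job.create_schedule(
    display_name=DISPLAY_NAME,
    cron="30 18 * * 6",  # weekly at 18:30 UTC on Saturday
)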
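
The cron expression "30 18 * * 6" reads as minute 30, hour 18, any day of the month, any month, day of week 6 (Saturday, since cron counts Sunday as 0). The sketch below works out the next run time for a weekly expression of that shape using only the standard library. It is an illustration, not how the scheduler itself evaluates cron: the function name next_weekly_run is hypothetical, and it handles only a fixed minute, hour and weekday.

```python
from datetime import datetime, timedelta, timezone


def next_weekly_run(cron: str, now: datetime) -> datetime:
    """Next run for a cron of the form 'M H * * D' (fixed minute, hour, weekday)."""
    minute, hour, day_of_month, month, day_of_week = cron.split()
    if day_of_month != "*" or month != "*":
        raise ValueError("this sketch only handles weekly schedules")
    # Cron counts Sunday as 0 (or 7); Python's weekday() counts Monday as 0.
    python_weekday = (int(day_of_week) - 1) % 7
    candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
    candidate += timedelta(days=(python_weekday - now.weekday()) % 7)
    if candidate <= now:
        candidate += timedelta(days=7)  # this week's slot has already passed
    return candidate


# Monday 1 January 2024, 12:00 UTC, chosen arbitrarily for the example.
monday_noon = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(next_weekly_run("30 18 * * 6", monday_noon))
```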



Congratulations, this article covered a lot of ground. We went from training a machine learning model through a manual process to automating training and deployment and integrating our workflow with a code repository. This article did not cover everything that can be done in a CI/CD workflow. For example, a more mature pipeline would handle things like unit tests, deployment to different environments, and testing the data for training/serving skew. It did, however, demonstrate a starting point from which to work towards those capabilities.
