GitLab is a code hosting and version management service that doubles as a complete DevOps platform. Find out everything you need to know about it: how it works, how it differs from GitHub, use cases for Data Science and Machine Learning, training courses…
In the fields of Data Science and Machine Learning, and more generally in software development, code hosting and version management services have become indispensable. Among the most widely used platforms are GitHub and GitLab.
Both are web-based hosting services for Git repositories. Git is a version control system that lets you manage software development projects and all their related files as they change.
In this way, changes made by individual team members can be managed and supervised. Project members can coordinate their work and track progress over time.
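A project's history is stored in a repository. It contains objects (file contents, directory trees and commits) and the references that point to them (such as branches and tags). Hosted on a platform, it acts as a centralized location where developers can store, share, test and collaborate on development projects.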
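To make the idea of "objects and references" more concrete, here is a minimal Python sketch. It computes the ID Git gives to a file's content in a default (SHA-1) repository, then stores that content in a toy in-memory dictionary. The `objects` and `refs` dictionaries and the reference name are hypothetical simplifications: in a real repository, a reference such as a branch points to a commit rather than directly to a file.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object ID Git assigns to a file's content (a "blob")."""
    # Git hashes a short header ("blob <size>\0") followed by the raw bytes.
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# Toy in-memory "repository" (hypothetical): objects keyed by ID, plus references.
objects = {}
refs = {}

readme = b"# My project\n"
blob_id = git_blob_id(readme)
objects[blob_id] = readme        # the object is stored under its own hash
refs["latest-readme"] = blob_id  # a reference is just a name pointing to an ID

# Following the reference leads back to the stored content.
print(refs["latest-readme"], objects[refs["latest-readme"]])
```

Because the ID is derived from the content, two identical files share a single object, and any change to a file produces a new ID.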
What is GitLab?
Like GitHub, GitLab is a Git repository manager enabling teams to collaborate on computer code. It is written in Ruby and Go, and was created in 2011 by Dmitriy Zaporozhets and Valery Sizov.
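Its Community Edition is open source (MIT license), while the Enterprise Edition adds proprietary features on top of the same core. A free tier is also available for individual users.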
Multiple members of a team can use GitLab to collaborate on the same project, propose changes, and roll back to an earlier version if unforeseen problems arise.
Since the launch of version 10.0 in 2017, GitLab has become more than just a Git repository manager.
The service now offers a “Complete DevOps” vision, unifying development and operations in a single user experience.
This version brought tighter integration between development and operations tools. Users can perform all project tasks in one place, from planning and source code management to monitoring and security.
What are its components?
GitLab is based on several components, forming a complete solution for DevOps and project management. Firstly, “projects” can be created to host code, collaborate on it, and track issues.
Native continuous integration and continuous delivery functionalities (GitLab CI/CD) enable applications to be developed, tested and deployed on an ongoing basis. Projects can be public, internal (visible to any signed-in user of the GitLab instance) or private.
Several related projects can be assembled into a “group”. In addition, “subgroups” can be used to create a hierarchy of up to 20 nested group levels.
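As a rough illustration of this hierarchy, the Python sketch below counts the levels in a slash-separated group path and checks whether another level could still be added. The function names and example paths are hypothetical, and the sketch assumes the top-level group counts as level 1 of the 20 mentioned above. Check GitLab's documentation for exactly how the limit is counted.

```python
MAX_GROUP_LEVELS = 20  # limit quoted in the text above


def group_levels(full_path: str) -> int:
    """Count the group levels in a slash-separated path such as 'acme/data/ml'."""
    return len([part for part in full_path.split("/") if part])


def can_add_subgroup(parent_path: str) -> bool:
    """Return True if one more level under parent_path stays within the limit.

    Assumption: the top-level group counts as level 1.
    """
    return group_levels(parent_path) + 1 <= MAX_GROUP_LEVELS


print(group_levels("acme/data/ml"))      # hypothetical path with three levels
print(can_add_subgroup("acme/data/ml"))  # still far below the limit
```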
GitLab’s native Continuous Integration (CI) features let developers integrate small code changes frequently into an application hosted in a Git repository. On each “push”, a pipeline of scripts can be run to build and test the code before the changes are validated and merged into the project.
Continuous Delivery and Deployment (CD) go one step further: each validated change is prepared for release, and can even be put into production automatically with each push. GitLab’s CI/CD is configured by a file called .gitlab-ci.yml placed at the root of the Git repository, and the scripts defined in this file are executed by GitLab Runner.
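The flow described above (ordered stages, jobs attached to a stage, scripts executed for each job) can be sketched as a toy simulation in Python. This is not GitLab Runner, and the `PIPELINE` dictionary is a hypothetical stand-in that only loosely mirrors the shape of a CI configuration. In a real .gitlab-ci.yml the scripts are shell command lines, whereas here they are small Python commands so that the sketch runs anywhere.

```python
import subprocess
import sys

# Hypothetical pipeline definition: ordered stages, and jobs with a script each.
PIPELINE = {
    "stages": ["test", "deploy"],
    "jobs": {
        "unit-tests": {
            "stage": "test",
            "script": [[sys.executable, "-c", "assert 1 + 1 == 2"]],
        },
        "deploy-app": {
            "stage": "deploy",
            "script": [[sys.executable, "-c", "print('deploying')"]],
        },
    },
}


def run_job(job: dict) -> bool:
    """Run each command of a job in order; stop at the first failure."""
    for command in job["script"]:
        if subprocess.run(command).returncode != 0:
            return False
    return True


def run_pipeline(pipeline: dict) -> dict:
    """Run stages in order and return a status per job.

    Every job of a stage is run; jobs in stages after a failed stage are skipped.
    """
    statuses = {}
    failed = False
    for stage in pipeline["stages"]:
        stage_jobs = {
            name: job
            for name, job in pipeline["jobs"].items()
            if job["stage"] == stage
        }
        for name, job in stage_jobs.items():
            if failed:
                statuses[name] = "skipped"
            else:
                statuses[name] = "success" if run_job(job) else "failed"
        if any(statuses[name] == "failed" for name in stage_jobs):
            failed = True
    return statuses


if __name__ == "__main__":
    print(run_pipeline(PIPELINE))
```

The key design point is the gate between stages: if the tests fail, the deployment stage never runs, which is what keeps a broken change from reaching production.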
GitLab vs GitHub: what are the differences?
There are several major differences between GitLab and GitHub. These differences concern, for example, the authentication and access permissions systems, which are more granular on GitLab and therefore better suited to large teams working on large-scale projects.
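GitLab also stands out for its built-in Continuous Integration and Delivery system. Because it is part of the platform, development teams save the time they would otherwise spend setting up and maintaining a separate CI service. For users already relying on an external Continuous Integration tool, the platform is compatible with Jenkins, Codeship and many others.
The Auto DevOps feature also provides a predefined CI/CD configuration, so that pipelines can build, test and deploy an application automatically, without the team having to write that configuration by hand.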
This gives GitLab a head start over GitHub in the field of DevOps.
However, at the end of 2019, GitHub launched “Actions”. This system enables tasks to be written to automate and customize the development workflow. Even so, GitHub does not offer its own deployment platform: a third-party service such as Heroku is required.
A final difference concerns the price of the “enterprise” versions of these two services. At the time of writing, GitHub’s enterprise package starts at $250 per user per year, while GitLab starts at $39 per user per year.
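To see what this gap means for a whole team, here is a small worked calculation in Python. It uses only the per-user figures quoted above, which may be out of date, and the team size of 50 is a hypothetical example.

```python
# Per-user annual prices quoted in the text above (USD); they may be outdated.
GITHUB_ENTERPRISE_PER_USER = 250
GITLAB_PER_USER = 39


def annual_cost(users: int, price_per_user: int) -> int:
    """Total yearly licence cost for a team of the given size."""
    return users * price_per_user


team_size = 50  # hypothetical team
github_total = annual_cost(team_size, GITHUB_ENTERPRISE_PER_USER)
gitlab_total = annual_cost(team_size, GITLAB_PER_USER)

print(f"GitHub: ${github_total:,} per year")
print(f"GitLab: ${gitlab_total:,} per year")
print(f"Difference: ${github_total - gitlab_total:,} per year")
```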
In terms of popularity, GitHub is by far the most widely used Git hosting service, with several tens of millions of users, compared with just 100,000 for GitLab. Nevertheless, GitLab supports teams throughout the DevOps process and is more affordable for enterprises.
GitLab for Data Science and Machine Learning
Data Science and Machine Learning teams can bring a valuable advantage to a business, unlocking actionable insights from datasets.
However, to achieve this, these teams have significant needs in terms of collaboration, project planning and management, and version management of files, models or datasets.
Data Science and Machine Learning professionals also need to be able to automate crucial workflow steps to gain efficiency and avoid manual errors. They also need to streamline the testing and validation processes of their work for greater speed and repeatability.
Finally, infrastructure management needs to be simplified as much as possible (especially when this infrastructure relies on multiple Cloud providers). GitLab meets these needs, which is why this tool is a must-have for Data Science and Machine Learning.
Teams can easily collaborate across departments, manage and schedule their work, and keep track of changes made as models are developed, trained and deployed.
Automation is made possible by GitLab’s CI/CD: with each change, a pipeline can run tests on the code, the data and the model’s metrics to validate it. Building and deploying a model can be automated in the same way.
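As an illustration, a validation step of this kind could look like the Python sketch below, which a CI job might run on every change. Everything in it is hypothetical: the "model" is a fixed threshold standing in for a trained one, the validation set is invented, and the 0.8 accuracy gate is an arbitrary example. The important idea is the exit code, since a job script that exits with a non-zero code makes the job fail.

```python
# Hypothetical hold-out set: (feature, expected label) pairs.
VALIDATION_SET = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1), (0.45, 0), (0.7, 1)]
MIN_ACCURACY = 0.8  # hypothetical quality gate


def predict(x: float) -> int:
    """Stand-in for a trained model: a fixed decision threshold."""
    return 1 if x >= 0.5 else 0


def accuracy(model, dataset) -> float:
    """Fraction of examples for which the model returns the expected label."""
    correct = sum(1 for x, label in dataset if model(x) == label)
    return correct / len(dataset)


def validation_exit_code(model, dataset, min_accuracy: float) -> int:
    """Return 0 if the model passes the quality gate, 1 otherwise."""
    score = accuracy(model, dataset)
    print(f"accuracy={score:.2f} (minimum {min_accuracy:.2f})")
    return 0 if score >= min_accuracy else 1


exit_code = validation_exit_code(predict, VALIDATION_SET, MIN_ACCURACY)
# In a real job script, the last line would be: sys.exit(exit_code)
```

If a change degrades the model below the threshold, the job fails and the change is not validated, just as with a failing unit test.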
Finally, a model can be deployed and managed on any cloud.
Recently, Iterative.ai launched a new open source project called CML (continuous machine learning). This project adapts GitLab CI to Data Science and Machine Learning use cases.
Examples include automatic model training, automatic testing and reporting with data visualization.
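To give an idea of what such a report might contain, here is a minimal Python sketch that turns training metrics into a short Markdown report with a text-based loss chart. It does not use CML's own commands. The function name, the metric values and the loss history are hypothetical, and the sketch assumes the loss values are positive.

```python
def render_report(metrics: dict, history: list) -> str:
    """Build a small Markdown report: a metrics table and a text loss chart."""
    lines = ["## Training report", "", "| Metric | Value |", "| --- | --- |"]
    for name, value in metrics.items():
        lines.append(f"| {name} | {value:.3f} |")
    lines += ["", "Loss per epoch:", ""]
    peak = max(history)  # assumes at least one positive loss value
    for epoch, loss in enumerate(history, start=1):
        # Scale each bar to the largest loss, with at least one character.
        bar = "#" * max(1, round(20 * loss / peak))
        lines.append(f"    epoch {epoch}: {bar} {loss:.2f}")
    return "\n".join(lines)


# Hypothetical values standing in for real training output.
report = render_report({"accuracy": 0.912, "f1": 0.874}, [0.9, 0.5, 0.3, 0.2])
print(report)
```

A CI job could generate this kind of text after training and publish it alongside the change, so reviewers see the metrics without re-running anything.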
How do you learn to use it?
To learn how to use GitLab, turn to DataScientest training courses. Our courses are based on an innovative “Blended Learning” approach (a hybrid of face-to-face and distance learning) and enable you to earn a diploma certified by Sorbonne University.
GitLab is included in our Data Engineer training program. This course enables you to learn about the Data Engineer profession, and the different tools and techniques used in this profession.
If you’re already a Data Scientist and want to learn how to put Machine Learning models into production, you can opt for our Machine Learning Engineer course. GitLab is one of the tools included in the “test and deploy” module of this course.