Introduction
Looking at the MLOps stacks from companies like Netflix, Twitter, etc., can be intimidating, and the thought ‘This isn’t for me’ often comes to mind. This blog is an attempt to understand MLOps from the perspective of a musical concert—not overly technical, but focused on gaining a fundamental understanding.
In the cover picture, we see an orchestra concert. As Data Scientists, we typically play a single instrument, like a violin or a flute. In the beginning, our understanding of Data Science is limited to how well the violin plays. But if we look at the bigger picture, every person, every instrument, and plenty of things in the background have to work together perfectly for the orchestra to deliver a beautiful experience. This is what MLOps does. It provides a seamless platform where any number of musicians can come and perform at any time.
Now, let’s try to create an analogy between building a music concert platform from scratch and MLOps.
Laying the Foundation
Suppose we are concert organizers who arrange several concerts a month, where any artist, musician, or group can come and perform. Instead of building everything from scratch each time, we can build a platform where only the main performers change and everything else works in harmony. Wouldn't that make our lives as organizers much easier? That is exactly what we are about to do.
We start our journey with ‘Provision and configuration management’.
Orchestra – Here we set up the infra. We first arrange suppliers for each kind of instrument, such as guitars, drums, and pianos, so we know whom to reach out to for what.
Data Science – Here we set up the configuration infrastructure. For example, Ansible is a tool for provisioning and configuration management. It automates the setup and configuration of servers, making it easier to manage large-scale deployments.
Now that our infra is sorted, we need to make sure these instruments work together seamlessly. This is called ‘OS Virtualization/Orchestration’.
Orchestra – We ask the different artists to work as teams. The guitar team and the drum team each work in their own way and keep their own setup in place so that their instruments play well. We also have a main conductor who makes sure all the teams work together, and who can arrange for more or fewer players when needed.
Data Science – Here we create a container for each service so that it runs in isolation, and we have one orchestrator that makes sure the containers work together. For example, Docker is a containerization tool that allows applications to run in isolated containers. Kubernetes is an orchestration tool that automates the deployment, scaling, and management of containerized applications.
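To make the conductor idea a little more concrete, here is a minimal sketch of what an orchestrator does at its core: compare how many containers we want with how many are running, and fix the difference. The ToyOrchestrator class and the service names are made up for illustration. This is not the Kubernetes API, just the idea behind it.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyOrchestrator:
    """Hypothetical stand-in for an orchestrator; not a real Kubernetes client."""
    desired: Dict[str, int]                          # service name -> wanted container count
    running: Dict[str, int] = field(default_factory=dict)

    def reconcile(self) -> List[str]:
        """Compare desired and running counts, then return the actions taken."""
        actions = []
        for service, want in self.desired.items():
            have = self.running.get(service, 0)
            if have < want:
                actions.append(f"start {want - have} x {service}")
            elif have > want:
                actions.append(f"stop {have - want} x {service}")
            # After acting, the running count matches the desired count.
            self.running[service] = want
        return actions


# The conductor wants two guitar containers and one drum container.
conductor = ToyOrchestrator(desired={"guitar": 2, "drums": 1})
conductor.running = {"guitar": 1, "drums": 3}
actions = conductor.reconcile()
print(actions)
```

Calling reconcile() a second time returns an empty list, because nothing is out of tune anymore. Real orchestrators run a loop like this continuously.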
Next, we need to have assistants who are mostly invisible and who take care of smooth communication between services.
Orchestra – We need assistants who make sure that the wires are fitted well, there are no loose connections, all the speakers work, and everything else is in order, so that there are no communication issues across the team.
Data Science – We need seamless communication between services. For example, a service mesh is a dedicated infrastructure layer for handling service-to-service communication. Istio is a service mesh that enhances microservice communication with features like traffic management and observability. Kubernetes storage (persistent volumes) provides storage orchestration for containerized applications.
Now that we have a foundation with instruments, orchestrators, and assistants making sure everything is up and running, we move to the next step, where we write a script for how the orchestra will be conducted.
Scripting the Story
Now we move to the planning and management part. We write a script for how the orchestra will be conducted and how the performances (tasks) will be managed efficiently. This is the ‘Plan and Manage’ part.
Orchestra – Here the team sits together at a whiteboard and creates a storyboard: how the performance will start, who will host, in which order the performers will come on, and so on. Everyone has complete visibility of the order of events.
Data Science – This is about effective planning and management of the whole process. Tools like Jira help the team plan, track, and manage work efficiently, and tools like Trello help with task management and team productivity.
Crafting the Tunes
We have the stage ready and the plan ready, so now we start creating the tunes that will be performed by the orchestra.
Orchestra – Here every team starts creating its tunes, and then the teams collaborate and test their tunes together.
Data Science – Here we enter the code and create phase. Developers start collaborating. Platforms like GitHub are used for code collaboration. Editors like VS Code, which also offers browser-based and remote-development options, let developers write code from anywhere. Automation servers like Jenkins are used to automate builds, that is, compiling source code into executable code. Note: Python code doesn’t need a separate compile step before it runs, whereas languages like C++ and Java do need this build management step.
Rehearsing the Act
Now, as the tunes that are to be performed are ready, it is time for rehearsal before the grand concert.
Orchestra – In the rehearsal phase, everyone goes through each scene and makes sure everything will be delivered flawlessly.
Data Science – Here we use the term ‘Continuous Integration’. It involves regularly integrating code changes, verifying them, and packaging them for deployment. Travis CI is a CI/CD service that works like a diligent stagehand, making sure everything is in place for deployment. Automated testing frameworks like JUnit (for Java) help ensure that code changes don’t introduce errors.
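As a rough illustration of what the rehearsal looks like in code, here is a small sketch using Python’s built-in unittest module (the Python counterpart to JUnit). The normalize_tempo function is hypothetical and exists only to give the tests something to check. A CI service would run tests like these on every code change.

```python
import unittest


def normalize_tempo(bpm: float) -> float:
    """Hypothetical function under test: clamp a tempo to a playable range."""
    return max(40.0, min(208.0, bpm))


class TestNormalizeTempo(unittest.TestCase):
    def test_in_range_is_unchanged(self):
        self.assertEqual(normalize_tempo(120), 120)

    def test_too_slow_is_raised_to_minimum(self):
        self.assertEqual(normalize_tempo(10), 40.0)

    def test_too_fast_is_lowered_to_maximum(self):
        self.assertEqual(normalize_tempo(500), 208.0)


def run_rehearsal() -> bool:
    """Run the test suite and report whether every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestNormalizeTempo)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    return result.wasSuccessful()


# A CI job would typically block the merge if this comes back False.
passed = run_rehearsal()
print("rehearsal passed:", passed)
```

If someone later changes normalize_tempo and breaks the clamping, the rehearsal fails before the change ever reaches the stage.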
Opening Night
Now that we have done the rehearsals and made sure the performance will be flawless, we get ready for the opening night.
Orchestra – There is a final check, every instrument is checked and tuned once, and then the curtain rises and the performance begins.
Data Science – Here we call it ‘Continuous Delivery’. The team introduces feature flags to toggle features on and off, like having a switch to control them. Code inspection tools like SonarQube act as the final dress rehearsal, checking that the code is ready. Release orchestration tools like Spinnaker are used to plan and coordinate the deployment of application releases.
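The feature flag switch mentioned above can be sketched in a few lines. The FeatureFlags class, the new_model flag, and the recommend function are all made up for illustration. Real teams often use a dedicated flag service instead, but the idea is the same: the code for the new feature is already deployed, and a switch decides whether it runs.

```python
class FeatureFlags:
    """Hypothetical in-memory flag store; real systems often use a flag service."""

    def __init__(self, flags=None):
        self._flags = dict(flags or {})

    def is_enabled(self, name: str) -> bool:
        # Unknown flags default to off, so new code paths stay dark until enabled.
        return self._flags.get(name, False)

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled


def recommend(user_id: str, flags: FeatureFlags) -> str:
    """Pick the code path based on the flag, without redeploying anything."""
    if flags.is_enabled("new_model"):
        return f"new-model recommendation for {user_id}"
    return f"baseline recommendation for {user_id}"


flags = FeatureFlags({"new_model": False})
before = recommend("u1", flags)   # flag is off: baseline path
flags.set("new_model", True)
after = recommend("u1", flags)    # flag is on: new path
print(before)
print(after)
```

If the new model misbehaves on opening night, flipping the flag back is much quicker than rolling back a release.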
Critic’s Review
The show is now live, so it is time to monitor, manage, and address any issues.
Orchestra – It’s like a team in a control room watching the live performance and addressing any issues as they arise.
Data Science – Site Reliability Engineering (SRE) outlines the best practices for running large-scale, reliable services. Monitoring tools like Prometheus collect metrics over time and evaluate alerting rules, so the team is alerted to any unexpected issues. Logging tools keep a complete record of what happened. Visualization tools like Grafana provide real-time dashboards of performance metrics, and debugging tools help to identify and fix errors.
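Here is a minimal sketch of the control-room idea: keep a short rolling window of latency measurements and raise an alert when the average crosses a threshold. The LatencyMonitor class, the window size, and the 200 ms threshold are assumptions chosen for illustration. This is not how Prometheus is implemented, only the kind of rule such a tool evaluates.

```python
from collections import deque
from statistics import mean


class LatencyMonitor:
    """Hypothetical monitor that alerts when the rolling average latency is too high."""

    def __init__(self, window: int = 5, threshold_ms: float = 200.0):
        self.samples = deque(maxlen=window)   # old samples drop off automatically
        self.threshold_ms = threshold_ms

    def record(self, latency_ms: float) -> bool:
        """Store one measurement and return True if an alert should fire."""
        self.samples.append(latency_ms)
        return mean(self.samples) > self.threshold_ms


monitor = LatencyMonitor(window=3, threshold_ms=200.0)
# One slow request (400 ms) pushes the rolling average over the threshold
# until it slides out of the window.
alerts = [monitor.record(ms) for ms in [100, 150, 400, 100, 80]]
print(alerts)
```

Using a rolling average rather than single measurements is a design choice: one slow request raises the average for a few samples instead of paging the team on every blip.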
Security
Most importantly, we cannot forget security. It ensures the whole orchestra performs safely, with a security team always on the lookout to protect against any potential threats.
Conclusion
So, this is the bigger picture of MLOps. Let’s go back to the cover picture once more. Every person, instrument, backstage assistant, director, and so on has to work together seamlessly for the orchestra to come together at the end as a lovely experience.