A Fortune 500 global hospitality and entertainment company automates 90% of deployments and delivery on its machine learning platform.

SITUATION

Our client was struggling with its legacy machine learning operations (MLOps) platform, which provides DevOps services for ML models: building, testing, training, deploying and serving them through CI/CD pipeline automation. The platform was intended to be cloud agnostic, meaning the architecture should run on any cloud. It uses Apache Airflow to schedule workflows and Python APIs to manage a cluster dynamically. The client had challenges with its existing vendor in platform delivery, management and support. The platform violated several enterprise security best practices and, because its Apache Airflow instances on Azure VMs were improperly configured, did not use Azure AD for authentication and authorization. The desired cost optimization had not been achieved.

Achieved 90% automation in deployments and delivery

SOLUTION

Our team completed the transition within three months and published a knowledge transition document. The team also delivered a detailed technical debt tracking document for the current platform, created optimized Terraform templates, externalized the majority of configuration, managed Terraform state configuration per environment and removed duplicate code. Our team developed CI/CD pipelines with GitHub Actions, integrated Terraform with Ansible so that configuration management runs directly from Terraform, and removed various manual steps from the automation. Sensitive information that had been stored and distributed in Ansible configuration files was moved to Azure Key Vault. The team also established developer onboarding documentation and processes, corrected various security and monitoring defects associated with high-risk design patterns, and upgraded the existing ML platform to an improved v2 platform.
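As an illustration of what a Terraform to Ansible hand-off can look like, the sketch below converts the JSON produced by `terraform output -json` into an Ansible dynamic inventory. It is a minimal example under stated assumptions, not the integration built for the client: the function name, the output names `airflow_vm_ips`, `environment` and `db_password`, and the group name `airflow` are all hypothetical. Outputs that Terraform marks as sensitive are deliberately left out of the inventory, in keeping with the move of secrets to a vault.

```python
import json


def terraform_outputs_to_inventory(tf_outputs, group="airflow"):
    """Build an Ansible dynamic-inventory dict from parsed `terraform output -json`.

    Assumption: the Terraform configuration exposes a list output named
    "airflow_vm_ips" (hypothetical name) holding the host addresses.
    """
    hosts = list(tf_outputs.get("airflow_vm_ips", {}).get("value", []))

    # Pass non-sensitive scalar outputs to Ansible as group variables.
    # Sensitive outputs are skipped so secrets never land in the inventory.
    group_vars = {
        name: out["value"]
        for name, out in tf_outputs.items()
        if not out.get("sensitive", False)
        and isinstance(out["value"], (str, int, float, bool))
    }

    return {
        group: {"hosts": hosts, "vars": group_vars},
        # "_meta.hostvars" lets Ansible skip per-host lookups.
        "_meta": {"hostvars": {host: {} for host in hosts}},
    }


# Sample data in the shape emitted by `terraform output -json` (values invented).
sample_outputs = {
    "airflow_vm_ips": {"sensitive": False, "value": ["10.0.0.4", "10.0.0.5"]},
    "environment": {"sensitive": False, "value": "dev"},
    "db_password": {"sensitive": True, "value": "not-a-real-secret"},
}

inventory = terraform_outputs_to_inventory(sample_outputs)
print(json.dumps(inventory, indent=2))
```

In a pipeline, a script like this could read the Terraform outputs for one environment and be passed to Ansible as an inventory source, which removes the manual step of copying host addresses between the two tools.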

RESULTS

Our solution improved operational usage and automates 90% of deployments and delivery. The platform is certified by enterprise architecture and provides monitoring and governance. It also offers cost-optimized cluster configurations. Our developer onboarding documentation reduced the back-and-forth communication needed during onboarding. Ultimately, our team delivered a new Azure-native cloud machine learning platform that serves robust machine learning models across the enterprise.