Imagine you're interested in opening a sandwich cafe. You've spent months perfecting the bread, cheese, and combination of ingredients, and you're confident you've created the perfect sandwich. However, you soon realize that there are other crucial tasks that need to be addressed in order to successfully run a cafe, such as hiring staff, setting up tables, and maintaining hygiene.
This scenario is comparable to how many people approach machine learning (ML). They focus on perfecting their models by improving accuracy and reducing loss, but overlook the importance of delivering those models to production.
Chip Huyen, a former Stanford professor, succinctly captured this sentiment by stating,
"Many people when they hear ML system think of just ML algorithms."
According to a VentureBeat report, 87% of data science projects never make it into production, and Gartner has reached a similar conclusion, finding that about half of AI models never make it to production. Surprisingly, most of the obstacles are not related to the ML models themselves.
Let's break the lifecycle of an ML solution into four stages (pre-development, model development, model deployment, and post-deployment) and examine some of the concerns associated with each.
Pre-Development
One of the most critical concerns is ensuring that the ML project is aligned with business value, and data scientists and ML engineers must prioritize this before anything else. Projects that fail to add sufficient business value are unlikely to succeed. Profitability is at the core of any business: if the solution does not drive top-line or bottom-line growth, increase customer satisfaction, or move another crucial business metric, the project may be abandoned. During this stage, the data team should collaborate closely with the business team to understand which problems are worth solving and which solutions will have the greatest impact. By combining technical expertise with business acumen, the two teams can work together toward success.
Another significant challenge is identifying and cleaning data. Data scientists and ML engineers must collaborate (and be encouraged to collaborate) with the relevant departments to acquire the necessary data, if available.
(Inspired by Andrej Karpathy)
Model Development
A lack of collaboration and communication between business teams and data scientists can create significant challenges in ML projects. Data scientists may focus on optimizations with little business impact, such as spending weeks to raise model accuracy by a small percentage. Collaborating with business and domain experts throughout development is crucial to ensure that the models meet expectations, and that changes can be made if they do not. Business input can be extremely valuable in model development, particularly in feature engineering. In addition, some businesses may value model interpretability as much as model accuracy, which should be taken into account during development. Silos between business teams and data scientists are therefore a serious risk for ML projects.
Model Deployment
In the development phase, we often overlook the deployment ecosystem and neglect critical questions: how to handle data transfer with limited internet connectivity or in remote locations, how to manage latency, and how to deploy the model under hardware limitations. Hardware limits may call for compression techniques such as pruning or quantization, latency targets may require optimizations that reduce inference time, and where there is no internet connectivity, the model might have to be pushed to the edge.
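To make the quantization idea concrete, here is a minimal sketch of symmetric, per-tensor, post-training int8 quantization in plain Python. The function names and sample weights are hypothetical, and real toolkits typically quantize per layer or per channel and also calibrate activations. Storing each weight as an 8-bit integer instead of a 32-bit float cuts weight storage roughly fourfold, at the cost of a small rounding error.

```python
# Minimal sketch of symmetric, per-tensor, post-training int8 quantization.
# Assumption: weights are a flat list of floats; quantize_int8 and dequantize
# are hypothetical helpers, not the API of any particular library.

def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] plus one float scale."""
    max_abs = max(abs(w) for w in weights)
    # Avoid dividing by zero when every weight is 0.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest step and clamp to the int8 range.
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from the quantized integers."""
    return [v * scale for v in q]

# Usage with made-up weights.
weights = [0.42, -1.27, 0.003, 0.9, -0.5]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

The trade-off is precision for size: each restored weight lies within about half a quantization step (scale / 2) of its original value, which is why a quantized model's accuracy should be re-checked before it is deployed.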
Post-Deployment
Deploying a model is not the end of the job. Just like software, an ML system's performance deteriorates over time. Post-deployment processes such as data distribution checks, monitoring of inputs and outputs, updating models with current data, evaluating multiple models, and testing in production must therefore be put in place.
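As one concrete example of a data distribution check, the sketch below compares a feature's training sample with recent production inputs using the two-sample Kolmogorov-Smirnov (KS) statistic, the largest gap between the two empirical cumulative distribution functions. This is only one common choice among several (the Population Stability Index is another); the function names, the sample data, and the 0.2 alert threshold are hypothetical and would need tuning per feature.

```python
# Minimal sketch of a data drift check using the two-sample KS statistic.
# Assumption: the 0.2 alert threshold is illustrative, not a standard value.
from bisect import bisect_right

def ks_statistic(reference, production):
    """Largest gap between the empirical CDFs of two samples."""
    ref = sorted(reference)
    prod = sorted(production)
    n, m = len(ref), len(prod)
    d = 0.0
    # The maximum gap is always attained at one of the observed values.
    for x in ref + prod:
        cdf_ref = bisect_right(ref, x) / n
        cdf_prod = bisect_right(prod, x) / m
        d = max(d, abs(cdf_ref - cdf_prod))
    return d

def drift_alert(reference, production, threshold=0.2):
    """Flag a feature whose production distribution has moved away from training."""
    return ks_statistic(reference, production) > threshold

# Usage with made-up data: production inputs shifted upward by 50.
training_sample = [float(i) for i in range(100)]
recent_sample = [x + 50 for x in training_sample]
print(ks_statistic(training_sample, recent_sample))
print(drift_alert(training_sample, recent_sample))
```

In practice, a check like this would run on a schedule for each monitored feature (and for model outputs), with alerts feeding into the retraining and evaluation processes listed above.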
As this article has highlighted, ML solutions are not limited to models or ensembles of models; they are a comprehensive ecosystem that works together with businesses and other stakeholders. By taking this holistic perspective, developers can make informed decisions and improve their chances of moving successfully from pilot to production.
And this is why it takes a village to build ML solutions.
https://twitter.com/lishali88/status/994723759981453312/photo/1
https://www.datanami.com/2022/08/22/half-of-ai-models-never-make-it-to-production-gartner/
https://venturebeat.com/ai/why-do-87-of-data-science-projects-never-make-it-into-production/