How often have you heard, “The Machine Learning Application worked well in the lab, but it failed in the field”? It is not the fault of the Machine Learning Model!
This blog is not yet another blog article (YABA) on DataOps, DevOps, MLOps, or CloudOps.
I do not mean to imply xOps is not essential.
For example, MLOps is both strategic and tactical. It promises to transform the “ad-hoc” delivery of Machine Learning applications into software engineering best practices.
We know the symptoms: Most machine-learning models trained in the lab perform poorly on real-world data [1, 2, 3, 4].
…
Seventeen equivalent cloud services from the three cloud vendors with the largest market share (Azure, AWS, and GCP) are mapped, described, and compared. The exception is Machine Learning services, where Google has many more complete Machine Learning SaaS (Software as a Service) offerings [1].
Cloud vendors continually add services. Diagrams is a work in progress; not all services have been added yet (as of 10/21/2020).
I do not discuss the category of security — a significant category for the cloud that is still evolving and needs a blog for itself.
I discuss Identity Management, which is secured by the authorized account.
Multiple…
Estimates state that 70%–85% of the world’s data is text (unstructured data) [1]. New deep learning language models (transformers) have caused explosive growth in industry applications [5, 6, 11].
This blog is not an article introducing you to Natural Language Processing. Instead, it assumes you are familiar with noise reduction and normalization of text. It covers text preprocessing up to producing tokens and lemmas from the text.
We stop short of feeding the sequence of tokens into a Natural Language model; using that sequence to accomplish a specific model task is not covered here.
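To make the scope concrete, here is a minimal sketch of noise reduction, normalization, and tokenization using only the Python standard library. The regular expressions and the function names (`reduce_noise`, `normalize`, `tokenize`, `preprocess`) are assumptions made for illustration, not the article's own code. Lemmatization is left out because it needs a language resource, which libraries such as spaCy or NLTK provide.

```python
import re
import unicodedata
from typing import List

# Hypothetical patterns for this sketch; real pipelines usually need more rules.
TAG_RE = re.compile(r"<[^>]+>")                   # HTML-like tags
URL_RE = re.compile(r"https?://\S+|www\.\S+")     # bare URLs
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")   # words, with simple contractions


def reduce_noise(text: str) -> str:
    """Remove HTML-like tags and URLs, then collapse runs of whitespace."""
    text = TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization and lowercase the text."""
    return unicodedata.normalize("NFKC", text).lower()


def tokenize(text: str) -> List[str]:
    """Split normalized text into word tokens, dropping punctuation."""
    return TOKEN_RE.findall(text)


def preprocess(text: str) -> List[str]:
    """Run noise reduction, normalization, and tokenization in order."""
    return tokenize(normalize(reduce_noise(text)))


if __name__ == "__main__":
    print(preprocess("<p>Visit https://example.com NOW!</p>  It's  great."))
```

The resulting list of tokens is what would be handed to a lemmatizer and, after that, to a language model.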
…
It is code review time. Some of you would rather avoid the code review process. Whether you are new to programming or an experienced programmer, the code review is a shared learning experience for all involved. Rather than talk about “code review process best practices,” I share with you coding techniques I use to change code review from WTFs (What’s That For?) into WOWs (Wonderful! Oh! Wow!).
The anticipation of a code review process causes us to raise our game because we open up our code for other programmers to see (criticize). It may look, feel, and bark like criticism. And…
This article shows how to render high-quality architecture diagrams of Azure, AWS, and GCP using the Python package Diagrams. Diagrams depends on the Graphviz runtime. The article shows step-by-step how to create a Docker image with Diagrams and Graphviz. All code is included and can be downloaded.
I have posted several articles on how to create development and test Docker images [see references 4, 5, and 6 below]. I assume you know of Docker and have read them.
Docker is used for encapsulating an individual image of your application.
Docker-Compose is used to manage several images at the same time for…
Estimates vary, but machine learning engineers spend between 5% and 15% of their working time on the machine learning engine (MLLabOps). The other 85–95% is spent on two things: getting, munging, and pre-processing data for input into the machine, which is the domain of DataOps, and creating and maintaining a stable version of the entire Machine Learning Application (MLA) in production, which is the domain of MLProdOps.
Usually, DevOps labor time is not included in the accounting. The development, rollout, and maintenance of MLA code probably increases non-MLLabOps work to more than 90% of the labor time. …
“DevOps is a set of practices that combines software development (Dev) and IT operations (Ops). It aims to shorten the systems development life cycle and provide continuous delivery with high software quality. DevOps is complementary with Agile software development; several DevOps aspects came from the Agile methodology.” — Wikipedia
The definition above misses the mark for me.
Use Streamlit as your Web application base when security is not needed. If you need security from your Web application, use Flask, FastAPI, or Django packages.
I am refactoring Flask-based applications into Streamlit-based applications. For Machine Learning applications, Streamlit has proven cheaper to maintain, lowering our technical debt by approximately 80%.
As we continue refactoring Flask-based applications into Streamlit-based applications, technical debt may be reduced by 95%.
Admittedly, by replacing working Flask applications, we violate the “not broke, do not fix” catechism.
I consider the use of Streamlit to “replace what is not broke” to be both…
I … and perform benchmarking. Our journey begins by installing GoLang and setting up a GoLang Development Environment. I show the benchmarking of the GoLang kmeans implementation and the Python sklearn kmeans.
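As a rough illustration of what such a benchmark measures, here is a minimal timing harness in Python. It is a sketch under stated assumptions: the `kmeans` function below is a plain, pure-Python Lloyd's algorithm written only as a stand-in workload. It is not sklearn's implementation and not the GoLang one, and the names `benchmark` and `make_points` are hypothetical.

```python
import random
import time
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def kmeans(points: Sequence[Point], k: int, iterations: int = 10, seed: int = 0) -> List[Point]:
    """Stand-in workload: plain Lloyd's algorithm on 2-D points; returns the centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)
    for _ in range(iterations):
        clusters: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centroid (squared Euclidean distance).
            nearest = min(
                range(k),
                key=lambda i: (p[0] - centroids[i][0]) ** 2 + (p[1] - centroids[i][1]) ** 2,
            )
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c)) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids


def benchmark(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the best wall-clock time in seconds over `repeats` runs of `fn`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def make_points(n: int, seed: int = 42) -> List[Point]:
    """Generate `n` reproducible random points in the unit square."""
    rng = random.Random(seed)
    return [(rng.random(), rng.random()) for _ in range(n)]


if __name__ == "__main__":
    data = make_points(2000)
    seconds = benchmark(lambda: kmeans(data, k=3))
    print(f"pure-Python k-means, 2000 points, k=3: {seconds:.4f} s")
```

Taking the best of several runs reduces noise from other processes on the machine. For a fair comparison, the same data and the same number of clusters and iterations would be passed to each implementation.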
I have a problem, which I think I share with a good part of the Machine Learning (ML) community.
I need a way to speed up my Python Machine Learning solutions to put them in production.
Python is too slow for production Machine Learning applications. I need to switch away from Python.
What I decided to do was: Learn GoLang.
It is almost as fast as C. It is…
We accomplish most of our Python software development in a local machine’s environment: the software developer’s sandbox. I discuss tools that dev and test adopted for the readability, testing, profiling, logging, quality, security, and version control of code before it’s pushed out to be shared on dev, test, and stage servers.
We are continually evaluating software tools for every stage in our Python development lifecycle.
I try to stay away from using the term DevOps and the latest marketing terms DataOps, GitOps, CloudOps, and MLOps.
We used Python predominately (90%) over the last seven years because: