Physicist, Machine Learning Scientist and constantly improving Software Engineer. I extrapolate the future from emerging technologies.

How often have you heard, “The Machine Learning application worked well in the lab, but it failed in the field”? It is not the fault of the Machine Learning model!


Warning!

This is not yet another blog article (YABA) on DataOps, DevOps, MLOps, or CloudOps.

I do not mean to imply xOps is not essential.

For example, MLOps is both strategic and tactical. It promises to transform the “ad-hoc” delivery of Machine Learning applications into software engineering best practices.

What are the Symptoms of the Problems of Deploying Machine Learning Applications?

We know the symptoms: Most machine-learning models trained in the lab perform poorly on real-world data [1, 2, 3, 4].

What is the critical Problem with Machine Learning Success?

Machine Learning created profits in the year 2020 and will continue to increase profits in the future. …


We describe and compare equivalent mappings of seventeen cloud services across the three cloud vendors with the largest market share: Azure, AWS, and GCP. The exception is Machine Learning services, where Google has many more complete Machine Learning SaaS (Software as a Service) offerings [1].


Missing Services

Cloud vendors continually add services. Diagrams is a work in progress; not all services had been added as of 10/21/2020.

1. Cloud Identity and Access Management (IAM)

I do not discuss the category of security — a significant category for the cloud that is still evolving and needs a blog for itself.

I discuss Identity Management, which is secured by the authorized account.

Multiple accounts have been around since one of the first multi-process operating systems (MULTICS in the 1960s). …


We show Python code and benchmarks for 27 different NLP text pre-processing actions.


Outline

Estimates state that 70%–85% of the world’s data is text (unstructured data) [1]. New deep learning language models (transformers) have caused explosive growth in industry applications [5, 6, 11].

This blog is not an article introducing you to Natural Language Processing. Instead, it assumes you are familiar with noise reduction and normalization of text. It covers text preprocessing up to producing tokens and lemmas from the text.

We stop at feeding the sequence of tokens into a Natural Language model. Using that sequence of tokens to accomplish a specific model task is not covered here.

What this blog does cover is fast text pre-processing (noise cleaning and normalization), which is critical in production-grade Natural Language Processing (NLP). …
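As a rough illustration of what noise cleaning, normalization, and tokenization can look like, here is a minimal sketch that uses only the Python standard library. The function names (`clean_text`, `normalize_text`, `tokenize`, `preprocess`) and the regular expressions are assumptions made for this example, not the code benchmarked in the article, and lemmatization is left out because it needs an NLP library.

```python
import re
import unicodedata
from typing import List

# Hypothetical patterns chosen for illustration only.
HTML_TAG_RE = re.compile(r"<[^>]+>")
URL_RE = re.compile(r"https?://\S+|www\.\S+")
WHITESPACE_RE = re.compile(r"\s+")
TOKEN_RE = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def clean_text(text: str) -> str:
    """Noise cleaning: drop HTML tags and URLs, then collapse whitespace."""
    text = HTML_TAG_RE.sub(" ", text)
    text = URL_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def normalize_text(text: str) -> str:
    """Normalization: Unicode NFKC folding followed by lowercasing."""
    return unicodedata.normalize("NFKC", text).lower()


def tokenize(text: str) -> List[str]:
    """Split normalized text into simple word tokens (keeps contractions)."""
    return TOKEN_RE.findall(text)


def preprocess(text: str) -> List[str]:
    """Run cleaning, normalization, and tokenization in order."""
    return tokenize(normalize_text(clean_text(text)))


if __name__ == "__main__":
    print(preprocess("<p>Visit https://example.com NOW, it's   great!</p>"))
```

Pre-compiling the regular expressions once at module level is a small, common speed-up when the same patterns run over many documents.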


It is code review time. Some of you would rather avoid the code review process. Whether you are new to programming or an experienced programmer, the code review is a shared learning experience for all involved. Rather than talk about “code review process best practices,” I share with you coding techniques I use to change code review from WTFs (What’s That For?) into WOWs (Wonderful! Oh! Wow!).


My Approach to the Code Review Process

The anticipation of a code review process causes us to raise our game because we open up our code for other programmers to see (criticize). It may look, feel, and bark like criticism. And just maybe it is. But like a bar fight, it is a chance for you to grow and bond with your teammates. …


Visualize your architecture


This article shows how to render high-quality architecture diagrams of Azure, AWS, and GCP using the Python package Diagrams. Diagrams depends on the Graphviz runtime, so the article walks step-by-step through creating a Docker image that contains both Diagrams and Graphviz. All code is included and can be downloaded.
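Because Diagrams needs the Graphviz runtime, it can help to confirm both are present before rendering, whether inside a Docker image or on a local machine. The following is a minimal sketch using only the standard library; the function name `check_diagrams_prerequisites` is hypothetical, and it assumes the Graphviz executable is named `dot` and is expected on the `PATH`.

```python
import importlib.util
import shutil


def check_diagrams_prerequisites() -> dict:
    """Report whether the Diagrams package and the Graphviz `dot` binary are available."""
    return {
        # True when `import diagrams` would succeed in this environment.
        "diagrams_importable": importlib.util.find_spec("diagrams") is not None,
        # True when the Graphviz `dot` executable can be found on PATH.
        "graphviz_dot_on_path": shutil.which("dot") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_diagrams_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Running this as a container start-up check is one way to catch an incomplete image early.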

Docker Solution for Graphviz, Diagram, and Cluster

I have posted several articles on how to create development and test Docker images [see references 4, 5, and 6 below]. I assume you know of Docker and have read them.

Docker is used for encapsulating an individual image of your application.

Docker-Compose is used to manage several images at the same time for the same application. This tool offers the same features as Docker but allows you to have more complex applications. …


I share the Colab (and Jupyter) notebook Python code utilities used by our team.


Outline

If you don’t have a Google account, create one.

If you do not have a Colab account, create a Colab account by logging in with your Google account.

Use the same Google account for your Google Drive.

These are some of the Colab (and Jupyter) notebook Python code snippets used by our team.

  1. Reload any .py file that changed;
  2. Install most Python packages;
  3. Show matplotlib-based graphs inline in Colab;
  4. Show the number of installed packages and list all installed packages;
  5. Show base computing image properties;
  6. Show all computing devices; …
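Items 4 and 5 in the list above can be approximated with the standard library alone. This is a minimal sketch, not the team's actual utilities; the function names `installed_packages` and `base_image_properties` are assumptions made for illustration.

```python
import os
import platform
import sys
from importlib import metadata


def installed_packages() -> list:
    """Return sorted 'name==version' strings for every installed distribution."""
    found = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:  # skip distributions with missing or broken metadata
            found.add(f"{name}=={dist.version}")
    return sorted(found, key=str.lower)


def base_image_properties() -> dict:
    """Collect basic facts about the interpreter and the machine it runs on."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    packages = installed_packages()
    print(f"{len(packages)} packages installed")
    for key, value in base_image_properties().items():
        print(f"{key}: {value}")
```

The same code runs unchanged in a Colab cell, a Jupyter cell, or a plain script.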


As we continue to develop Machine Learning Operations (MLOps), we need to think of the machine learning (ML) development and deployment flow as something other than a pipeline.


What is the definition of a pipeline?

The concept of a computing pipeline was around before the mainstream adoption of Machine Learning.

“Software pipelines, which consist of a sequence of computing processes (commands, program runs, tasks, threads, procedures, etc.), conceptually executed in parallel, with the output stream of one process automatically fed as the input stream of the next one. The Unix system call pipe is a classic example of this concept.” (Source: https://en.wikipedia.org/wiki/Pipeline_(computing))
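The definition above can be sketched with Python generators, where each stage consumes the output stream of the previous one, much like a shell command chain joined by pipes. The stage names (`read_lines`, `keep_matching`, `to_upper`, `run_pipeline`) are hypothetical, and generators interleave the stages lazily rather than running them truly in parallel.

```python
from typing import Iterable, Iterator, List


def read_lines(lines: Iterable[str]) -> Iterator[str]:
    """Stage 1: emit each input line without its trailing newline."""
    for line in lines:
        yield line.rstrip("\n")


def keep_matching(lines: Iterable[str], needle: str) -> Iterator[str]:
    """Stage 2: pass through only the lines that contain `needle`."""
    for line in lines:
        if needle in line:
            yield line


def to_upper(lines: Iterable[str]) -> Iterator[str]:
    """Stage 3: upper-case every line it receives."""
    for line in lines:
        yield line.upper()


def run_pipeline(source: Iterable[str], needle: str) -> List[str]:
    """Chain the stages so each one feeds the next, then collect the result."""
    return list(to_upper(keep_matching(read_lines(source), needle)))


if __name__ == "__main__":
    log = ["error: disk full\n", "ok\n", "error: timeout\n"]
    print(run_pipeline(log, "error"))
```

Each line flows through all three stages before the next line is read, so the full input never has to sit in memory at once.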

Where did you first learn about pipelines in the context of Machine Learning? …


We show Python code and benchmarks for ten different spaCy text preprocessing actions.


Introduction

Estimates state that 70%–85% of the world’s data is text (unstructured data). Additionally, new deep learning language models (transformers) have caused explosive growth in industrial applications.

This blog is not an introduction to Natural Language Processing (NLP). The feeding of a sequence of tokens, created from the raw text, into different Natural Language models is not covered here. Instead, we focus on preprocessing text before it is input as tokens into a Natural Language model.

Raw text degrades NLP modeling unless a noise removal operation deletes or transforms words in the text before it becomes the sequence of tokens. Noise removal is usually NLP-model dependent. …


These are tools, packages, and libraries that my colleagues and I use to increase Machine Learning pipeline development and production deployment productivity. What follows is a snapshot of our favorites as of December 24, 2020.


Python

We used Python predominantly (95%) over the last seven years because:

  1. Almost all new Machine Learning models, cloud services, GPUs, and many other technologies are available through a Python API;
  2. The assortment and number of free code and packages is the largest we have seen;
  3. Native Python is slower than C by 20+ times, but almost all Python packages are near C speed as they are thin APIs over CPython or use some other speedup technique.

We used C to speed up Python when Numba could not be used. We tried Go, but it did not work out.

  4. The Python GIL (lack of concurrency on multicore machines) is bypassed more and more each day by the cloud, Spark, package implementations (e.g., XGBoost), and strong typing with the introduction of type hinting starting in Python 3.5. …
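Point 3 above, that interpreted Python loops are slow while C-backed calls are fast, can be checked informally with `timeit`. This is a minimal sketch with hypothetical function names (`python_loop_sum`, `compare`); it compares a hand-written loop against the built-in `sum`, which is implemented in C. The measured ratio depends on the machine and Python version, so it should not be read as a reproduction of the 20+ times figure.

```python
import timeit


def python_loop_sum(values):
    """Sum values with an explicit Python-level loop."""
    total = 0
    for v in values:
        total += v
    return total


def compare(n=100_000, repeat=5):
    """Return (loop_seconds, builtin_seconds) for summing n integers 10 times."""
    data = list(range(n))
    # Both approaches must agree before timing them is meaningful.
    assert python_loop_sum(data) == sum(data)
    loop_time = min(timeit.repeat(lambda: python_loop_sum(data), number=10, repeat=repeat))
    builtin_time = min(timeit.repeat(lambda: sum(data), number=10, repeat=repeat))
    return loop_time, builtin_time


if __name__ == "__main__":
    loop_time, builtin_time = compare()
    print(f"Python loop: {loop_time:.4f}s, built-in sum: {builtin_time:.4f}s")
```

Taking the minimum over several repeats reduces the influence of other processes on the measurement.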


One of those pathways can cause the next pandemic.


The Coronavirus Has Been Infecting Humans for 1,000s of Years.

Coronaviruses have existed alongside humans for millennia, infecting and passing between them. Coronaviruses have also frequently crossed species barriers, and some have emerged as important human pathogens. Those virus variants died off, becoming extinct because they killed the host.

Ancient deadly Coronavirus died off because the virus infection killed the host, and only immune hosts survived. We can assume that COVID-19 is a relatively new mutation of Coronavirus.

Most modern human coronaviruses originated in bats, where they are non-pathogenic (to humans or bats), resulting in symptoms no more severe than a cold.

So how did the deadly COVID-19 virus mutate from the Coronavirus?
