
Physicist, Machine Learning Scientist and constantly improving Software Engineer. I extrapolate the future from emerging technologies.

I offer a checklist for your Machine Learning Operation (MLOps) endeavor.

As part of the AI 1.0 surge (1983–1987), I felt that AI, to be of practical use, had to be distributed. Since then, I have been building distributed operating systems.

I admit I stopped when I encountered Kubernetes in 2016 because, from a Software Architect and Engineer viewpoint, there was little I could add, and what I could add would be minor.

Yes, I think Kubernetes is a great distributed operating system architecture for anything from a cloud of virtual machines (VMs) to a hive of Raspberry Pis.

I returned to Machine Learning about nine years ago during the AI 2.0 …


MLOps = DevOps + DataOps + MLLabOps + MLProdOps + DSOps

I was taught to transform a complex problem into a simpler problem, by dividing the problem into smaller sub-problems.

I made the process paradigm of Machine Learning Operations (MLOps) simpler by dividing it into five distinct but overlapping process groups, or Operations (Ops) groups.

What Does MLOps Mean to Me?

MLOps is automating the Machine Learning product life cycle.

Why Does MLOps Exist?

In 1990, more than 80% of IT projects were never rolled out. The failure rate dropped because of standardized developer tools, repeatable process iteration, the death of the Waterfall method, the rise of the Agile method, and unit testing, to list some of the code development advances.

In…


Let’s expand the definition of refactoring to include new functionalities

coffee cup beside a keyboard and computer screen

Different Kind of Refactoring

I plan to increase stability and performance and to decrease the cost of maintenance in the Photonai code base. I add clustering functionality to the original code base and change the architecture.

A small code refactoring task is usually fixing bugs. Some consider that it is not refactoring if the bug-fixing occurs before releasing to test.

A significant code refactoring project example is making a program Y2K-compliant without changing its functionality. Y2K compliance means enabling code to operate correctly with dates at or beyond January 1, 2000. (Yes, this was a thing!)
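To make the Y2K point concrete, here is a minimal sketch of the kind of bug such a refactoring removes. It assumes years were stored as two digits. The function names and the pivot value of 70 are hypothetical choices for illustration, not taken from any particular code base.

```python
from datetime import date


def naive_year(two_digit_year: int) -> int:
    """Pre-Y2K assumption: every two-digit year belongs to the 1900s."""
    return 1900 + two_digit_year


def windowed_year(two_digit_year: int, pivot: int = 70) -> int:
    """Windowing fix: values below the (assumed) pivot map to the 2000s."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a two-digit year between 0 and 99")
    century = 2000 if two_digit_year < pivot else 1900
    return century + two_digit_year


def days_between(start_year: int, end_year: int) -> int:
    """Days from January 1 of start_year to January 1 of end_year."""
    return (date(end_year, 1, 1) - date(start_year, 1, 1)).days


# Interval from January 1 of year "99" to January 1 of year "00".
print(days_between(naive_year(99), naive_year(0)))        # negative: "00" read as 1900
print(days_between(windowed_year(99), windowed_year(0)))  # 365
```

The external behavior for dates before 2000 is unchanged; only the internal interpretation of the stored year is corrected, which is why this counts as refactoring rather than new functionality.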

“Refactoring is a disciplined technique for…


Let’s discuss why type hinting techniques and tools improve your Python code

old-fashioned type in a case

Overview of Type Hints or Type Annotations

Python is a dynamically typed language. However, type hints were introduced starting with Python 3.5 (PEP 484). Type hints (note: not strong type checking) make it possible to run static type checks on Python code after it has been written.
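As a small illustration of that distinction, the sketch below annotates a function and then calls it with an argument that contradicts the annotation. The function `mean` is a hypothetical example, not code from the article.

```python
from typing import List


def mean(values: List[float]) -> float:
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)


# The hints are stored as metadata on the function object.
print(mean.__annotations__)

# A call that matches the annotation.
print(mean([1.0, 2.0, 3.0]))  # 2.0

# A tuple of ints does not match List[float], yet the call still runs:
# the interpreter does not enforce hints at runtime.
print(mean((1, 2, 3)))  # 2.0
```

A static checker such as mypy, run over the file after it is written, would report the last call as a type error even though the program executes without complaint.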

Here’s a great figure showing the evolution of Python type hinting:


Visualize your architecture

clouds seen from above

This article shows how to render high-quality architecture diagrams of Azure, AWS, and GCP using the Python package Diagrams. Diagrams depends on the Graphviz runtime. The article shows step-by-step how to create a Docker image with Diagrams and Graphviz. All code is included and can be downloaded.

Docker Solution for Graphviz, Diagram, and Cluster

I have posted several articles on how to create development and test Docker images [see references 4, 5, and 6 below]. I assume you know of Docker and have read them.

Docker is used for encapsulating an individual image of your application.

Docker-Compose is used to manage several images at the same time for…


I introduce KILT, a benchmark framework for natural language models. I also show how to retrieve close to one million public text or PDF documents. Some of these documents are raw text, some are clean text, and some include categorical labeling.

List of Lists of Public NLP Datasets.

The following are non-exhaustive lists of lists of NLP datasets:

Raw text

  1. Awesome-Public-Datasets;
  2. Project Gutenberg: File Repository;
  3. Project Gutenberg: Top 100 EBooks as of 8/15/2020;
  4. Google Books API for Python;
  5. Google Books Ngram Viewer;
  6. Google datasets;
  7. textacy datasets;
  8. Kaggle datasets;
  9. fast.ai datasets;
  10. UCI Machine Learning Repository datasets;
  11. pyquora: A Python module to fetch and parse data from Quora;
  12. Zillow: Real Estate…


Google Colab can access any public Jupyter Notebook from GitHub or Drive

Why Use Google Colab?

You can use the Jupyter Notebook on your local computer. Google Colab improves on the Jupyter Notebook in many ways. Here are the seven most powerful reasons to use Google Colab:

  1. You can get any public Jupyter Notebook from a GitHub repository.
  2. You can load, edit, and save any .ipynb file to the Google Drive associated with the Colab login. It is helpful to have a separate Google account for each project and thus a different Google Drive.
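For the first point, Colab can open a public GitHub notebook when the notebook's GitHub path is appended to the `colab.research.google.com/github` address. The helper below is a minimal sketch of that URL rewrite. The function name and the example repository are hypothetical.

```python
from urllib.parse import urlparse

COLAB_PREFIX = "https://colab.research.google.com/github"


def colab_url(github_url: str) -> str:
    """Turn a github.com notebook URL into the matching Colab URL."""
    parsed = urlparse(github_url)
    if parsed.netloc != "github.com" or not parsed.path.endswith(".ipynb"):
        raise ValueError("expected a github.com URL that points to an .ipynb file")
    # parsed.path already starts with "/<user>/<repo>/blob/<branch>/..."
    return COLAB_PREFIX + parsed.path


# Hypothetical repository, used only to show the shape of the result.
print(colab_url(
    "https://github.com/example-user/example-repo/blob/main/notebooks/demo.ipynb"
))
```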

Note: You can create a Git account for any project folder on Google Drive. Each team member hosts on a variety of different local…


I discuss the sparsely documented bridging concept that causes Minikube to behave like Kubernetes.

I avoid going deeply into why and how to use Kubernetes or Minikube. Instead, I focus on how Minikube’s Profile enables training on your local system for Kubernetes on the cloud.

What is Kubernetes?

Kubernetes was conceived as a Cloud Distributed Operating System (CDOS) for microservices: a “pure” cloud of microservices in which complex applications are composed of small independent processes that communicate with each other through APIs (Application Programming Interfaces) over a network.

After all, a cloud is a myriad of heterogeneous hardware, each with its own operating system (bare-metal), that hosts your microservices, which you want to replicate…


Sometimes yes, sometimes no.

I perform benchmarking. Our journey begins by installing GoLang and setting up a GoLang Development Environment. I show the benchmarking of the GoLang kmeans implementation and the Python sklearn kmeans.

Introduction

I have a problem, which I think I share with a good part of the Machine Learning (ML) community.

I need a way to speed up my Python Machine Learning solutions to put them in production.

Python is too slow for production Machine Learning applications. I need to switch away from Python.

What I decided to do was: Learn GoLang.

It is almost as fast as C. It is…


Evaluating the DevOps tools I’ve used for rolling out machine learning applications

Estimates vary, but machine learning engineers spend between 5% and 15% of their working time on the machine learning engine. The other 85–95% is spent on getting, munging, and pre-processing data for input into the machine, which is the domain of DataOps, and on creating and maintaining a stable version of the entire Machine Learning Application (MLA) in production, which is the domain of MLProdOps.

Usually, DevOps labor time is not included in the accounting. The development, rollout, and maintenance of MLA code probably increase non-MLLabOps work to more than 90% of the labor time. …
