July 2018

Volume 33 Number 7

[Machine Learning]

Machine Learning with IoT Devices on the Edge

By James McCaffrey

Imagine that, in the not-too-distant future, you’re the designer of a smart traffic intersection. Your smart intersection has four video cameras connected to an Internet of Things (IoT) device with a small CPU, similar to a Raspberry Pi. The cameras send video frames to the IoT device, where they’re analyzed using a machine learning (ML) image-recognition model, and control instructions are then sent to the traffic signals. The small IoT device is also connected to Azure Cloud Services, where information is logged and analyzed offline.

This is an example of ML on an IoT device on the edge. I use the term edge device to mean anything connected to the cloud, where cloud refers to something like Microsoft Azure or a company’s remote servers. In this article, I’ll explain two ways you can design ML on the edge. Specifically, I’ll describe how to write custom IO code for a trained model on a device, and how to use the Microsoft Embedded Learning Library (ELL) set of tools to deploy an optimized ML model to a device on the edge. The custom IO approach is, as I write this article, the most common way to deploy an ML model to an IoT device. The ELL approach is forward-looking.

Even if you’re not working with ML on IoT devices, there are at least three reasons why you might want to read this article. First, the design principles involved generalize to other software development scenarios. Second, it’s quite possible that you’ll be working with ML and IoT devices relatively soon. Third, you may just find the techniques described here interesting in their own right.

Why does ML need to be on the IoT edge? Why not just do all processing in the cloud? IoT devices on the edge can be very inexpensive, but they often have limited memory, limited processing capability and a limited power supply. In many scenarios, trying to perform ML processing in the cloud has several drawbacks.

Latency is often a big problem. In the smart traffic intersection example, a delay of more than a fraction of a second could have disastrous consequences. Additional problems with trying to perform ML in the cloud include reliability (a dropped network connection is typically impossible to predict and difficult to deal with), network availability (for example, a ship at sea may have connectivity only when a satellite is overhead) and privacy/security (when, for example, you’re monitoring a patient in a hospital).

This article doesn’t assume you have any particular background or skill set but does assume you have some general software development experience. The demo programs described in this article (a Python program that uses the CNTK library to create an ML model, a C program that simulates IoT code and a Python program that uses an ELL model) are too long to present here, but they’re available in the accompanying file download.

What Is a Machine Learning Model?

In order to understand the issues with deploying an ML model to an IoT device on the edge, you must understand exactly what an ML model is. Very loosely speaking, an ML model is all the information needed to accept input data, make a prediction and generate output data. Rather than try to explain in the abstract, I’ll illustrate the ideas using a concrete example.

Take a look at the screenshot in Figure 1 and the diagram in Figure 2. The two figures show a neural network with four input nodes, five hidden layer processing nodes and three output layer nodes. The input values are (6.1, 3.1, 5.1, 1.1) and the output values are (0.0321, 0.6458, 0.3221). Figure 1 shows how the model was developed and trained. I used Visual Studio Code, but there are many alternatives.

Figure 1 Creating and Training a Neural Network Model

Figure 2 The Neural Network Input-Output Mechanism

This particular example involves predicting the species of an iris flower using input values that represent sepal (a leaf-like structure) length and width and petal length and width. There are three possible species of flower: setosa, versicolor, virginica. The output values can be interpreted as probabilities (note that they sum to 1.0) so, because the second value, 0.6458, is largest, the model’s prediction is the second species, versicolor.

In Figure 2, each line connecting a pair of nodes represents a weight. A weight is just a numeric constant. If nodes are zero-based indexed, from top to bottom, the weight from input[0] to hidden[0] is 0.2680 and the weight from hidden[4] to output[0] is 0.9381.

Each hidden and output node has a small arrow pointing into the node. These are called biases. The bias for hidden[0] is 0.1164 and the bias for output[0] is -0.0466.

You can think of a neural network as a complicated math function because it just accepts numeric input and produces numeric output. An ML model on an IoT device needs to know how to compute output. For the neural network in Figure 2, the first step is to compute the values of the hidden nodes. The value of each hidden node is the hyperbolic tangent (tanh) function applied to the sum of the products of inputs and associated weights, plus the bias. For hidden[0] the calculation is:

hidden[0] = tanh((6.1 * 0.2680) + (3.1 * 0.3954) +
                 (5.1 * -0.5503) + (1.1 * -0.3220) + 0.1164)
          = tanh(-0.1838)
          = -0.1817

Hidden nodes [1] through [4] are calculated similarly. The tanh function is called the hidden layer activation function. There are other activation functions that can be used, such as logistic sigmoid and rectified linear unit, which would give different hidden node values.
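If you want to double-check the hand calculation, the hidden layer computation is easy to reproduce with NumPy. The following is a minimal sketch (not part of the demo download) that uses the input values and the input-to-hidden weights and biases shown in Figure 1:

import numpy as np
x = np.array([6.1, 3.1, 5.1, 1.1])  # sepal and petal measurements
ih_wts = np.array([                 # 4x5 input-to-hidden weights from Figure 1
  [ 0.2680, -0.3782, -0.3828,  0.1143,  0.1269],
  [ 0.3954, -0.4367, -0.4332,  0.3880,  0.3814],
  [-0.5503,  0.6453,  0.6394, -0.6454, -0.6300],
  [-0.3220,  0.4035,  0.4163, -0.3074, -0.3112]])
h_biases = np.array([0.1164, -0.1567, -0.1604, 0.0810, 0.0822])
h_nodes = np.tanh(np.dot(x, ih_wts) + h_biases)  # tanh(inputs * weights + bias)
print(h_nodes)  # approximately [-0.1817 -0.0824 -0.1190 -0.9287 -0.9081]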

After the hidden node values have been computed, the next step is to compute preliminary output node values. A preliminary output node value is just the sum of products of hidden nodes and associated hidden-to-output weights, plus the bias. In other words, the same calculation as used for hidden nodes, but without the activation function. For the preliminary value of output[0] the calculation is:

o_pre[0] = (-0.1817 * 0.7552) + (-0.0824 * -0.7297) +
           (-0.1190 * -0.6733) + (-0.9287 * 0.9367) +
           (-0.9081 * 0.9381) + (-0.0466)
         = -1.7654

The values for output nodes [1] and [2] are calculated in the same way. After the preliminary values of the output nodes have been computed, the final output node values can be converted to probabilities using the softmax activation function. The softmax function is best explained by example. The calculations for the final output values are:

sum = exp(o_pre[0]) + exp(o_pre[1]) + exp(o_pre[2])
    = 0.1711 + 3.4391 + 1.7153
    = 5.3255
output[0] = exp(o_pre[0]) / sum
          = 0.1711 / 5.3255 = 0.0321
output[1] = exp(o_pre[1]) / sum
          = 3.4391 / 5.3255 = 0.6458
output[2] = exp(o_pre[2]) / sum
          = 1.7153 / 5.3255 = 0.3221

As with the hidden nodes, there are alternative output node activation functions, such as the identity function.
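Continuing the NumPy sketch from above, the output layer computation, including softmax, looks like the following, where the values are the hidden-to-output weights and biases shown in Figure 1:

ho_wts = np.array([  # 5x3 hidden-to-output weights from Figure 1
  [ 0.7552, -0.0001, -0.7706],
  [-0.7297, -0.2048,  0.9301],
  [-0.6733, -0.2512,  0.9167],
  [ 0.9367, -0.4276, -0.5134],
  [ 0.9381, -0.3728, -0.5667]])
o_biases = np.array([-0.0466, 0.4528, -0.4062])
o_pre = np.dot(h_nodes, ho_wts) + o_biases      # preliminary output values
output = np.exp(o_pre) / np.sum(np.exp(o_pre))  # softmax activation
print(output)  # approximately [0.0321 0.6458 0.3221]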

To summarize, an ML model is all the information needed to accept input data and generate an output prediction. In the case of a neural network, this information consists of the number of input, hidden and output nodes, the values of the weights and biases, and the types of activation functions used on the hidden and output layer nodes.

OK, but where do the values of the weights and the biases come from? They’re determined by training the model. Training means using a set of data that has known input values and known, correct output values, and applying an optimization algorithm such as back-propagation to adjust the weights and biases so that the difference between computed output values and the known, correct output values is minimized.
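For example, a minimal Keras training sketch for a 4-5-3 iris network resembles the following. (This isn’t the article’s CNTK demo program, and the random arrays are just placeholders for real training data; the sketch only illustrates the idea.)

import numpy as np
import keras as K
# define a 4-5-3 network with tanh and softmax activation
model = K.models.Sequential()
model.add(K.layers.Dense(units=5, input_dim=4, activation='tanh'))
model.add(K.layers.Dense(units=3, activation='softmax'))
# stochastic gradient descent (back-propagation) minimizes cross-entropy error
model.compile(loss='categorical_crossentropy', optimizer='sgd')
train_x = np.random.random((30, 4)).astype(np.float32)         # dummy known inputs
train_y = K.utils.to_categorical(np.random.randint(0, 3, 30))  # dummy correct outputs
model.fit(train_x, train_y, batch_size=10, epochs=100, verbose=0)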

There are many other kinds of ML models, such as decision trees and naive Bayes, but the general principles are the same. When using a neural network code library such as Microsoft CNTK or Google Keras/TensorFlow, the program that trains an ML model will save the model to disk. For example, CNTK and Keras code resembles:

mp = ".\\Models\\iris_nn.model"
model.save(mp, format=C.ModelFormat.CNTKv2)  # CNTK
model.save(".\\Models\\iris_model.h5")  # Keras

ML libraries also have functions to load a saved model. For example:

mp = ".\\Models\\iris_nn.model"
model = C.ops.functions.Function.load(mp)  # CNTK
model = load_model(".\\Models\\iris_model.h5")  # Keras

Most neural network libraries have a way to save just a model’s weights and biases values to file (as opposed to the entire model).
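For example, with Keras, a sketch of saving and examining just the weights and biases resembles the following (the file name here is just illustrative):

model.save_weights(".\\Models\\iris_wts.h5")  # weights and biases only
wts_list = model.get_weights()  # list of NumPy weight and bias arrays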

Deploying a Standard ML Model to an IoT Device

The image in Figure 1 shows an example of what training an ML model looks like. I used Visual Studio Code as the editor and the Python language API interface to the CNTK v2.4 library. Creating a trained ML model can take days or weeks of effort, and typically requires a lot of processing power and memory. Therefore, model training is usually performed on powerful machines, often with one or more GPUs. Additionally, as the size and complexity of a neural network increases, the number of weights and biases increases dramatically, and so the file size of a saved model also increases greatly.

For example, the 4-5-3 iris model described in the previous section has only (4 * 5) + 5 + (5 * 3) + 3 = 43 weights and biases. But an image classification model with millions of input pixel values and hundreds of hidden processing nodes can have hundreds of millions, or even billions, of weights and biases. Notice that the values of all 43 weights and biases of the iris example are shown in Figure 1:

[[ 0.2680 -0.3782 -0.3828  0.1143  0.1269]
 [ 0.3954 -0.4367 -0.4332  0.3880  0.3814]
 [-0.5503  0.6453  0.6394 -0.6454 -0.6300]
 [-0.322   0.4035  0.4163 -0.3074 -0.3112]]
 [ 0.1164 -0.1567 -0.1604  0.0810  0.0822]
[[ 0.7552 -0.0001 -0.7706]
 [-0.7297 -0.2048  0.9301]
 [-0.6733 -0.2512  0.9167]
 [ 0.9367 -0.4276 -0.5134]
 [ 0.9381 -0.3728 -0.5667]]
 [-0.0466  0.4528 -0.4062]
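As an aside, a quick way to sanity-check the parameter count of a fully connected network like this is a short helper function along these lines (a sketch, not part of the demo):

def num_params(layer_sizes):
  # one weight per node-to-node connection, one bias per non-input node
  n = 0
  for i in range(len(layer_sizes) - 1):
    n += (layer_sizes[i] * layer_sizes[i+1]) + layer_sizes[i+1]
  return n

print(num_params([4, 5, 3]))  # 43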

So, suppose you have a trained ML model. You want to deploy the model to a small, weak, IoT device. The simplest solution is to install onto the IoT device the same neural network library software you used to train the model. Then you can copy the saved trained model file to the IoT device and write code to load the model and make a prediction. Easy!
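For example, if the CNTK library were installed on the device, the on-device prediction code would resemble the following sketch (assuming the saved model includes the softmax output layer):

import numpy as np
import cntk as C
model = C.ops.functions.Function.load(".\\Models\\iris_nn.model")
unknown = np.array([[6.1, 3.1, 5.1, 1.1]], dtype=np.float32)
probs = model.eval(unknown)  # approximately (0.0321, 0.6458, 0.3221)
print(probs)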

Unfortunately, this approach will work only in relatively rare situations where your IoT device is quite powerful—perhaps along the lines of a desktop PC or laptop. Also, neural network libraries such as CNTK and Keras/TensorFlow were designed to train models quickly and efficiently, but in general they were not necessarily designed for optimal performance when performing input-output with a trained model. In short, the easy solution for deploying a trained ML model to an IoT device on the edge is rarely feasible.

The Custom Code Solution

Based on my experience and conversations with colleagues, the most common way to deploy a trained ML model to an IoT device on the edge is to write custom C/C++ code on the device. The idea is that C/C++ is almost universally available on IoT devices, and C/C++ is typically fast and compact. The demo program in Figure 3 illustrates the concept.

Figure 3 Simulation of Custom C/C++ IO Code on an IoT Device

The demo program starts by using the gcc C/C++ tool to compile file test.c into an executable on the target device. Here, the target device is just my desktop PC but there are C/C++ compilers for almost every kind of IoT/CPU device. When run, the demo program displays the values of the weights and biases of the iris flower example, then uses input values of (6.1, 3.1, 5.1, 1.1) and computes and displays the output values (0.0321, 0.6458, 0.3221). If you compare Figure 3 with Figures 1 and 2, you’ll see the inputs, weights and biases, and outputs are the same (subject to rounding error).   

Demo program test.c implements only the neural network input-output process. The program starts by setting up a struct data structure to hold the number of nodes in each layer, values for the hidden and output layer nodes, and values of the weights and biases:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>  // Has tanh()
typedef struct {
  int ni, nh, no;
  float *h_nodes, *o_nodes;  // No i_nodes
  float **ih_wts, **ho_wts;
  float *h_biases, *o_biases;
} nn_t;

The program defines the following functions:

construct(): initialize the struct
free(): deallocate memory when done
set_weights(): assign values to weights and biases
softmax(): the softmax function
predict(): implements the NN IO mechanism
show_weights(): a display helper

The key lines of code in the demo program main function look like:

nn_t net;  // Neural net struct
construct(&net, 4, 5, 3);  // Instantiate the NN
float wts[43] = {  // specify the weights and biases
  0.2680, -0.3782, -0.3828, 0.1143, 0.1269,
. . .
 -0.0466, 0.4528, -0.4062 };
set_weights(&net, wts);  // Copy values into NN
float inpts[4] = { 6.1, 3.1, 5.1, 1.1 };  // Inputs
int shownodes = 0;  // Don’t show
float* probs = predict(net, inpts, shownodes);

The point is that if you know exactly how a simple neural network ML model works, the IO process isn’t magic. You can implement basic IO quite easily.

The main advantage of using a custom C/C++ IO function is conceptual simplicity. Also, because you’re coding at a very low level (really just one level of abstraction above assembly language), the generated executable code will typically be very small and run very fast. Additionally, because you have full control over your IO code, you can use all kinds of tricks to speed up performance or reduce memory footprint. For example, program test.c uses type float but, depending on the problem scenario, you might be able to use a custom 16-bit fixed-point data type.
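For example, here’s a minimal sketch (shown in Python rather than C, just to illustrate the arithmetic) of converting float weights to a 16-bit fixed-point representation with a hypothetical scale factor:

import numpy as np
wts = np.array([0.2680, -0.3782, -0.3828, 0.1143, 0.1269], dtype=np.float32)
scale = 2 ** 12                                  # hypothetical Q3.12-style scaling
fixed = np.round(wts * scale).astype(np.int16)   # store as 16-bit integers
approx = fixed.astype(np.float32) / scale        # recover approximate float values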

The main disadvantage of using a custom C/C++ IO approach is that the technique becomes increasingly difficult as the complexity of the trained ML model increases. For example, an IO function for a single-hidden-layer neural network with tanh and softmax activation is very easy to implement—taking only about one day to one week of development effort, depending on many factors, of course. A deep neural network with several hidden layers is somewhat easy to deal with—maybe a week or two of effort. But implementing the IO functionality of a convolutional neural network (CNN) or a long short-term memory (LSTM) recurrent neural network is very difficult and would typically require much more than four weeks of development effort.

I suspect that as the use of IoT devices increases, there will be efforts to create open source C/C++ libraries that implement the IO for ML models created by different neural network libraries such as CNTK and Keras/TensorFlow. Or, if there’s enough demand, the developers of neural network libraries might create C/C++ IO APIs for IoT devices themselves. If you had such a library, writing custom IO for an IoT device would be relatively simple.

The Microsoft Embedded Learning Library

The Microsoft Embedded Learning Library (ELL) is an ambitious open source project intended to ease the development effort required to deploy an ML model to an IoT device on the edge (microsoft.github.io/ELL). The basic idea of ELL is illustrated on the left side of Figure 4.

Figure 4 The ELL Workflow Process, High-Level and Granular

In words, the ELL system accepts an ML model created by a supported library, such as CNTK, or a supported model format, such as the Open Neural Network Exchange (ONNX) format. The ELL system uses the input ML model and generates an intermediate model as an .ell file. Then the ELL system uses the intermediate .ell model file to generate executable code of some kind for a supported target device. Put another way, you can think of ELL as a sort of cross-compiler for ML models.

A more granular explanation of how ELL works is shown on the right side of Figure 4, using the iris flower model example. The process starts with an ML developer writing a Python program named iris_nn.py to create and save a prediction model named iris_cntk.model, which is in a proprietary binary format. This process is shown in Figure 1.

The ELL command-line tool cntk_import.py is then used to create an intermediate iris_cntk.ell file, which is stored in JSON format. Next, the ELL command-line tool wrap.py is used to generate a directory host\build of C/C++ source code files. Note that “host” means to take the settings from the current machine, so a more common scenario would be something like \pi3\build. Then the cmake.exe C/C++ compiler-build tool is used to generate a Python module of executable code, containing the logic of the original ML model, named iris_cntk. The target could be a C/C++ executable or a C# executable or whatever is best-suited for the target IoT device.

The iris_cntk Python module can then be imported by a Python program (use_iris_ell_model.py) on the target device (my desktop PC), as shown in Figure 5. Notice that the input values (6.1, 3.1, 5.1, 1.1) and output values (0.0321, 0.6457, 0.3221) generated by the ELL system model are the same as the values generated during model development (Figure 1) and the values generated by the custom C/C++ IO function (Figure 3).

Figure 5 Simulation of Using an ELL Model on an IoT Device

The leading “(py36)” before the command prompts in Figure 5 indicates I’m working in a special Python setting called a Conda environment, where I’m using Python version 3.6, which was required at the time I coded my ELL demo.

The code for program use_iris_ell_model.py is shown in Figure 6. The point is that ELL has generated a Python module/package that can be used just like any other package/module.

Figure 6 Using an ELL Model in a Python Program

# use_iris_ell_model.py
# Python 3.6
import numpy as np
import tutorial_helpers   # used to find package
import iris_cntk as m     # the ELL module/package
print("\nBegin use ELL model demo \n")
unknown = np.array([[6.1, 3.1, 5.1, 1.1]],
  dtype=np.float32)
np.set_printoptions(precision=4, suppress=True)
print("Input to ELL model: ")
print(unknown)
predicted = m.predict(unknown)
print("\nPrediction probabilities: ")
print(predicted)
print("\nEnd ELL demo \n"

The ELL system is still in the very early stages of development, but based on my experience, the system is ready for you to experiment with and is stable enough for limited production development scenarios.

I expect your reaction to the diagram of the ELL process in Figure 4 and its explanation is something like, “Wow, that’s a lot of steps!” At least, that was my reaction. Eventually, I expect the ELL system to mature to a point where you can generate a model for deployment to an IoT device along the lines of:

source_model = ".\\iris_cntk.model"
target_model = ".\\iris_cortex_m4.model"
ell_generate(source_model, target_model)

But for now, if you want to explore ELL you’ll have to work through several steps. Luckily, the ELL tutorial on the ELL Web site, on which much of this article is based, is very good. I should point out that to get started with ELL you must install ELL on your desktop machine, and installation consists of building C/C++ source code—there’s no .msi installer for ELL (yet).

A cool feature of ELL that isn’t obvious is that it performs some very sophisticated optimization behind the scenes. For example, the ELL team has explored ways to compress large ML models, including sparsification and pruning techniques, and replacing floating point math with 1-bit math. The ELL team is also looking at algorithms that can be used in place of neural networks, including improved decision trees and k-DNF classifiers.

The tutorials on the ELL Web site are quite good, but because there are many steps involved, they are a bit long. Let me briefly sketch out the process so you can get a feel for what installing and using ELL is like. Note that my commands are not syntactically correct; they’re highly simplified to keep the main ideas clear.

Installing the ELL system resembles:

> (install several tools such as cmake and BLAS)
> git clone https://github.com/Microsoft/ELL.git
> cd ELL
> nuget.exe restore external/packages.config -PackagesDirectory external
> md build
> cd build
> cmake -G "Visual Studio 15 2017 Win64" ..
> cmake --build . --config Release
> cmake --build . --target _ELL_python --config Release

In words, you must have quite a few tools installed before starting, then you pull the ELL source code down from GitHub and then build the ELL executable tools and Python binding using cmake.

Creating an ELL model resembles:

> python cntk_import.py iris_cntk.model
> python wrap.py iris_cntk.ell --language python --target host
> cd host
> md build
> cd build
> cmake -G "Visual Studio 15 2017 Win64" .. && cmake --build . --config release

That is, you use the ELL tool cntk_import.py to create a .ell file from a CNTK model file. You use wrap.py to generate a lot of C/C++ code specific to a particular target IoT device. And you use cmake to generate executables that encapsulate the original trained ML model’s behavior.

Wrapping Up

To summarize, a machine learning model is all the information needed for a software system to accept input and generate a prediction. Because IoT devices on the edge often require very fast and reliable performance, it’s sometimes necessary to compute ML predictions directly on a device. However, IoT devices are often small and weak, so you can’t simply copy a model that was developed on a powerful desktop machine to the device. A standard approach is to write custom C/C++ code, but this approach doesn’t scale to complex ML models. An emerging approach is the use of ML cross-compilers, such as the Microsoft Embedded Learning Library.

When fully mature and released, the ELL system will quite likely make developing complex ML models for IoT devices on the edge dramatically easier than it is today.


Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products, including Internet Explorer and Bing. Dr. McCaffrey can be reached at jamccaff@microsoft.com.

Thanks to the following Microsoft technical experts who reviewed this article: Byron Changuion, Chuck Jacobs, Chris Lee and Ricky Loynd

