Author: Jacob Beningo
Machine learning (ML) has been all the rage in server and mobile applications for years, and now the trend has spread to edge devices, where it is gaining prominence. Because edge devices must conserve power, developers need to understand how to deploy ML models into microcontroller-based systems. ML models running on microcontrollers are often referred to as tinyML. Deploying a model to a microcontroller is not easy, but it is becoming easier, and developers without any specialized ML training can now do it in a reasonable amount of time.
This article explores how embedded developers can get started with ML using STMicroelectronics’ STM32 microcontrollers. To that end, it shows how to convert a TensorFlow Lite for Microcontrollers model for use in STM32CubeIDE via X-CUBE-AI to create a “Hello World” application.
Introduction to tinyML use cases
TinyML is a growing field that brings ML capabilities, often in the form of deep neural networks, to resource- and power-constrained devices such as microcontrollers. These devices can then run ML models and do valuable work at the edge. The following use cases show where tinyML is especially interesting.
The first use case, common to many mobile and home-automation devices, is keyword recognition. With keyword recognition, embedded devices can use a microphone to capture speech and detect pre-trained keywords. TinyML models take a time-series input representing the speech and convert it into speech features, usually spectrograms, which contain the frequency content over time. The spectrogram is then fed into a neural network trained to detect specific words, and the output is the probability that a specific word was heard. Figure 1 shows an example of this process.
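To make the feature-extraction step concrete, here is a minimal sketch of how a spectrogram can be computed from a time-series signal. The frame length, hop size, window choice, and the 16 kHz test tone are illustrative assumptions, not values from the keyword-recognition pipeline described above.

```python
import numpy as np

def spectrogram(signal, frame_len=256, hop=128):
    """Split the signal into overlapping frames and return the
    magnitude of each frame's FFT (frequency content over time)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # A Hann window reduces spectral leakage at the frame edges
        windowed = frame * np.hanning(frame_len)
        # rfft keeps only the non-negative frequency bins
        frames.append(np.abs(np.fft.rfft(windowed)))
    return np.array(frames)  # shape: (num_frames, frame_len // 2 + 1)

# Example: a 1 kHz tone sampled at 16 kHz for 0.1 s
fs = 16000
t = np.arange(int(0.1 * fs)) / fs
tone = np.sin(2 * np.pi * 1000 * t)
spec = spectrogram(tone)
```

Each row of `spec` is one time slice; the column with the largest magnitude corresponds to the dominant frequency in that slice, which is exactly the kind of structure a keyword-detection network learns from.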
Figure 1: Keyword recognition is an interesting use case for tinyML. Input speech is converted into a spectrogram and fed into a neural network trained to determine if there are pre-trained words. (Image credit: Arm®)
Another tinyML use case of interest to many embedded developers is image recognition. The microcontroller captures an image from a camera and feeds it into a pre-trained model, which can discern what is in the image, for example, whether it contains cats, dogs, or fish. A good example of image recognition at the edge is the video doorbell, which can often detect whether someone is at the door or a package has been dropped off.
A final, very common tinyML use case is predictive maintenance, which uses ML to predict equipment health based on anomaly detection, classification algorithms, and predictive models. Applications range from HVAC systems to factory-floor equipment.
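As a toy illustration of the anomaly-detection side of predictive maintenance, the sketch below flags readings that fall far outside the statistics of normal operation. The z-score threshold and the synthetic vibration data are illustrative assumptions, standing in for a trained model and real sensor data.

```python
import numpy as np

def find_anomalies(readings, baseline, threshold=3.0):
    """Flag readings whose z-score against the baseline statistics
    exceeds the threshold (a simple stand-in for a learned model)."""
    mean = np.mean(baseline)
    std = np.std(baseline)
    z = np.abs((np.asarray(readings) - mean) / std)
    return z > threshold

# Baseline: vibration amplitude recorded during normal operation
rng = np.random.default_rng(0)
baseline = rng.normal(loc=1.0, scale=0.1, size=1000)

# New readings: mostly normal, one suggesting a failing bearing
readings = [1.02, 0.97, 1.9, 1.05]
flags = find_anomalies(readings, baseline)
```

A production system would replace the z-score with a trained classifier or autoencoder, but the pipeline shape, baseline statistics in, per-reading verdict out, is the same.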
While the above three use cases are common for tinyML today, developers can undoubtedly find many more potential use cases. Here is a quick list:
・ Gesture classification
・ Anomaly detection
・ Analog meter reading
・ Guidance, navigation, and control (GNC)
・ Packaging inspection
Regardless of the use case, the best way to get familiar with tinyML is a “Hello World” application, which helps developers learn the basic process for implementing and running a minimal system. Five steps are needed to run a tinyML model on an STM32 microcontroller:
1. Capture data
2. Label data
3. Train the neural network
4. Convert the model
5. Run the model on the microcontroller
Capture, label and train the “Hello World” model
Developers have many options for capturing and labeling the data needed to train a model. First, there are a large number of online training datasets that others have already collected and labeled. For example, for basic image detection there are CIFAR-10 and ImageNet, and there are also image datasets for training a model to detect smiles in photos. Starting with an online data repository is often a good choice.
Developers can also generate their own data if what they need is not publicly available. Datasets can be generated with MATLAB or other tools, and if the data cannot be generated automatically, it can be collected manually. Finally, developers who find all of this too time-consuming can buy datasets online. Collecting one’s own data is often the most interesting and fun option, but also the most labor-intensive.
The “Hello World” example explored here shows how to train a model to generate a sine wave and deploy it on an STM32. The example was put together by Pete Warden and Daniel Situnayake while they were working on TensorFlow Lite for Microcontrollers at Google. This makes the job easier because they have published a simple tutorial covering capturing, labeling, and training the model. The tutorial can be found on GitHub; after opening it, developers should click the “Run in Google Colab” button. Google Colab, short for Google Colaboratory, lets developers write and execute Python in the browser with no configuration and provides free access to Google GPUs.
Running through the training example produces two model files: model.tflite, a TensorFlow Lite model quantized for microcontrollers, and model_no_quant.tflite, a model without quantization. Quantization determines how a model’s weights, activations, and biases are stored numerically; a quantized model is smaller and therefore a better fit for microcontrollers. Curious readers can see the trained model’s output compared against an actual sine wave in Figure 2. The model’s output is shown in red. The sine wave output is not perfect, but for a “Hello World” program it works fine.
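To make the size and precision trade-off concrete, here is a minimal sketch of affine (scale plus zero-point) int8 quantization, the general scheme TensorFlow Lite uses. The math is standard, but the specific values and the sine-wave test data are illustrative assumptions, not taken from the model files above.

```python
import numpy as np

def quantize_int8(values):
    """Map float values to int8 with an affine (scale + zero-point)
    scheme, the general approach used by TensorFlow Lite."""
    lo, hi = float(np.min(values)), float(np.max(values))
    scale = (hi - lo) / 255.0
    zero_point = int(round(-128 - lo / scale))
    q = np.clip(np.round(values / scale) + zero_point, -128, 127)
    return q.astype(np.int8), scale, zero_point

def dequantize(q, scale, zero_point):
    """Recover approximate float values from the int8 representation."""
    return (q.astype(np.float32) - zero_point) * scale

# Quantize one cycle of a sine wave and measure the round-trip error
x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(x).astype(np.float32)
q, scale, zp = quantize_int8(y)
y_round_trip = dequantize(q, scale, zp)
```

The int8 storage is a quarter the size of float32, and for a smooth signal like this the round-trip error stays small, which is why the quantized model.tflite is preferred on a microcontroller.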
Figure 2: Comparison of TensorFlow model sine wave predictions and actual values. (Image credit: Beningo Embedded Group)
Choose a development board
Before looking at how to convert a TensorFlow model to run on a microcontroller, you need to choose a microcontroller to deploy the model on. This article focuses on STM32 microcontrollers because STMicroelectronics offers a range of tinyML/ML tools that convert and run models well. In addition, STMicroelectronics has a variety of parts that are compatible with its ML tools (Figure 3).
Figure 3: Diagram showing the microcontrollers and microprocessor units (MPUs) currently supported by the STMicroelectronics AI ecosystem. (Image credit: STMicroelectronics)
Any board from this ecosystem already on your desk is fine for getting a “Hello World” application up and running. However, developers interested in going beyond this example, into gesture control or keyword recognition, should consider the STM32 B-L4S5I-IOT01A IoT node development kit (Figure 4).
The development board features an STM32L4+ series microcontroller with an Arm Cortex®-M4 core, 2 Mbytes of flash, and 640 Kbytes of RAM, providing ample space for tinyML models. The board also carries STMicroelectronics’ MP34DT01 micro-electromechanical systems (MEMS) microphone, suitable for keyword-recognition experiments. Additionally, onboard STMicroelectronics motion sensors, including the LIS3MDLTR three-axis magnetometer, can be used for tinyML-based gesture detection.
Figure 4: The STM32 B-L4S5I-IOT01A IoT node development kit, with an Arm Cortex-M4 core, MEMS microphone, and motion sensors, is a capable tinyML experimentation platform. (Image credit: STMicroelectronics)
Convert and run TensorFlow Lite models with STM32Cube.AI
With a development board capable of running tinyML models in hand, developers can start converting the TensorFlow Lite model to run on the microcontroller. The TensorFlow Lite model can run directly on a microcontroller, but it requires a runtime environment to process it.
When running the model, a series of functions must execute. These functions first collect sensor data, then filter it, extract the necessary features, and feed them into the model. The model produces an output, which is then filtered further, often followed by some system action. Figure 5 shows an overview of the process.
Figure 5: How data flows from the sensor to the runtime to the output of the tinyML application. (Image credit: Beningo Embedded Group)
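The stages in Figure 5 can be sketched as a simple pipeline. Everything below, the ramp-input "sensor", the moving-average filter, and the stand-in model, is an illustrative assumption rather than code from the STM32 project; on the target the model stage would be the generated X-CUBE-AI runtime call.

```python
import math
from collections import deque

def acquire(t):
    """Stand-in for a sensor read: here, just a ramp of x values."""
    return t * 0.1

def preprocess(window):
    """Filter/feature stage: moving average over the last samples."""
    return sum(window) / len(window)

def model(x):
    """Stand-in for the neural network: the function it approximates."""
    return math.sin(x)

def postprocess(y):
    """Output stage: clamp the prediction to the valid range before
    the system acts on it."""
    return max(-1.0, min(1.0, y))

window = deque(maxlen=4)  # small sliding window of raw samples
outputs = []
for t in range(16):
    window.append(acquire(t))
    x = preprocess(window)
    outputs.append(postprocess(model(x)))
```

The point is the shape of the loop, acquire, preprocess, infer, postprocess, which stays the same whether the model predicts a sine value or classifies a keyword.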
The X-CUBE-AI plugin for STM32CubeMX provides a runtime environment for interpreting TensorFlow Lite models, along with alternative runtimes and conversion tools that developers can leverage. The X-CUBE-AI plugin is not enabled in a project by default. After creating a new project and initializing the board, enable the AI runtime under Software Packs -> Select Components. There are a few options here; for this example, make sure to use the Application template, as shown in Figure 6.
Figure 6: The X-CUBE-AI plugin needs to be enabled using the application template for this example. (Image credit: Beningo Embedded Group)
Once X-CUBE-AI is enabled, an STMicroelectronics X-CUBE-AI category appears in the toolchain. By clicking on this category, developers can select their model file and set the model parameters, as shown in Figure 7. The Analyze button analyzes the model and reports its RAM, ROM, and execution-cycle requirements. Developers are strongly encouraged to compare the Keras and TFLite model options; for a model as small as the sine-wave example, the differences are minor, but they are visible. Click “Generate code” to generate the project.
Figure 7: The Analyze button provides the developer with RAM, ROM, and execution cycle information. (Image credit: Beningo Embedded Group)
The code generator initializes the project and builds the runtime environment for the tinyML model, but by default no input is provided to the model. The developer needs to add code that supplies the model with an input, an x value, from which the model generates a sinusoidal y value. As shown in Figure 8, a few pieces of code need to be added to the acquisition_and_process_data and post_process functions.
Figure 8: The code shown will connect the dummy input sensor value to the sine wave model. (Image credit: Beningo Embedded Group)
At this point, the example is ready to run. Note that adding a few printf statements to print the model output allows quick verification. A quick compile and deploy gets the “Hello World” tinyML model running. Capturing the model output over a full cycle yields the sine wave shown in Figure 9. It is not perfect, but it is a great first tinyML application. From here, developers can connect the output to a pulse-width modulator (PWM) and generate a physical sine wave.
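Driving a PWM from the model amounts to mapping the sine outputs in [-1, 1] onto a timer's duty-cycle range. Here is a hedged sketch of that mapping; the timer period of 1000 counts is an assumed value, not one from the article.

```python
import math

def sine_to_compare(y, period=1000):
    """Map a model output in [-1, 1] to a timer compare value in
    [0, period], i.e. a PWM duty cycle for sine generation."""
    y = max(-1.0, min(1.0, y))  # guard against model overshoot
    return int(round((y + 1.0) / 2.0 * period))

# One full cycle of model outputs becomes one cycle of duty values;
# math.sin stands in for the deployed model's predictions here.
duties = [sine_to_compare(math.sin(2 * math.pi * i / 32)) for i in range(32)]
```

On hardware, each duty value would be written to the timer's compare register at a fixed rate, and a low-pass filter on the PWM pin would recover the analog sine wave.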
Figure 9: “Hello World” sine wave model output when running on STM32. (Image credit: Beningo Embedded Group)
ML Tips and Tricks on Embedded Systems
Developers getting started with ML on microcontroller-based systems have quite a bit of work ahead to get their first tinyML application up and running. However, a few “tips and tricks” can simplify and speed up that development:
・ Browse the “Hello World” example of TensorFlow Lite for Microcontrollers, including the Google Colab file. Take a moment to tune the parameters and see how they affect the trained model.
・ Use quantized models in microcontroller applications. A quantized model stores values as uint8_t instead of 32-bit floats, so it is smaller and runs faster.
・ Explore the other examples in the TensorFlow Lite for Microcontrollers repository, including gesture detection and keyword detection.
・ Starting from the “Hello World” example, connect the model output to a PWM and a low-pass filter to see the resulting sine wave. Then experiment at runtime, increasing and decreasing the sine wave’s frequency.
・ Choose a development board with “extra” sensors to try out a wide range of ML applications.
・ While collecting data is fun, it is generally easier to buy or use an open-source dataset to train a model.
Developers who follow these “tips and tricks” can save themselves significant time and trouble.
ML has spread to the network edge, and systems based on resource-constrained microcontrollers are a primary target. The latest tools can convert and optimize ML models to run on real-time systems. As shown here, implementing and running a model on an STM32 development board is relatively easy, though not without complications. Although only a simple sine-wave model was explored, more complex models such as gesture detection and keyword recognition can be implemented the same way.