[From the last episode: We looked at activation functions and what they’re for.]
We’ve talked about the structure of machine-learning (ML) models and much of the hardware and math needed to do ML work. But there are some practical considerations that mean we may not directly use the pristine model as it was developed during the learning process.
Training in the Cloud
Most ML training happens in the cloud. That’s where you have the massive amounts of computing necessary to do the task in a reasonable timeframe. There’s no doubt that training requires more work than inference – that is, using the trained model to do work. Part of that is simply the fact that training means running through thousands of samples to get the picture (sometimes literally).
But it’s also the way training works: start where the model is now, guess at a new picture, figure out what’s wrong, and tweak the model accordingly. Do that again for every sample. So that first pass – that guess – is really inference. In addition to that inference work, then, is the whole bit about figuring out what to change in the model and then changing it. So it’s inference-plus-more-work, which is, by definition, more work than inference by itself.
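A toy sketch makes the inference-plus-more-work point concrete. This is not how real frameworks do it – the model below is a single weight learning y = 3x – but it shows why every training step contains an inference step:

```python
import numpy as np

# Toy data: the model should learn y = 3x.
rng = np.random.default_rng(0)
samples = rng.normal(size=100)
labels = 3.0 * samples

w = 0.0                       # start where the model is now
lr = 0.1                      # learning rate (how big a tweak to make)
for x, y in zip(samples, labels):
    guess = w * x             # this first pass is really inference
    error = guess - y         # figure out what's wrong
    w -= lr * error * x       # tweak the model accordingly

print(w)                      # close to 3.0 after one pass through the data

# Inference by itself is just the forward step, with no tweaking:
print(w * 1.5)                # roughly 4.5
```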
So training overwhelmingly happens in the cloud. Inference also started in the cloud – because the models, as freshly minted after training, require the kind of computing that’s in the cloud. Also, some early applications – like rummaging through all the stuff you’ve handed to companies like Facebook and Google – happen in the cloud anyway.
The Cloud is Far Away
But for other applications, that’s not a convenient way to do things. Let’s take a surveillance camera as an example, where you want to do face recognition to alert if it’s someone other than the people it knows. It takes ML to do that facial recognition, so, if it’s going to happen in the cloud, then the video stream must be sent to the cloud, where the decision is made. The decision can then be sent back to the camera.
There are two problems with this. First, that’s a lot of data going up to the cloud. Lots of such cameras will start clogging the interwebs with raw video streams. The second issue is that it takes time to get to the cloud and back. It doesn’t feel like a lot of time to us, but if someone’s trying to break into the house, you want to know sooner rather than later. Fractions of a second can actually matter. This delay is referred to as latency.
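Some rough arithmetic shows the scale of the data problem. The numbers here are illustrative assumptions: one camera sending uncompressed 1080p color video at 30 frames per second.

```python
# One camera's raw video stream, before any compression:
pixels_per_frame = 1920 * 1080        # 1080p resolution (assumed)
bits_per_pixel = 24                   # 8 bits each for red, green, blue
frames_per_second = 30

bits_per_second = pixels_per_frame * bits_per_pixel * frames_per_second
print(bits_per_second / 1e9)          # about 1.49 Gbit/s, for ONE camera
```

Compression shrinks that a great deal, but multiply even a compressed stream by a neighborhood full of cameras and the pipes still fill up. And none of it helps the round-trip delay.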
What we’d really like is to have the camera itself do the facial recognition right there. That would mean not sending all that video to the cloud, and, depending on how quickly the camera did its job, you could have the answer faster too.
The Edge Wants It Simpler!
But there’s a problem. In the cloud, ML happens on giant, powerful servers, often with special hardware plugged in to make the calculations go even faster. You can’t have that kind of computing system next to every camera. You need something small and practical – and, if the camera (or whatever) is battery-powered, then it also needs to go easy on the juice.
This is a huge area right now. Many companies are coming up with new ideas for how to do ML “at the edge.” Part of it is about architectures for the computing hardware. But there are also changes that, realistically, need to be made to the models themselves to get them to work.
For example, raw model training uses what we call real numbers. Basically, that means numbers with decimal points. Those take more work to compute with than, say, basic integers. So one of the first things that often happens is that the real-numbered weights need to be converted to integers.
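Here’s a minimal sketch of that conversion, using one common scheme (mapping the weights onto signed 8-bit integers; real tools offer several variations on this):

```python
import numpy as np

# Hypothetical trained weights (real numbers, i.e., floats):
weights = np.array([0.412, -1.337, 0.058, 2.901, -0.744])

# Pick a scale so the largest weight lands on the int8 edge (-127..127),
# then round every weight to its nearest integer on that grid:
scale = np.abs(weights).max() / 127.0
q_weights = np.round(weights / scale).astype(np.int8)

print(q_weights)          # [ 18 -59   3 127 -33]
print(q_weights * scale)  # approximately the original float weights
```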
That, of course, introduces some error. Just like if you round numbers to the nearest whole number, you’re not going to get the same answers as if you worked with the more precise number. So, after doing this, designers have to go back and confirm that their accuracy when using the new, simplified model is still good enough. If not, they may have to do some more retraining.
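Continuing the sketch above, you can measure the worst-case weight error directly; but checking model accuracy means rerunning your validation data through the simplified model (the evaluate() helper below is hypothetical, standing in for whatever your toolchain provides):

```python
# Reconstruct floats from the integers and compare to the originals:
restored = q_weights.astype(np.float64) * scale
print(np.abs(weights - restored).max())   # worst-case per-weight error

# The real test is end-to-end accuracy on held-out data, e.g.:
# acc_float = evaluate(float_model, validation_data)   # hypothetical
# acc_int8  = evaluate(int8_model, validation_data)    # hypothetical
# If acc_int8 falls too far below acc_float, retrain or adjust.
```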
Improving Tools
One thing that should be said, however, is that the training tools are evolving rapidly. What I’ve described was the standard practice for a long time. But now, rather than training for the cloud, tweaking, checking accuracy, and then retraining, one may be able to train with the target hardware in mind – which is great if that target is at the edge. It can save a lot of work.
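One common version of this idea is quantization-aware training: simulate the integer rounding during the forward pass, so the training process learns weights that survive the conversion. A minimal sketch of that “fake quantization” step (the function here is my illustration, not any particular tool’s API):

```python
import numpy as np

def fake_quantize(weights: np.ndarray) -> np.ndarray:
    """Snap weights to the int8 grid, then map back to floats, so
    training 'sees' the same rounding error the edge hardware will."""
    scale = np.abs(weights).max() / 127.0
    return np.round(weights / scale) * scale

# In each training pass, the forward computation uses fake_quantize(w)
# in place of the raw float weights w, while the usual weight updates
# are still applied to the full-precision float copy.
```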
There are several other simplification techniques that can be used, and we’ll talk about two of those next week.