[From the last episode: We looked at the reasons for all of the multiply-accumulate hardware in machine-learning engines.]
OK, we’ve spent a couple of weeks deep in the math used for machine learning (ML). Now let’s back up a second to look at what this all means in the bigger picture.
Let’s think conceptually about what we’re doing in each layer. If we’re doing ML with images, we’re likely going to have some kind of convolutional neural network (CNN). This thought process will be a super-simplified version of what’s actually going on, since these CNNs can have well over a hundred layers and billions of weights in total. And there’s more than one way to build them.
Cats and Dogs – Well, Not so Much
We often talk about recognizing cats and dogs because they’re so familiar. But stop and think about it: what if someone asked you to describe what it is that makes a dog look like a dog and a cat look like a cat? It’s not so easy, especially since both have long- and short-haired versions, long- and short-tailed versions, and pointy- and flop-eared versions.
Obviously, it can be done, but for the sake of our more intuitive discussion, let’s pick something else. Let’s look at discriminating between a spider and an insect. The basic things we want to look for are:
- Does it have eight legs or six?
- Does it have a two-part or three-part body?
So our first layer will consist of neurons that look at different ranges of pixels of the image. By looking for edges and shapes, we’ll start to see evidence of segmented lines for legs and ovals or circles for body segments. We might need a few more layers just to connect the pieces to look like legs and to identify where body segments are.
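To make "looking for edges" a bit more concrete, here is a minimal sketch of one way an early layer can respond to a vertical edge. Everything in it is made up for illustration: the tiny 3x6 "image", the hand-picked 3x3 kernel, and the helper name `slide_kernel`. In a real CNN the kernel values are weights that come out of training, not something you write by hand.

```python
def slide_kernel(image, kernel):
    """Slide a small kernel over the image and return the weighted sums.

    Each output value is a multiply-accumulate over one patch of pixels,
    which is what a single neuron in an early layer computes.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# Hypothetical image: dark (0) on the left, bright (1) on the right.
image = [
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1, 1],
]

# Hand-picked kernel that responds where brightness rises from left to right.
vertical_edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

response = slide_kernel(image, vertical_edge_kernel)
print(response)  # [[0, 3, 3, 0]]
```

The flat regions on either side produce 0, and the patches that straddle the dark-to-bright boundary produce 3. Later layers would take responses like these as their inputs.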
On some layer, then, we could have neurons that flag segmented lines and roundish areas for the body. Once we’ve found that, we’ll want layers that refine whether or not the segmented lines qualify as legs (do they connect together in the manner of a leg?) or how many blob shapes there are. Are they connected together to make one elongated body?
From Lines to Legs
At this point, then, we can add more layers to figure out how many legs there are. Once we can count the legs and the body blobs, then we can make some educated guesses. In the simple decision of “spider” vs. “insect,” it’s actually really easy: if there are eight legs, it’s a spider; if there are six, it’s an insect.
But the image may not be clear. What if a couple of legs are out of view because of the angle of the image? A six-legged thing might still be a spider. This is where the body parts help. If there are three body segments in addition to six legs, then it’s pretty likely that it’s an insect. So you might think of having a layer that has a neuron that indicates six legs, another that indicates eight legs, another that indicates three body segments, and one more indicating two body segments.
Your final layer would give the final verdict. If eight legs and two body parts, then you’ve got a spider; if six legs and three parts, an insect. But it’s not going to flag it as yes or no; it’s going to give a probability. In the 8/2 case, it might say 90% likely a spider and 15% likely an insect, or vice versa for the 6/3 case. What if it thinks it sees six legs and two parts? Then it might give a low percentage to both spider and insect.
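Here is a rough sketch of how such a final layer could turn those four evidence neurons into two percentages. The weights, the bias, and the function name `classify` are all invented for this example, and the sketch assumes each output gets its own sigmoid, which is what lets both scores come out low at the same time. Many real classifiers use a softmax instead, which forces the outputs to add up to 100%.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0..1 so it reads like a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def classify(eight_legs, six_legs, two_parts, three_parts):
    """Return (spider_score, insect_score) from four evidence values in 0..1."""
    evidence = [eight_legs, six_legs, two_parts, three_parts]

    # Hypothetical weights: positive for supporting evidence,
    # negative for evidence that points the other way.
    spider_weights = [3.0, -2.0, 3.0, -2.0]
    insect_weights = [-2.0, 3.0, -2.0, 3.0]
    bias = -3.0  # with no evidence at all, both scores stay low

    spider = sigmoid(bias + sum(e * w for e, w in zip(evidence, spider_weights)))
    insect = sigmoid(bias + sum(e * w for e, w in zip(evidence, insect_weights)))
    return spider, insect

# Clear case: eight legs and two body parts.
spider, insect = classify(1.0, 0.0, 1.0, 0.0)
print(f"8 legs / 2 parts: spider {spider:.0%}, insect {insect:.0%}")

# Conflicting case: six legs but only two body parts.
spider_mixed, insect_mixed = classify(0.0, 1.0, 1.0, 0.0)
print(f"6 legs / 2 parts: spider {spider_mixed:.0%}, insect {insect_mixed:.0%}")
```

With these made-up numbers, the clear case gives a high spider score and a very low insect score, while the conflicting case gives low scores to both.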
Picking and Choosing What to Pay Attention To
So where does the multiplication come from? You’ve got these neurons that, for instance, are looking for segmented lines as the legs. In layers before that, you’ve got pieces of lines that haven’t been “connected” into legs yet. In a neuron that’s going to identify a leg, it will look at the prior neurons that have leg parts. The output of those neurons will be important in finding legs, so you want to give those neurons more impact – or more weight – when determining legs.
So you can do three things with the various neuron inputs:
- You can multiply them by a big number – you would do that with leg parts, since they look like what you’re looking for.
- You can multiply them by a big negative number – if you had a segment that was highly curved, you might decide that that would pretty much rule out its being a leg. You might think of this as a disqualifier.
- Or you can ignore them – like for a region of the image that has no discernible line of any shape. In some cases, this might mean that you don’t even have a connection to earlier neurons that show no lines, or you could multiply by a really small weight to indicate that this particular image feature isn’t very important to determining where the legs are.
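The three options above can be sketched as a single weighted sum. The input values, the weights, and the name `leg_neuron` are all hypothetical, and the sketch assumes a simple activation (ReLU) that clips negative totals to zero; a real network would learn its weights and might use a different activation.

```python
def leg_neuron(inputs, weights, bias=0.0):
    """Multiply each input by its weight, add everything up, clip at zero."""
    total = bias + sum(x * w for x, w in zip(inputs, weights))
    return max(0.0, total)  # ReLU: a negative total means "not a leg"

# Hypothetical weights for three earlier neurons:
#   straight segment -> big positive weight (looks like a leg part)
#   curved segment   -> big negative weight (a disqualifier)
#   blank region     -> zero weight (ignored)
weights = [2.0, -3.0, 0.0]

# Mostly straight segment, hardly any curve: the neuron fires.
likely_leg = leg_neuron([0.9, 0.1, 0.5], weights)

# Strongly curved segment: the disqualifier wins and the output is zero.
unlikely_leg = leg_neuron([0.2, 0.8, 0.5], weights)

print(likely_leg, unlikely_leg)
```

Notice that the blank-region input is 0.5 in both cases and makes no difference at all, because its weight is zero.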
Weight, Weight!
This is where the weights come in, only, as we’ve seen, they’re not simple numbers, but matrices. That detail aside, the purpose of the weights is to determine what to pay attention to and what to ignore for a specific thing. That “thing” could be as minor as deciding if something contains a piece of a leg segment, or as final as deciding what the final outcome is. During the training process, the model “figures out” what the weights should be to give good accuracy.
Bear in mind that I’ve totally made up this example to be super simple; a specialist might read this and say that it’s way too simplistic, and they may well be right. But if you get the gist, then that’s all I’m aiming for.
You Could Take it Further
Having decided whether it’s a spider or insect, you might then add more layers to decide which type of spider or insect it is. Now you have lots of features and colors to look more deeply into, and it isn’t always easy. The 2- or 3-segment thing isn’t always so cut and dried. And, believe it or not, caterpillars – with all of those legs (or so it would seem) – are still insects because only six of the legs are truly legs; the others are only leg-like.
So, depending on the angle of view, you can see why it can be a lot of work, with lots and lots of neurons, required to do such sophisticated identification. But each step of the way, you’re identifying something from farther back that’s important for a given neuron. And it’s multiplication by weights that helps you pay attention to the important bits and ignore the unimportant ones.