[From the last episode: We looked at the difference between training and inference for ANNs.]
One can do many, many things with AI. But of all of the applications, vision has been the most… well, visible. Another popular application – so-called predictive analytics – is used by social media and other online companies to predict what you’d like to see or buy or do. But it works in the dark – the much-maligned “algorithms” that Google and Facebook and their ilk use.
Vision applications, on the other hand, are imbuing devices – many of them IoT devices – with a greater ability both to see and, more importantly, to “understand” what they see. This application also gives us an opportunity to dip our toes into the style of math that’s used in pretty much all commercial AI applications today. We won’t get into that math today; instead, we’ll continue to pave the way toward it.
Is Anything There?
There are a couple of levels that a vision system might aspire to: detection and recognition. Let’s start with detection. That answers the question, “Is <something> there?” That “something” could be any object, animal, or person. You don’t know which specific object, animal, or person it is, but you know that it’s there.
Let’s take a security camera as an example. A smart camera protecting a property from unauthorized people needs to know if a person shows up. A good camera will alert only if it’s a person, not a dog or raccoon or deer. If it’s really good, it will notice if a person is trying to crawl like a dog to bypass the camera.
This is an example of detection. While it might sound simplistic, it can be anything but. Imagine a self-driving car with lots of cameras surveying 360 degrees around the car. It needs to notice other cars on the road, pedestrians, the ball that bounced into the road, the dog that ran into the road, the street sign showing a change in speed limit, and all of the other possible items that affect decisions the car has to make.
It doesn’t care which person the pedestrian is; it doesn’t care about the make or model of the car it’s behind; it simply needs to know that it’s there.
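To make that concrete, here’s a rough Python sketch of the decision our security camera makes once its detector has looked at a frame. The detector itself is out of scope here; assume it hands back a list of (label, confidence) guesses, and the labels and threshold below are made up purely for illustration.

```python
# detections: what a trained detector might report for one frame,
# as a list of (label, confidence) pairs. Labels and threshold are illustrative.
def should_alert(detections, wanted=("person",), threshold=0.8):
    """True if anything we care about was detected with enough confidence."""
    return any(label in wanted and conf >= threshold
               for label, conf in detections)

# Example: the detector saw a raccoon and, faintly, something person-shaped.
print(should_alert([("raccoon", 0.93), ("person", 0.41)]))  # False: no alert
print(should_alert([("person", 0.91)]))                      # True: alert
```

The point isn’t the code; it’s that detection only has to answer “is something of this kind there?”, not “which one is it?”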
Who Are You?
The next step is recognition. Remember that security camera? Well, you might not want it to alert on every person. If it’s outside your home and you’re returning home and it sees you, you’d probably prefer that it give you a pass rather than call the cops.
So here, it’s not enough simply to decide that a person is present; it also needs to decide which person it is. That might seem to be a harder problem, although it may be handled differently from the detection problem.
For detection, you need to consider all manner of characteristics: upright or on all fours? Furry or not? Dog-looking? Cat-looking? You’re starting with nothing and trying to decide what this thing is – not an easy problem.
With recognition, you’re not starting with nothing: you’re starting with the knowledge that this is, for example, a human. You just don’t know which human. But, for example, you can identify different facial features – distances between nose and eyes and ears and whatnot – to create a map of the face. If you get enough of these features, then everyone will be unique. (OK, except for twins and doppelgangers – they become a harder problem.)
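As a toy illustration of that “map,” the sketch below measures the distance between every pair of facial landmarks and divides by the eye-to-eye distance, so the numbers don’t change just because the face is closer to or farther from the camera. The landmark names and coordinates are invented; a real system would get them from a landmark detector and use far more points.

```python
from itertools import combinations
from math import dist

# Hypothetical landmark positions (x, y) in pixels for one face.
landmarks = {
    "left_eye":    (102, 88),
    "right_eye":   (148, 90),
    "nose_tip":    (125, 120),
    "mouth_left":  (108, 150),
    "mouth_right": (142, 151),
}

# The "map of the face": the distance between every pair of landmarks,
# normalized by the eye-to-eye span so scale doesn't matter.
eye_span = dist(landmarks["left_eye"], landmarks["right_eye"])
face_map = {
    (a, b): dist(landmarks[a], landmarks[b]) / eye_span
    for a, b in combinations(sorted(landmarks), 2)
}

for pair, ratio in sorted(face_map.items()):
    print(pair, round(ratio, 2))
```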
Wrangling Identification
So, once you know you have a human, you can measure all of the features of the human in the frame to decide if the numbers for each feature are the same as the numbers for some person you’re trying to identify. Or you can go through a list of feature numbers to see if anyone fits.
This is still a hard problem, since the reference image from which the original features were derived may be very different from the image you’re seeing right now. The lighting, angle, expression, and many other things may be different. So you have to be able to figure out which features remain invariant through all of these different settings so that you don’t get thrown off.
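Once the features have been boiled down to numbers, the matching step itself is mostly a distance comparison: how far is the face we’re looking at from each face we’ve enrolled? The sketch below assumes that somewhere upstream each face has already been turned into a short feature vector; the names, numbers, and tolerance are all invented.

```python
from math import dist

# Enrolled references: one feature vector per known person (values invented).
known_faces = {
    "alice": [0.54, 0.61, 0.47, 0.83],
    "bob":   [0.49, 0.72, 0.55, 0.90],
}

def identify(live_vector, references, tolerance=0.10):
    """Return the best-matching name, or None if nobody is close enough."""
    best_name, best_distance = None, float("inf")
    for name, ref in references.items():
        d = dist(live_vector, ref)   # Euclidean distance between feature vectors
        if d < best_distance:
            best_name, best_distance = name, d
    return best_name if best_distance <= tolerance else None

print(identify([0.53, 0.63, 0.46, 0.84], known_faces))  # 'alice'
print(identify([0.20, 0.95, 0.10, 0.30], known_faces))  # None: a stranger
```

Make the tolerance too tight and the lighting and angle changes just mentioned will lock out the rightful owner; make it too loose and the doppelgangers get in. That trade-off is the whole game.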
If the system is even smarter, it can figure out who it is – and whether they’re smiling or angry.
These are things that are very hard to program in “manually.” It’s really difficult for people to determine what features to use here, while a thorough training process will let the machine figure it out for itself.
One thing to keep in mind: machine vision involves both still images and video. In the case of video, each frame effectively becomes a still image. The challenge for a machine vision system, then, is to process each frame as quickly as the frames arrive. Not always easy.
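To see why, consider the arithmetic: at 30 frames per second there’s only about 33 ms to spend on each frame, for everything. Below is a crude sketch of one way a system might cope when the analysis runs long: it falls behind, then drops a frame to catch back up. The frame rate, the drop-a-frame policy, and the analyze callback are illustrative assumptions, not how any particular product does it.

```python
import time

FRAME_RATE = 30                    # frames per second from the camera
FRAME_BUDGET = 1.0 / FRAME_RATE    # roughly 33 ms to handle each frame

def process_stream(frames, analyze):
    """Run `analyze` on each frame, dropping frames when we fall behind."""
    behind = 0.0
    for frame in frames:
        if behind >= FRAME_BUDGET:     # too slow recently: skip this frame
            behind -= FRAME_BUDGET
            continue
        start = time.monotonic()
        analyze(frame)                 # detection/recognition would happen here
        elapsed = time.monotonic() - start
        behind += max(0.0, elapsed - FRAME_BUDGET)
```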
Next week, we’ll look at different styles of neural networks, zeroing in on the particular style that’s used for vision.