Just as it sounds, computer vision describes the ability of a digital device to take in visual data (photos, videos, even live footage) and actually make sense of it. Today, computer vision is used in the real world for everything from photo library organization and semi-autonomous cars to analyzing sponsor logos in sports videos and inspecting quality on silicon chip factory assembly lines.
But the sky's the limit as far as what computer vision can do, and many lesser-known innovations are in development or already making their way into the real world. Here are just a few:
Look Ma, No Camera: 3D Mapping
Powered by technologies such as Google's Tango, the next wave of smartphones will increasingly carry 3D sensors that let them identify, measure and build 3D maps of the world around them. It's a new capability that Santa Monica-based startup Fantasmo is leveraging to create a platform around crowdsourced 3D spatial mapping. That data can be stored in the cloud and then used for everything from helping new homeowners see how furniture might look in their home to enabling the makers of Pokémon Go to create augmented reality (AR) content informed by an actual, real-time understanding of the world around a player. Combine Tango's technology with a live camera, and the game can transform an entire room with environment-specific overlays and Pokémon that react to that environment on the fly.
For Fantasmo, the value isn't in the 3D scanning itself, but in controlling the platform through which those crowdsourced 3D maps are collected. “We want to be the pipe that this data runs through,” said Fantasmo CEO Jameson Detweiler in a demo at the LDV Vision Summit, where Fantasmo won the startup competition. “The end game here is a constantly-updating model of the world, where we're actively tracking every person and then feeding that back into a game engine or whatever you want to use. That's literally how you merge the digital world with the physical world. It's the AR out layer.”
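To make the “pipe” idea concrete, here's a minimal Python sketch of a crowdsourced map store that keeps the freshest scan for each patch of the world. The names and the tiling scheme here are assumptions for illustration, not Fantasmo's actual API:

```python
import time

class MapStore:
    """Keep only the freshest scan per location tile, so the stored
    model of the world is constantly updated by incoming scans."""
    def __init__(self):
        self.tiles = {}  # tile key -> (timestamp, list of 3D points)

    @staticmethod
    def tile_key(lat, lon, res=0.001):
        # Roughly 100-meter tiles; a production system would likely
        # use geohashes or S2 cells instead of naive rounding.
        return (round(lat / res), round(lon / res))

    def upload(self, lat, lon, points):
        """A phone with a depth sensor contributes a scan."""
        self.tiles[self.tile_key(lat, lon)] = (time.time(), points)

    def query(self, lat, lon):
        """An AR client fetches the latest scan near its position."""
        return self.tiles.get(self.tile_key(lat, lon), (None, []))[1]

store = MapStore()
store.upload(34.0195, -118.4912, points=[(0.0, 0.0, 0.0), (0.1, 0.0, 0.2)])
print(store.query(34.0195, -118.4912))  # nearby clients see the scan
```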
Pathologists' Little Helper: Diagnosis via Eyes, Faces and X-Rays
Contrary to popular belief, machine-driven diagnoses aren't really meant to replace doctors or other medical professionals, but rather to enhance their abilities. The average pathologist, for example, might look at about 500 glass slides, each holding tens to hundreds of thousands of individual cells. Examining all of those cells properly is virtually impossible for a single human.
This is where computer vision-enabled AI, which can handle massive volumes of images provided it's looking for specific things, changes the game. Studies by PathAI, which builds and trains computer vision models to help pathologists make better diagnoses, found that accuracy on breast cancer lymph node biopsies went from 85 percent to 99.5 percent when a human-plus-computer approach replaced a human-only one.
PathAI's machine learning system essentially highlights the areas where it sees cancer across those 500 or so slides. “And now a pathologist's job is really to correct or confirm the diagnosis provided by the AI system, then move on to the next case,” said PathAI's co-founder and CEO at the recent LDV Vision Summit. “It's a dramatically faster procedure.”
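In code, that review loop can be surprisingly small. Here's a rough Python sketch; the patch structure, stand-in scorer and threshold are all illustrative assumptions, not PathAI's actual pipeline:

```python
# Illustrative AI-assisted triage: score every patch of every slide,
# then surface only the flagged regions for the pathologist to review.
import random
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Patch:
    slide_id: str
    x: int                   # patch position within the slide, in pixels
    y: int
    tumor_prob: float = 0.0  # filled in by the model

def triage(patches: List[Patch],
           predict: Callable[[Patch], float],
           threshold: float = 0.8) -> List[Patch]:
    """Return the model-flagged patches, highest confidence first,
    so the human corrects or confirms instead of searching."""
    for p in patches:
        p.tumor_prob = predict(p)
    flagged = [p for p in patches if p.tumor_prob >= threshold]
    return sorted(flagged, key=lambda p: p.tumor_prob, reverse=True)

# Toy usage with a random scorer standing in for a trained model.
random.seed(0)
patches = [Patch("slide-001", x, y)
           for x in range(0, 1024, 256) for y in range(0, 1024, 256)]
for p in triage(patches, predict=lambda p: random.random()):
    print(f"{p.slide_id} @ ({p.x},{p.y}): p={p.tumor_prob:.2f}")
```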
Trucks, Drones and Boats: Autonomous Everything Else
Most of today's transportation headlines focus on consumer self-driving cars, which use a mix of cameras, sensors, LiDAR and radar to “see” what's on the road in front of and around them. But just as much research and development is going into computer vision for self-driving trucks, ships and even race cars.
Some experts believe that autonomous cargo vehicles will be ready for primetime before autonomous consumer cars are. In the future, we'll see fleets of self-driving electric trucks designed to avoid accidents and cruise around the clock, perhaps at extra-high speed, delivering produce or packages (or, more crucially, 2,000 cases of beer) cross-country in a couple of days. Meanwhile, autonomous boats are already being tested in Boston Harbor by startup Sea Machines, which is working on autonomous control systems for all kinds of seacraft, from fire boats to cargo ships. Could this mean the end of oil spills?
But it's not all practical. Sports entertainment is going autonomous, too. Startup Roborace and Formula E (the competitive electric-vehicle racing series) are teaming up to develop self-driving race cars that can not only hit 199 miles per hour, but can actually learn on the track (the more obstacle-course-like, the better), making their record-breaking potential exponential, to say the least.
Machines of a Thousand Faces: Computer Vision That Totally Reads You
AI may lack empathy, people skills and emotional intelligence, but facial recognition networks have recently been trained to make some basic, specific emotion predictions. Disney, for example, has developed a network that can discern how audiences feel about a movie in 10 minutes or less, just by looking at pictures of their faces. Using four infrared cameras trained on moviegoers' faces in a 400-seat theater, Disney researchers captured around 16 million facial expressions over the course of 150 screenings.
Using a technique known as factorized variational autoencoders (FVAEs), Disney's program assesses whether a person is, say, laughing or looking scared at the right moments, which can then inform everything from test screenings to recommendation engines on Disney's upcoming Netflix-like streaming video service.
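The “factorized” part is the key trick: if audience reactions decompose into per-viewer factors and per-moment factors, then a few minutes of one viewer's reactions are enough to predict the rest. Here's a heavily simplified, non-variational Python sketch of that idea on synthetic data; it illustrates the factorization intuition, not Disney's actual model:

```python
# Model reactions (viewers x timesteps) as viewer factors x moment factors.
import numpy as np

rng = np.random.default_rng(0)
n_viewers, n_steps, k = 400, 150, 4  # a 400-seat theater, 150 time steps

# Synthetic "smile intensity" data with hidden low-rank structure.
viewer_traits = rng.normal(size=(n_viewers, k))
moment_traits = rng.normal(size=(k, n_steps))
reactions = viewer_traits @ moment_traits \
    + 0.1 * rng.normal(size=(n_viewers, n_steps))

# Recover per-moment factors from the observed audience via truncated SVD.
U, s, Vt = np.linalg.svd(reactions, full_matrices=False)
moment_factors = Vt[:k, :]  # one column of factors per moment in the film

# A new viewer arrives: observe only their first 10 time steps...
new_viewer = rng.normal(size=(1, k)) @ moment_traits
early = 10
coef, *_ = np.linalg.lstsq(moment_factors[:, :early].T,
                           new_viewer[:, :early].T, rcond=None)

# ...then predict the rest of their reactions for the whole movie.
predicted = (coef.T @ moment_factors)[:, early:]
actual = new_viewer[:, early:]
print("relative prediction error:",
      np.linalg.norm(predicted - actual) / np.linalg.norm(actual))
```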
Not to be outdone in the rudimentary machine emotional intelligence game, that other massive all-American brand, Walmart, is developing facial recognition technology that analyzes customer moods via cameras at checkout lines. If a shopper is getting annoyed because a line is too slow, the system will recognize that distress and instantly notify other workers to come and help.
But why stop at people-reading skills? Animals have faces too, which is why British researchers trained a neural network on 500 pictures of sheep in pain and sheep feeling just fine. Homing in on telltale “pain” signs, such as forward-folding ears and squinting eyes, the Sheep Pain Facial Expression Scale learned to predict whether a sheep is hurting just by looking at a picture of its face.
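With modern transfer learning, a two-class classifier like that needs remarkably little code. The sketch below fine-tunes a pretrained PyTorch backbone on a stand-in batch; it's one plausible minimal setup, not the researchers' actual landmark-based pipeline:

```python
# Hypothetical "pain / no pain" sheep-face classifier via transfer learning.
import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(weights="IMAGENET1K_V1")  # ImageNet-pretrained backbone
model.fc = nn.Linear(model.fc.in_features, 2)     # two classes: pain, no pain

# With only ~500 labeled photos, train just the new head to limit overfitting.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch; in practice these would be cropped, aligned sheep faces.
images = torch.randn(8, 3, 224, 224)
labels = torch.randint(0, 2, (8,))

model.train()
for step in range(5):  # a few toy steps on the fake batch
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
print("loss after toy training:", round(loss.item(), 3))
```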
The "Good" Terminator View: Wearables That Really Work
While Google Glass may have failed to woo the masses, the dream of AR-enabled wearables hasn't gone away. Even Apple is rumored to be working on a pair of AR-enabled glasses. For now, companies such as Vuzix offer smart glasses with AR-like overlays that help tractor drivers operate more efficiently and safely, while OrCam offers wearables that use computer vision to read text and even identify faces for the visually impaired. Both of these frames are clunky for now, but it's only a matter of time before optical companies like Warby Parker and their ilk offer similar wearables that look just like regular glasses. And, of course, contact lenses may eventually offer these augmented reality experiences as well.
In the future, the ability to have a dynamic display that reacts to one's environment will allow for truly superhuman capabilities: instant recall of information; analysis of whatever sits before you, whether it's a type of car, tree, building or artwork; telephoto or microscopic enhancement of what you see; even replays of events from moments, days or years ago. And let's not forget that augmented reality doesn't just mean cool overlays on a live camera view. It can also mean, say, real-time heat maps from thermal imaging cameras, which are already found on many smartphones and portable cameras and could soon find their way into wearables as well.
Illustrations by Andrew Colin Beck