The 2002 dystopian thriller Minority Report showed remarkable prescience, with all kinds of futuristic tech on display throughout the movie. Fifteen years later the film still holds up, eerily foreshadowing where technology is heading. From driverless cars and voice-controlled homes to personalised ads, the movie accurately predicted many of the technologies that we are already starting to see and use today, most notably ‘Gesture Recognition’.

A still from the 2002 Tom Cruise starrer, Minority Report

Gesture recognition is a means of human-machine interaction using only body actions without the aid of voice.

The concept of recognising gestures using hands and/or other body parts is based on three layers: Detection, Tracking and Recognition. We use special interfaces that can capture these movements, and later use computer vision technology & deep learning algorithms to understand the underlying pattern. Today, there are several gesture-interface products on the market made by Big Tech giants like Intel, Apple & Google for applications in home automation, shopping, virtual/augmented reality gaming, consumer electronics and navigation, among others.

While this market was non-existent in 2002, when the movie was released, a recent study projects the global gesture recognition market in retail to grow by 22.3% from 2018 to 2025 and to be valued at $30.6B by 2025.

How do machines recognise gestures?

A gesture is defined as any physical movement, large or small, that can be interpreted by a motion sensor: anything from the pointing of a finger to a jumping high kick, a nod of the head, or even a pinch or wave of the hand.

Gesture recognition is also a part of ‘touchless user interface’ (TUI)-based applications, meaning they can be controlled without touch. Amazon’s Alexa is a prime example of TUI since it is voice-controlled.

How Gesture Recognition Works

A camera system feeds captured image data into the primary sensing device. This device typically calculates the depth of field and tracks the movement of the hand (or other body parts) through 3D space.
Deep Learning algorithms, based on layered neural networks, are then trained to identify and correlate meaningful gestures from a comprehensive pre-built library of ‘gestures’.
Each gesture or movement is then matched in real-time to an intended action specific to the end-user’s application.
Once the gesture has been interpreted and matched from the library, the system executes the desired set of actions.
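
The matching and execution steps above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the gesture library, the action names and the helper functions are all hypothetical, and a simple cosine-similarity comparison stands in for the trained deep learning model a real system would use.

```python
import math

# Hypothetical gesture library: each gesture maps to a template direction.
# A real system would use features produced by a trained neural network.
GESTURE_LIBRARY = {
    "swipe_left": [-1.0, 0.0],
    "swipe_right": [1.0, 0.0],
    "swipe_up": [0.0, 1.0],
}

# Hypothetical actions specific to the end-user's application.
ACTIONS = {
    "swipe_left": "previous_track",
    "swipe_right": "next_track",
    "swipe_up": "volume_up",
}


def net_displacement(track):
    """Reduce a tracked list of (x, y) hand positions to one movement vector.

    Assumes y increases upward; many camera APIs use the opposite convention.
    """
    (x0, y0), (x1, y1) = track[0], track[-1]
    return [x1 - x0, y1 - y0]


def cosine_similarity(a, b):
    """Return the cosine of the angle between two 2D vectors (0.0 if either is zero)."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0


def recognise(track, threshold=0.8):
    """Return the best-matching gesture name, or None if nothing is similar enough."""
    features = net_displacement(track)
    best_name, best_score = None, threshold
    for name, template in GESTURE_LIBRARY.items():
        score = cosine_similarity(features, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


def action_for(track):
    """Map a tracked hand movement to the application's action (None if unmatched)."""
    return ACTIONS.get(recognise(track))
```

For example, a hand moving from left to right, such as `action_for([(0.1, 0.5), (0.4, 0.52), (0.9, 0.5)])`, resolves to the hypothetical `"next_track"` action, while a hand that does not move matches nothing and returns `None`.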

Deep Learning is the key for a system to learn and decipher body movements and even understand ‘intent’.

Real-World Applications

Real-Time Video Captioning

A Concept Lab within Rochester Institute of Technology has created a novel way to convert sign language into text in real time. The application is intended for deaf users, who can potentially use the tool to communicate seamlessly with a hearing person. Using computer vision and machine learning models, American Sign Language (ASL) can be converted into words that are read on any device screen.

Static Gesture Recognition DataSet

Time-of-Flight for Range Finding

The Time-of-Flight (ToF) principle is a method for measuring the distance between a sensor and an object, based on the time a signal takes to travel to the object and back. With specially designed sensors, the emitted signal can achieve a long range and a high reading frequency. Sony is making use of this technology in its DepthSensing Solutions for controlling in-car features such as temperature and volume.
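
The arithmetic behind Time-of-Flight is simple enough to show directly. The sketch below assumes a light-based sensor and a hypothetical function name; it is not taken from any vendor's SDK. The signal covers the distance twice (out and back), so the one-way distance is the speed of light multiplied by the round-trip time, divided by two.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0  # speed of light in a vacuum


def tof_distance_m(round_trip_time_s):
    """Return the sensor-to-object distance in metres for a measured round-trip time.

    Hypothetical helper for illustration; assumes the signal travels at the
    speed of light and that the time covers the full out-and-back path.
    """
    if round_trip_time_s < 0:
        raise ValueError("round-trip time cannot be negative")
    return SPEED_OF_LIGHT_M_PER_S * round_trip_time_s / 2.0
```

A round trip of 10 nanoseconds, `tof_distance_m(10e-9)`, corresponds to an object roughly 1.5 metres away, which shows why ToF sensors need very precise timing to resolve a hand gesture.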

Sony DepthSensing

Gestural Recognition in Virtual Reality

Swedish software firm Mano Motion has created a unique application that leverages computer vision technology and can recognize hand gestures using the simple RGB cameras found in Android and iOS smartphones. The app works in AR/VR and Mixed Reality environments for a host of consumer-facing scenarios such as gaming, IoT, and consumer electronics, among others.

Using this tool, users can manipulate and move objects in the virtual environment quite easily. The app can also detect hand movements and then inject them into an animation using Unity’s game engine. The latency is less than 10 milliseconds, and it uses only about 30% of the processing power of an iPhone 6s.

Mano Motion AR Kit

Retail Recommendation Engines using Gestural Feedback

Through advanced AI-enabled gesture recognition, retailers can assess the popularity of an item by learning from shoppers’ facial and hand gestures. A trained model analyses how shoppers react to different products in the store and estimates each product's sales potential. This way, retailers can immediately assess whether new items are working in the store, and can even predict the best place in the store to position them for better results.

Using gesture recognition and tools like point-and-grab technology, retailers can make use of a lot of predictive data surrounding their customers’ preferences and shopping habits — allowing them to show only those products that the customers are most likely to purchase or engage with.

Future Scope

After all, who wouldn’t want to mute their TV by putting a finger to their lips, or turn on the stereo simply by waving at it? If ‘Minority Report’ is any indicator of things to come, we can look forward to gesture recognition applications moving into education, real estate, fashion design, and even law enforcement. The consumer electronics market is set to lead the forecasted demand and growth for this technology, leading to an increase in the number of vendors creating ground-breaking applications in this space.