Sight is a key factor in enabling robots to assist humans in everyday life. Equipped with various sensor systems, they not only detect objects but also measure their position and orientation precisely. The technology is now so advanced that robots can even analyse an object down to its molecular properties, and so perceive far more than the human eye on which they are modelled.
In order to assist people as flexibly and autonomously as possible, a robot must be able to recognise objects and detect their exact position. Only then can it reliably execute commands such as “bring the glass” or “hold the component”. To achieve this, robot manufacturers draw on the full range of sensors familiar from industrial image processing, from ultrasonic sensors to laser scanners.
The main systems in use, however, are various types of camera. Even “simple” 2D solutions can trace contours and identify objects. They reach their limits, however, when items are stacked in an undefined way. And with just “one eye” it is impossible to gather height data, yet a robot needs exactly that to pick up an object and to judge the position of its own gripper relative to it.
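To give a feel for what such 2D contour tracing involves, here is a minimal sketch using the open-source OpenCV library. The synthetic test image, threshold strategy and object shapes are illustrative assumptions, not taken from any particular robot system.

```python
import cv2
import numpy as np

# Synthetic grey-scale "camera image" with two bright objects on a dark
# background (stands in for a real capture).
image = np.zeros((240, 320), dtype=np.uint8)
cv2.rectangle(image, (40, 60), (120, 180), 200, -1)  # a box-shaped part
cv2.circle(image, (230, 120), 45, 220, -1)           # a round part

# Separate objects from the background with an automatic (Otsu) threshold.
_, mask = cv2.threshold(image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Trace the outer contour of every object in the binary mask.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

for contour in contours:
    # The bounding box gives each object's position in the image plane,
    # but no depth: exactly the limitation of a single 2D camera.
    x, y, w, h = cv2.boundingRect(contour)
    print(f"Object at ({x}, {y}), size {w} x {h} px")
```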
Seeing in three dimensions
As robots become increasingly flexible and mobile, systems providing a three-dimensional view are therefore gaining in significance. One method frequently employed today to give robots such 3D vision is modelled on nature: like a human being, the machine has two eyes in the form of two offset cameras. Both cameras capture the same scene, but owing to the offset, from slightly different perspectives. An electronic evaluation unit in the stereo camera calculates the distance to the viewed object from this parallax shift. At present, camera systems use either CCD or CMOS sensors to capture the light signals, but the trend is shifting clearly towards CMOS technology: it is largely immune to glare effects such as blooming, withstands high temperatures and consumes little power, offers comparable image quality, and is also cheaper to produce.
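The underlying triangulation is simple enough to sketch in a few lines: distance = focal length × baseline / disparity. The sketch below assumes two identical, rectified cameras, and all numbers are invented for illustration.

```python
def depth_from_disparity(disparity_px: float,
                         focal_length_px: float,
                         baseline_m: float) -> float:
    """Triangulate object distance from the parallax shift (disparity)
    between the left and right images of a rectified stereo pair."""
    if disparity_px <= 0:
        raise ValueError("Point must be visible in both images")
    # The closer the object, the larger the parallax shift.
    return focal_length_px * baseline_m / disparity_px

# Assumed values: 6 cm between the cameras, 700 px focal length,
# and a point that appears shifted by 35 px between the two images:
print(f"{depth_from_disparity(35.0, 700.0, 0.06):.2f} m")  # -> 1.20 m
```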
Measuring and imaging in one
Three-dimensional imaging by stereo camera is complex and costly, however. So ToF (Time-of-Flight) technology is increasingly being used to provide robots with 3D vision. Here, a sensor chip captures the image of an object and at the same time measures how far away it is. The core principle of ToF is to measure the time the light takes to travel from the source to the object and back to the camera. The sensors thus deliver two values per pixel: an intensity value (grey scale) and a distance value (depth). The result is a point cloud of several thousand points, depending on the chip, from which the associated software can compute the shape and distance of an object very precisely. The cameras have their own active illumination unit emitting laser or infrared light, which makes ToF systems independent of the ambient light. Unlike 3D stereo imaging, the distance is not inferred from image correspondences but measured directly, pixel by pixel, so ToF cameras work at very high speed. The resolution of the resulting image is lower than that of stereo cameras, however. Consequently, ToF systems are frequently combined with stereo systems to exploit the benefits of both and produce an optimally dense and accurate depth map. State-of-the-art camera systems can also capture motion vectors: software compares the position of a structure in two consecutively captured images to derive movement data.
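The distance measurement itself reduces to the speed of light: the one-way distance is half the measured round trip. A minimal sketch follows; the 10 ns round trip is an invented example, and real ToF chips typically measure phase shifts of modulated light rather than raw pulse times.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def tof_distance(round_trip_time_s: float) -> float:
    """Object distance from the measured light travel time: the pulse
    travels out and back, so the one-way distance is half the round trip."""
    return SPEED_OF_LIGHT * round_trip_time_s / 2.0

# Each ToF pixel delivers such a distance plus a grey-scale intensity.
# A round trip of 10 nanoseconds corresponds to roughly 1.5 metres:
print(f"{tof_distance(10e-9):.3f} m")  # -> 1.499 m
```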
Faster with micro-mirrors
Another method exploits the attributes of MEMS technology: DLP (Digital Light Processing) systems consist of a chip carrying several million microscopically small mirrors, each less than one fifth the width of a human hair. Every mirror can be activated individually and switched several thousand times per second, so a precisely structured light pattern can be reflected onto an object from a light source. By projecting a series of such patterns onto the object and recording how the object distorts them with sensors or cameras, a very detailed 3D point cloud can be created. Thanks to the high switching speed, the large number of grey levels and the ability to work with visible, UV and infrared light, 3D solutions for optical measurement using DLP technology are faster and more precise than conventional approaches.
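As an illustration of such a pattern series, the sketch below generates binary Gray-code stripes, one common choice for structured light (an assumption here; the article does not name a specific coding scheme). Decoding which stripes lit a given camera pixel identifies the projector column behind it, and the 3D position then follows by triangulation.

```python
import numpy as np

def gray_code_patterns(width: int, height: int, num_bits: int):
    """Yield the stripe patterns a projector would display in sequence.
    Every projector column receives a unique Gray-code word across the
    pattern series, so it can later be recovered per camera pixel."""
    columns = np.arange(width)
    gray = columns ^ (columns >> 1)  # binary-reflected Gray code
    for bit in range(num_bits - 1, -1, -1):
        stripe = ((gray >> bit) & 1).astype(np.uint8) * 255  # 0 dark, 255 lit
        yield np.tile(stripe, (height, 1))  # same stripe in every image row

# Ten patterns are enough to label 1024 projector columns uniquely:
patterns = list(gray_code_patterns(1024, 768, 10))
print(len(patterns), patterns[0].shape)  # -> 10 (768, 1024)
```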
Viewing chemical properties
3D systems featuring hyperspectral image processing are also a relatively recent innovation. They analyse an object at more than 100 different wavelengths. Broken down into its spectral bands, the light is reflected differently by each material according to its specific chemical and molecular properties. Every object therefore has a specific spectral signature, a unique fingerprint by which it can be identified. This permits insights right down to the molecular level of an object. Here robots have surpassed their human role model, because for humans this kind of X-ray vision is still science fiction.
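Identifying a material from its spectral signature can be sketched as a nearest-neighbour search over reference spectra; a common similarity measure is the spectral angle. The five-band signatures below are invented purely for illustration, and real systems compare a hundred or more bands.

```python
import numpy as np

def spectral_angle(pixel_spectrum, reference_spectrum) -> float:
    """Angle between two reflectance spectra (Spectral Angle Mapper):
    the smaller the angle, the more similar the materials."""
    p = np.asarray(pixel_spectrum, dtype=float)
    r = np.asarray(reference_spectrum, dtype=float)
    cos = np.dot(p, r) / (np.linalg.norm(p) * np.linalg.norm(r))
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

# Invented reference signatures (reflectance per spectral band):
references = {
    "PET plastic": [0.40, 0.55, 0.60, 0.30, 0.20],
    "aluminium":   [0.80, 0.82, 0.85, 0.86, 0.88],
}
pixel = [0.42, 0.52, 0.61, 0.28, 0.22]  # spectrum of one camera pixel
best = min(references, key=lambda name: spectral_angle(pixel, references[name]))
print(best)  # -> PET plastic
```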