With stereo vision technology you can extract 3D information by comparing views of the same scene from two cameras at different vantage points. This works best with a calibrated setup, so that a feature in the left image can be searched for along a known line in the right image. The offset in pixels between the two matches is called the disparity, and because the geometry of the rig is known, you can easily convert it to a depth value.
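To make the disparity-to-depth conversion concrete, here is a minimal sketch. It uses the standard pinhole relation Z = f · B / d; the focal length and baseline values below are illustrative assumptions, not tied to any particular sensor:

```python
# Depth from disparity: Z = f * B / d, where f is the focal length in
# pixels, B the baseline between the two cameras in meters, and d the
# disparity in pixels. The default values are illustrative assumptions.
def disparity_to_depth(disparity_px: float,
                       focal_px: float = 700.0,
                       baseline_m: float = 0.12) -> float:
    """Convert a disparity (pixels) to a depth (meters)."""
    return focal_px * baseline_m / disparity_px

print(disparity_to_depth(42.0))  # 700 * 0.12 / 42 = 2.0 m
```

Note how depth is inversely proportional to disparity: far-away objects produce small disparities, which is why depth precision degrades with distance.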
First, both images are normalized to compensate for minor differences in the sensors and optics, and often texture-enhancement filters are applied before the actual disparity is calculated. Then, for each pixel in the left image, a mask (a small window) is placed around it and around a candidate position in the right image, and a similarity score for that position is computed via correlation techniques.
The window then shifts to the next candidate pixel in the right image and a new similarity is calculated. Repeating this process produces a list of similarity scores for every left pixel. Yes, this is quite computational!
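The matching loop described above can be sketched as a naive sum-of-absolute-differences (SAD) block matcher. This is a toy illustration, not the algorithm any particular sensor implements, and the window size and disparity range are arbitrary choices:

```python
import numpy as np

def block_match(left, right, max_disp=16, half=3):
    """Naive SAD block matching on rectified grayscale images.

    For every left pixel, a (2*half+1)^2 window is compared against
    candidate windows in the right image at disparities 0..max_disp-1;
    the disparity with the lowest SAD cost wins."""
    h, w = left.shape
    disp = np.zeros((h, w), dtype=np.int32)
    for y in range(half, h - half):
        for x in range(half + max_disp, w - half):
            patch = left[y - half:y + half + 1, x - half:x + half + 1]
            costs = [np.abs(patch - right[y - half:y + half + 1,
                                          x - d - half:x - d + half + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))  # best match = lowest cost
    return disp

# Toy check: shift a random texture 5 px and recover that disparity.
rng = np.random.default_rng(0)
left = rng.random((32, 64)).astype(np.float32)
right = np.zeros_like(left)
right[:, :-5] = left[:, 5:]              # true disparity is 5 px
print(block_match(left, right)[16, 40])  # 5
```

Real implementations avoid this triple loop with vectorization or dedicated hardware, but the brute-force version makes the computational load obvious.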
From this array the best score is selected (the lowest value when a cost metric such as SAD is used), and its position gives the disparity for that pixel. To make the measurement more robust, it can be repeated for the right image with respect to the left one and the two results cross-checked. So stereo vision is based on finding similarities between images, and for that you need texture. There are plenty of scenarios, however, where the scene simply does not have enough texture, a bare white wall for example.
To overcome this problem, a light source (a laser projector) is often added to the stereo vision rig that projects a pattern of random dots onto the scene. A great example of a low-cost coded-light solution is Microsoft's Kinect (PrimeSense technology), and yes, similar technology is used in the iPhone X.
A lot of these stereo sensors have dedicated ASICs to compute the disparity images at high frame rates (30-60 fps), and some even have a rotating projector to add more texture and thus generate a more accurate and robust point cloud.
So this projected texture enhances the stereo measurement, but there are some drawbacks:

- The technology can only be used indoors; sunlight drowns out the projected pattern.
- A high-intensity light source is used, so differences in surface color and reflectivity can hurt precision.
- The speckle effect of the laser can make the point clouds look noisy.
- There are not many options to validate whether a calculated disparity is reliable.

On the other hand, there are clear benefits:

- You can scan a large area in a very short time.
- The measurement is independent of the texture in the scene.
- You can stack multiple sensors, as the light sources do not interfere with each other.
- It is a low-cost and reliable technology.