Stereo Vision Description
The diagram below is a simplified version of the math; this article explains it further: http://www.dis.uniroma1.it/~iocchi/stereo/triang.html
What I'm illustrating here is the case where the object or pixel in question lies directly in the center of the field of view of camera 1 (cam1). The object and the two cameras form a right triangle, so the distance is Z = tan(A) * D. To determine the distance from cam1 to the object (shown here as a red dot; the pink dots illustrate what happens to angle A when the object is closer or farther away), we have to know the distance D between the cameras and the angle A from cam2 to the pixel or object.
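That triangle can be sketched in a few lines of code. This is a minimal sketch of the on-axis case only; the function name and the example numbers (a 4-inch baseline, a 70-degree angle) are hypothetical, not taken from the diagram.

```python
import math

def distance_on_axis(baseline, angle_a_deg):
    """Distance Z from cam1 to an object centered in cam1's view.

    baseline    -- distance D between the two cameras (any length unit)
    angle_a_deg -- angle A to the object, measured at cam2 from the
                   baseline, in degrees
    Z = tan(A) * D, from the right triangle formed by the two cameras
    and the object.
    """
    return baseline * math.tan(math.radians(angle_a_deg))

# Hypothetical example: cameras 4 inches apart, angle A of 70 degrees
print(round(distance_on_axis(4.0, 70.0), 2))  # about 10.99 inches
```

Note how the pink dots in the diagram behave: a smaller angle A gives a smaller Z (object closer), and as A approaches 90 degrees the object recedes toward infinity.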
Angle A can be determined if you know the field of view (FOV) of the camera, the format size of the camera's sensor, and the pixel in cam2 that the object falls on (more on camera format, field of view, etc.). Basically, you start with the camera's entire field of view, which is usually given as an angle; typical point-and-shoots are around 60 degrees. Then you take the sensor size in pixels: a 6-megapixel sensor might be 2500 x 2500 pixels, for example. Dividing 60 degrees by 2500 pixels gives 0.024 degrees per pixel, so you can count the number of pixels to the object to determine angle A. This definition is a little simplified and ignores things like lens distortion and format size vs. focal length, but for purposes here I think you get the point.
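The pixels-to-degrees conversion above is just one division. Here is a sketch using the simplified linear model from the text (FOV divided by pixel count); real lenses distort this, as noted, and the defaults are the example numbers, not properties of any particular camera.

```python
def pixel_to_angle(pixel_x, fov_deg=60.0, width_px=2500):
    """Angle from the edge of the frame to a pixel column.

    Simplified linear model: each pixel spans fov_deg / width_px degrees
    (0.024 degrees per pixel with the defaults above). Ignores lens
    distortion and the focal-length/format-size relationship.
    """
    deg_per_pixel = fov_deg / width_px
    return pixel_x * deg_per_pixel

# Counting 1000 pixels in from the edge:
print(round(pixel_to_angle(1000), 3))  # 24.0 degrees
```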
These pictures should help to illustrate:
You can see that in camera 1 the nickel stays on the same X pixel, 130, independent of its distance from cam1. But in camera 2, the X pixel changes: 57 at 10 inches, 95 at 13 inches, and 130 at 17 inches. In order to find the distance, all you need to know is the X value of the pixel in cam1 and cam2, and the distance between the two cameras.
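Putting the two pieces together, you can go straight from the pair of X pixel values to a distance. This sketch assumes parallel optical axes, so the pixel difference (the disparity) is the angular offset from straight ahead, and angle A from the baseline is 90 degrees minus that offset. The baseline and degrees-per-pixel values here are hypothetical and are not calibrated to the photos above.

```python
import math

def distance_from_pixels(x1, x2, deg_per_pixel, baseline):
    """Distance to an object on cam1's axis, from its pixel shift in cam2.

    x1 -- object's X pixel in cam1
    x2 -- object's X pixel in cam2
    Assumes parallel optical axes: the disparity x1 - x2 converts to an
    angular offset from cam2's axis, and A = 90 degrees minus that offset
    is the angle from the baseline, giving Z = tan(A) * D.
    """
    offset_deg = abs(x1 - x2) * deg_per_pixel   # angle off cam2's axis
    angle_a = 90.0 - offset_deg                 # angle A from the baseline
    return baseline * math.tan(math.radians(angle_a))

# Hypothetical numbers: 0.024 deg/pixel, cameras 4 inches apart
near = distance_from_pixels(130, 57, 0.024, 4.0)   # large disparity
far = distance_from_pixels(130, 110, 0.024, 4.0)   # small disparity
print(near < far)  # a bigger pixel shift means a closer object
```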
It gets a little more complicated when the object is off-axis in both cameras, but the same type of math still applies, except it now involves both angles A and B.
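The off-axis case can be sketched the same way. Taking A and B as the angles from the baseline to the object at cam1 and cam2 respectively (my notation, assumed for this sketch), the point where the object's perpendicular meets the baseline splits D into Z/tan(A) + Z/tan(B), which rearranges to the formula below.

```python
import math

def off_axis_distance(angle_a_deg, angle_b_deg, baseline):
    """Perpendicular distance Z when the object is off both camera axes.

    angle_a_deg, angle_b_deg -- angles A and B from the baseline to the
    object, measured at cam1 and cam2 (degrees).
    From Z/tan(A) + Z/tan(B) = D:  Z = D / (1/tan(A) + 1/tan(B)).
    """
    ta = math.tan(math.radians(angle_a_deg))
    tb = math.tan(math.radians(angle_b_deg))
    return baseline / (1.0 / ta + 1.0 / tb)

# Symmetric check: the object halfway between the cameras at 45 degrees
# sits at half the baseline distance.
print(off_axis_distance(45.0, 45.0, 4.0))  # 2.0
```

As a sanity check, when A is 90 degrees (object on cam1's axis) the 1/tan(A) term vanishes and this reduces to Z = tan(B) * D, the on-axis formula from earlier.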
It gets even more complicated in three dimensions, but I don't have a diagram for that.