Augmented Reality with Planes
In this assignment, you will write a program that draws a virtual (i.e., computer-graphics) object (e.g., a wireframe cube or a teapot) on a real video sequence. The video sequence will contain a reference pattern (e.g., a rectangle) that will be used to estimate the camera pose. We will assume that the camera is internally calibrated (i.e., the matrix of intrinsic parameters is given) and that the zoom does not change while the video is recorded. Either the camera or the object will move (not both). Because you already have the intrinsic parameters, all you need to estimate are the extrinsic parameters (i.e., the camera rotation and translation).
The result of the assignment should look like the examples in the following videos:
1. https://www.youtube.com/watch?v=s4pICjMTKMs
2. https://www.youtube.com/watch?v=3QwJZ2hzAUY
In your assignment, you are free to add an animated character or just a rigid object to your augmented scene. If you prefer, just add a simple wireframe cube (i.e., just 8 points in space connected by lines drawn on each image after perspective projection).
The steps to complete the assignment are as follows:
1. Read Chapter 15 of Prince's book. Section 15.2.4 describes the method for estimating the homography transformation between two images (using 4 or more points). Section 15.3 describes the method for estimating the homography transformation when you use exactly 4 points (just an inverse of the matrix instead of a least-squares solution). Section 15.7.1 summarizes the application of homography to augmented reality.
2. Study the steps of the algorithms provided at the end of this document. Each algorithm describes the steps for each of the sections listed in Step 1. Compare the steps of the algorithms with the mathematical description from the corresponding sections to ensure you understand the math.
3. Calibrate your camera to obtain the intrinsic parameters (i.e., Lambda). Record a video sequence of a scene that contains a plane (with at least four detectable points on the plane). As you move the camera, keep the same four points visible at all times. A short movie will be sufficient. Keep in mind that you might need to detect the points manually on each video frame, and that video is usually recorded at 15-30 frames per second.
4. Detect at least four points on a plane on an object in the scene. We have not covered automatic feature detection yet, so you can select these points manually on a subset of the video frames (e.g., 10 frames). Because the goal of this assignment is only to demonstrate the technique, you may skip frames (you do not need to use every frame). Skipping frames makes the manual selection of points on each frame less time-consuming.
5. Estimate the homography transformation between the scene plane and the camera image (for each video frame in the sequence).
6. Factorize the homography matrix to extract the camera rotation and translation.
7. Compose the pinhole camera model (the intrinsic matrix together with the estimated rotation and translation) and use it to project your 3-D object onto each frame of the sequence.
8. Draw the object on each video frame (e.g., draw the image points and link them with lines).
9. Save each augmented frame as an image.
10. Convert the sequence of images into a video. A list of methods for creating a video from an image sequence is provided below.
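Steps 5 and 6 above can be sketched as follows. This is a minimal illustration, not the reference algorithm: the algorithms at the end of this document and Prince's Chapter 15 remain authoritative. It assumes the reference points lie on the world plane Z = 0, so that the homography is proportional to K [r1 r2 t]. The function names estimate_homography and pose_from_homography are hypothetical, and the scale is recovered here from the singular values of the first two columns, which may differ in detail from the method in the book.

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct linear transform: find H (3 x 3) with dst ~ H @ src.

    src: N x 2 plane coordinates (u, v); dst: N x 2 pixel coordinates (x, y).
    Needs N >= 4 points, no three of them collinear.
    """
    rows = []
    for (u, v), (x, y) in zip(src, dst):
        rows.append([0, 0, 0, -u, -v, -1, y * u, y * v, y])
        rows.append([u, v, 1, 0, 0, 0, -x * u, -x * v, -x])
    A = np.asarray(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)         # right singular vector of smallest singular value
    return H / H[2, 2]               # fix the arbitrary scale (assumes H[2, 2] != 0)

def pose_from_homography(K, H):
    """Factor H ~ K [r1 r2 t] into a rotation R (3 x 3) and translation t (3,)."""
    M = np.linalg.inv(K) @ H         # proportional to [r1 r2 t]
    U, S, Vt = np.linalg.svd(M[:, :2], full_matrices=False)
    R12 = U @ Vt                     # closest 3 x 2 matrix with orthonormal columns
    scale = S.mean()                 # M[:, :2] is approximately scale * R12
    t = M[:, 2] / scale
    if t[2] < 0:                     # the plane must be in front of the camera
        t, R12 = -t, -R12
    r3 = np.cross(R12[:, 0], R12[:, 1])   # third axis completes a right-handed frame
    R = np.column_stack([R12[:, 0], R12[:, 1], r3])
    return R, t
```

For each frame, you would call estimate_homography with the plane coordinates of your reference points and their clicked pixel positions, then pass the result and your calibrated intrinsic matrix to pose_from_homography. With noisy clicks the recovered pose is only approximate, so using more than four points generally helps.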