In this assignment, I calculated homographies from user-defined points to rectify images and blend multiple images together to create mosaics! Then, in Part B, I implemented adaptive non-maximal suppression on Harris corners to create MOPS descriptors, and, filtering the matches with RANSAC, implemented automatic mosaicing.
After shooting some images (all from my Canon Rebel T3), we can recover the homography between two views from at least four pairs of user-defined points. This matrix is a projective transformation with 8 degrees of freedom, as opposed to the affine transformations (which need only 3 point pairs) that we used in the face warping project. I shot some square things around my co-op, Cloyne, at an angle. For the coordinates of the final image, I assumed the objects were square; if they clearly weren't, I just mapped them to the edges of the image canvas.
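The homography solve can be sketched as a least-squares problem: fixing the bottom-right entry of H to 1 leaves 8 unknowns, and each point pair contributes two linear equations. A minimal numpy sketch (the function name is my own, not from the original code):

```python
import numpy as np

def compute_homography(src, dst):
    """Solve for the 3x3 homography H mapping src -> dst, given n >= 4
    point pairs. With h33 fixed to 1, each pair gives two linear equations."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```

With exactly 4 pairs the system is square and the solution exact; with more, least squares averages out clicking error.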
This was taken with my Nexus 5 as fast testing.
My favorite one in Cloyne.
Here's a meta move: rectification of an image that was itself stitched using my phone's mosaic feature. (I painted this!)
Due to the cylindrical projection of my phone's mosaic feature, this result doesn't look that great.
Throwback to Proj 2.
Now that we know rectification works, we can create mosaics! For my mosaics of three images, I warped the left and right images to the center one. I used a forward warp to compute the corners of the final canvas, then an inverse transform, as in Proj 5, to sample the pixel color values. Finally, I implemented linear blending with bwdist to smooth the seams between the individual images in each mosaic.
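The inverse-warp step can be sketched as follows: for every canvas pixel, map it back through the inverse homography and sample the source image. A minimal numpy version with nearest-neighbor sampling (my own simplification; the original likely interpolates):

```python
import numpy as np

def warp_image(img, H, out_shape):
    """Inverse-warp img onto an out_shape canvas. H maps source coords
    (x, y) to canvas coords; we invert it to sample backwards."""
    Hinv = np.linalg.inv(H)
    out_h, out_w = out_shape
    ys, xs = np.indices((out_h, out_w))
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])
    src = Hinv @ coords
    src /= src[2]                      # back to inhomogeneous coords
    sx = np.round(src[0]).astype(int)  # nearest-neighbor sampling
    sy = np.round(src[1]).astype(int)
    valid = (sx >= 0) & (sx < img.shape[1]) & (sy >= 0) & (sy < img.shape[0])
    out = np.zeros((out_h, out_w) + img.shape[2:], img.dtype)
    out[ys.ravel()[valid], xs.ravel()[valid]] = img[sy[valid], sx[valid]]
    return out
```

The forward warp of just the four corners determines `out_shape` and an offset; only the dense per-pixel lookup needs to run backwards.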
I will demonstrate the process with the dining area of our sweet Airbnb in Charlotte, NC. For most of the time this assignment was out, I was there for UIST. At least it made taking photos off Berkeley's campus easy. Below are the original 3 shots.
I then selected corresponding points by hand...
Finally, inverse warping transforms and combines the three original images into a mosaic. With hard edges, however, you can see clear overlap between individual photos.
But with linear blending (technically feathering), the hard edges disappear. Sadly, this isn't the best example: I think I moved my camera slightly between shots, and I also didn't shoot a truly planar scene (the hanging lamp, for example, was quite close to me, so it comes out a bit blurry).
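The feathering itself can be sketched with scipy's equivalent of MATLAB's bwdist: weight each image by its distance to its own mask boundary, so weights fall off linearly toward the seams. This is a sketch under my own naming, not the original code:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def feather_blend(img1, img2, mask1, mask2):
    """Feather two warped images: each pixel's weight is its distance to
    the edge of that image's footprint (bwdist), normalized so the
    overlapping region blends linearly between the two."""
    w1 = distance_transform_edt(mask1)
    w2 = distance_transform_edt(mask2)
    total = w1 + w2
    total[total == 0] = 1  # avoid divide-by-zero where neither image covers
    return (img1 * w1 + img2 * w2) / total
```

Deep in the overlap the weights are near-equal; right at an image's border its weight hits zero, which is exactly what hides the hard edge.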
This mosaic of the Lib-Ed room of Cloyne is more successful, with the exception of some ghosting around the piano on the wall.
Finally, here is a shot of Freedom Park in Charlotte, NC. There is a lot of ghosting on the branch, since it was harder for me to define points by hand on nature-like scenes. Hopefully, part 2 will fix this!
All Harris points.
As you can tell, there are a lot of Harris corners. What if we want to limit them to 500 points? We could just take the strongest corners, but then they would cluster around a few edges. With ANMS, we can choose strong but well-spaced corners. For each point we compute a suppression radius: the distance to the nearest point with higher corner strength. We then sort all the radii and keep the points with the 500 largest.
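The ANMS step above can be sketched directly; the `c_robust` factor (0.9, from the MOPS paper) makes a point suppressed only by clearly stronger neighbors. A simple O(n²) version under my own naming:

```python
import numpy as np

def anms(coords, strengths, n_keep=500, c_robust=0.9):
    """Adaptive non-maximal suppression: each point's suppression radius is
    its distance to the nearest point whose (robustified) corner strength
    exceeds its own. Returns indices of the n_keep largest-radius points."""
    radii = np.full(len(coords), np.inf)
    for i, (pt, s) in enumerate(zip(coords, strengths)):
        stronger = c_robust * strengths > s
        if stronger.any():
            radii[i] = np.linalg.norm(coords[stronger] - pt, axis=1).min()
    return np.argsort(radii)[::-1][:n_keep]
```

The globally strongest corner gets an infinite radius, so it always survives; everything else survives only if nothing stronger is nearby.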
Now that we have points, we build descriptors around them. First blur the image, then take a 40x40 window centered on each point. We subsample this window every 5 pixels, for a final 8x8 descriptor. Finally, we normalize the descriptors (zero mean, unit variance) to make them invariant to intensity changes.
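That extraction step can be sketched in a few lines; the blur amount (`sigma=2`) is my assumption, not a value from the original writeup:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def mops_descriptor(img, y, x):
    """8x8 MOPS-style descriptor: blur the image, take the 40x40 window
    around (y, x), sample every 5th pixel, then bias/gain normalize."""
    blurred = gaussian_filter(img.astype(float), sigma=2)  # sigma is an assumption
    patch = blurred[y - 20:y + 20:5, x - 20:x + 20:5]      # 40x40 -> 8x8
    return (patch - patch.mean()) / (patch.std() + 1e-8)
```

Blurring before subsampling avoids aliasing, and the bias/gain normalization is what buys the intensity invariance.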
After we have descriptors, we use Lowe's technique to match them. The idea is pretty simple: for each feature descriptor, we look at its two closest matches (by SSD), and if the ratio of the best SSD to the second-best is lower than a certain threshold (I used 0.4, per the figure in the paper), we keep the match. This is because a good match doesn't just have to be good, it has to be clearly better than the next option, or you don't use it at all.
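The ratio test is tiny in code. A minimal brute-force sketch (function name mine):

```python
import numpy as np

def ratio_match(desc1, desc2, thresh=0.4):
    """Lowe's ratio test: for each descriptor in desc1, find its two
    nearest neighbors in desc2 by SSD and keep the match only when the
    best distance is well below the second-best."""
    matches = []
    for i, d in enumerate(desc1):
        ssd = ((desc2 - d) ** 2).sum(axis=1)
        best, second = np.argsort(ssd)[:2]
        if ssd[best] / ssd[second] < thresh:
            matches.append((i, best))
    return matches
```

Ambiguous features (two near-identical candidates) produce a ratio near 1 and get thrown out, which is the whole point.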
Finally, we can use RANdom SAmple Consensus (RANSAC) to find the inliers from which we compute the final homography. RANSAC chooses 4 of the matched points at random and computes a homography from them. It then applies this homography to the rest of the points; a point is an inlier if its SSD to the corresponding point is less than a certain epsilon; in this case, I used 10. Over 1,000 iterations, I kept the best homography (i.e., the one producing the most inliers) and used it to warp the entire image, as in Part A.
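The RANSAC loop can be sketched as below, with the epsilon of 10 and 1,000 iterations from the text; the helper solver and all names are mine, and the final homography is refit on all inliers (a common choice, assumed rather than stated in the original):

```python
import numpy as np

def fit_homography(src, dst):
    """Least-squares homography from point pairs (h33 fixed to 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.extend([u, v])
    h, *_ = np.linalg.lstsq(np.array(A, float), np.array(b, float), rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def ransac_homography(src, dst, n_iters=1000, eps=10.0, seed=0):
    """Fit H to 4 random correspondences per iteration; keep the H whose
    reprojections agree with the most points (squared error < eps), then
    refit on all of that consensus set."""
    rng = np.random.default_rng(seed)
    src, dst = np.asarray(src, float), np.asarray(dst, float)
    best_inliers = np.zeros(len(src), bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), 4, replace=False)
        H = fit_homography(src[idx], dst[idx])
        pts = np.column_stack([src, np.ones(len(src))]) @ H.T
        proj = pts[:, :2] / pts[:, 2:3]
        inliers = ((proj - dst) ** 2).sum(axis=1) < eps
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return fit_homography(src[best_inliers], dst[best_inliers]), best_inliers
```

Even a handful of bad ratio-test survivors can't hurt much here: any sample containing an outlier fits few points, so the consensus homography comes from clean correspondences.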
Original hand-matched Cloyne
Automatically aligned Cloyne! Notice how there is basically no ghosting, because I apparently suck at picking points. Good thing Computer Vision doesn't.
Original hand-matched Airbnb
Automatically aligned Airbnb! There is still ghosting on the lamp because it was really close to the camera and I suck at taking mosaic-friendly photos, but the chair is much better.
Original hand-matched Freedom Park
Automatically aligned park in Charlotte! Notice how there is no ghosting on the branch. Fun fact: before I slightly downsampled the original images, I had ~32,000 Harris corners to run through ANMS (my bottleneck), which means the code would've taken 66 hours to finish. (With 5,000 corners it takes 2 minutes.)
I found some photos from one of the many days I went to London when I studied abroad in Cambridge two summers ago. London has a lot of billboards/signs/adverts, but also a lot of parks. I put some green imagery on these billboards.
Notice the half advert.
But it would be cuter with a duck.
In Photoshop so it looks more like a bad LCD screen.
My descriptors are invariant to rotations, as you can tell from this mosaic of my sister's desk, which I constructed by attempting to rotate my camera between shots.
Inspired by the lecture in class, and since I'm home for Thanksgiving, I created my own multi-perspective panorama/Hockney image of my kitten, Emmet. I stitched together 10 pictures; you can see the originals here.
Inspired by the data-driven lectures, I mined Flickr for some photographs of Half Dome. The program managed to correctly detect and stitch together three photographs from different times, to cool effect.
Time-changing Half Dome.
While I didn't think Part A was super cool, I enjoyed Part B more, especially since the automatically aligned mosaics were so much better than my hand-defined ones. It made me trust in computer vision, and realize that a lot of feature detection/matching is just a bit of clever math.