Image Mosaicing (Pt 1) Jingyi Li, cs194-bx

In this assignment, I computed homographies from user-defined points to rectify images and blend multiple images together into mosaics! Then, in Part B, I implemented adaptive non-maximal suppression on Harris corners, extracted MOPS descriptors, and, filtering the matches with RANSAC, implemented automatic mosaicing.


Image rectification

After shooting some images (all from my Canon Rebel T3), we can recover a homography matrix from at least four pairs of user-defined points. This matrix is a projective transformation, as opposed to the affine transformations (which only need 3 point pairs) that we used in the face warping project. I shot some square things around my co-op, Cloyne, at an angle. For the target coordinates of the rectified image, I assumed the objects were square; when they clearly weren't, I just mapped them to the edges of the image canvas.
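
Concretely, each correspondence (x, y) → (u, v) gives two linear equations in the eight unknowns of H (fixing the bottom-right entry to 1), so four or more pairs can be solved by least squares. Here is a rough NumPy sketch of that setup; the names are illustrative, not my exact code:

    import numpy as np

    def compute_homography(src_pts, dst_pts):
        """Least-squares homography from >= 4 point correspondences.

        src_pts, dst_pts: (N, 2) arrays of (x, y) coordinates, N >= 4.
        Returns a 3x3 matrix H such that dst ~ H @ src in homogeneous coords.
        """
        A, b = [], []
        for (x, y), (u, v) in zip(src_pts, dst_pts):
            # Fixing the bottom-right entry of H to 1 leaves 8 unknowns.
            A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
            A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
            b.extend([u, v])
        h, *_ = np.linalg.lstsq(np.array(A), np.array(b), rcond=None)
        return np.append(h, 1).reshape(3, 3)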


Sarah Palin Mural

This was taken with my Nexus 5 as a quick test.

Rectified Palin

Cow Mural

My favorite one in Cloyne.

Rectified Cow

Steven Universe Mural

Here's a meta move: rectifying an image that was originally stitched with my phone's mosaic feature. (I painted this!)

Rectified Mural

Due to the cylindrical projection of my phone's mosaic feature, this result doesn't look that great.

Camera Obscura Photo

Throwback to Proj 2.

Rectified Soda wall / Jacobs' windows

Image mosaicing

Now that we know rectification works, we can create mosaics! For my mosaics of three images, I warped the left and right images to the center one. I used a forward warp of the corners to compute the size of the final canvas, then an inverse transform, like in Proj 5, to sample the pixel color values. Finally, I implemented linear blending with bwdist to smooth the seams between the individual images.
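
As a rough NumPy/SciPy sketch of that blending step (using scipy.ndimage.distance_transform_edt to play the role of MATLAB's bwdist; the structure is illustrative, not my exact code): each warped image gets a per-pixel weight that grows with its distance from its own boundary on the canvas, and the mosaic is their weighted average.

    import numpy as np
    from scipy.ndimage import distance_transform_edt

    def feather_weights(mask):
        """Blend weight for one warped image: distance from each pixel inside
        the image's footprint to the nearest pixel outside it."""
        return distance_transform_edt(mask)

    def blend(warped_images, masks):
        """Weighted average of the warped images; weights fall off linearly
        toward each image's boundary, so seams fade instead of cutting hard."""
        weights = [feather_weights(m) for m in masks]
        total = np.sum(weights, axis=0)
        total[total == 0] = 1                      # avoid divide-by-zero outside all images
        out = np.zeros_like(warped_images[0], dtype=float)
        for im, w in zip(warped_images, weights):
            out += im * w[..., None]               # assumes H x W x 3 float images
        return out / total[..., None]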

I will demonstrate the process with the dining area of our sweet Airbnb in Charlotte, NC. For most of the time this assignment was out, I was there for UIST; at least that made taking photos off Berkeley's campus easy. Below are the original 3 shots.


I then selected corresponding points by hand...

Finally, inverse warping transforms and combines the three original images into a mosaic. With hard edges, however, you can see clear overlap between the individual photos.

But with linear blending (technically feathering), the hard edges disappear. Sadly, this is not the best example: I think I both moved my camera slightly and didn't shoot a truly planar scene (the hanging lamp, for example, was pretty close to me, which is why it looks blurry).

This mosaic of the Lib-Ed room of Cloyne is more successful, with the exception of some ghosting around the piano on the wall.

Finally, here is a shot of Freedom Park in Charlotte, NC. There is a lot of ghosting on the branch, since it was harder for me to define points by hand on natural scenes. Hopefully, Part B will fix this!

Part B: Harris Corners

Now we're going to work on automatically recovering homographies for image mosaicing! We'll be implementing MOPS, Multi-Scale Oriented Patches (shouldn't that be MSOP, though?), as described in this paper. The first step is to use Harris corners to detect, well, the corners of an image.


All Harris points.
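
For reference, here's a short scikit-image sketch of the detection step (a stand-in for the detector I actually ran; the filename is hypothetical):

    import numpy as np
    from skimage.color import rgb2gray
    from skimage.feature import corner_harris, corner_peaks
    from skimage.io import imread

    im = rgb2gray(imread('cloyne_left.jpg'))          # hypothetical filename
    h = corner_harris(im, sigma=1)                    # Harris corner strength map
    coords = corner_peaks(h, min_distance=3, threshold_rel=0.01)   # (row, col) corners
    strengths = h[coords[:, 0], coords[:, 1]]         # keep each corner's strength for ANMS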

Adaptive Non-Maximal Suppression

As you can tell, there are a lot of Harris corners. What if we want to limit them to 500 points? We could just take the 500 strongest corners, but those tend to cluster together around the strongest edges. With ANMS, we can choose strong but well-spaced corners. We do this by computing a suppression radius for each point (the distance to the nearest point with a higher corner strength), sorting all the radii, and keeping the points with the 500 largest.


ANMS points.
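
In code, ANMS is essentially one pairwise-distance computation. Here's a sketch assuming the coords and strengths arrays from the detection step above, with the robustness constant 0.9 from the paper:

    import numpy as np
    from scipy.spatial.distance import cdist

    def anms(coords, strengths, n_keep=500, c_robust=0.9):
        """Adaptive non-maximal suppression: keep the n_keep corners whose
        suppression radius (distance to the nearest sufficiently stronger
        corner) is largest, giving strong but well-spaced points."""
        dists = cdist(coords, coords)                       # all pairwise distances
        # Corner j suppresses corner i if j is sufficiently stronger than i.
        stronger = strengths[None, :] * c_robust > strengths[:, None]
        dists[~stronger] = np.inf                           # ignore weaker corners
        radii = dists.min(axis=1)                           # suppression radius per corner
        keep = np.argsort(radii)[::-1][:n_keep]             # largest radii first
        return coords[keep]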

Feature Descriptors & Matching

Now that we have points, we extract descriptors around them. First blur the image, then take a 40x40 window centered on each point. We subsample this window every 5 pixels, for a final 8x8 descriptor. Finally, we normalize each descriptor (zero mean, unit variance) to make it invariant to intensity changes.
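
A sketch of that extraction, assuming a grayscale image and (row, col) corner coordinates; the blur sigma is a guess, but the rest follows the 40x40 / every-5-pixels / 8x8 recipe above:

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def extract_descriptors(im, coords, patch=40, spacing=5, sigma=2):
        """8x8 axis-aligned MOPS-style descriptors: blur, sample a 40x40 window
        every 5 pixels, then normalize to zero mean and unit variance."""
        blurred = gaussian_filter(im, sigma)        # blur so subsampling doesn't alias
        half = patch // 2
        descs, kept = [], []
        for r, c in coords:
            if r < half or c < half or r >= im.shape[0] - half or c >= im.shape[1] - half:
                continue                            # skip corners too close to the border
            window = blurred[r - half:r + half:spacing, c - half:c + half:spacing]
            window = (window - window.mean()) / (window.std() + 1e-8)
            descs.append(window.ravel())            # 8x8 -> 64-vector
            kept.append((r, c))
        return np.array(descs), np.array(kept)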

After we have descriptors, we use Lowe's technique to match them. The idea is pretty simple: we look at the two closest matches (by SSD) for each descriptor, and if the ratio of the nearest distance to the second-nearest distance is below a certain threshold (I used 0.4, per the figure in the paper), we keep the match. This is because a good match doesn't just have to be close, it has to be much closer than the next-best option, or you don't use it at all.


Matched points.
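
The ratio test itself is only a few lines of NumPy; here's a sketch using the 0.4 threshold, assuming descriptor arrays like those from the previous step:

    import numpy as np

    def match_descriptors(desc1, desc2, ratio=0.4):
        """Lowe-style matching: accept a pair only if the best SSD is much
        smaller than the second-best SSD (ratio below the threshold)."""
        matches = []
        for i, d in enumerate(desc1):
            ssd = np.sum((desc2 - d) ** 2, axis=1)   # SSD to every descriptor in image 2
            nn1, nn2 = np.argsort(ssd)[:2]           # nearest and second-nearest neighbors
            if ssd[nn1] / ssd[nn2] < ratio:          # keep only unambiguous matches
                matches.append((i, nn1))
        return matches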

RANSAC

Finally, we can use RANdom SAmple Consensus to find "inliers" and compute the final homography. RANSAC randomly chooses 4 of the matched points and computes a homography from them. It then applies this homography to the rest of the points; a point is an inlier if the SSD between its projection and its claimed match is less than a certain epsilon (in this case, I used 10). Over 1,000 iterations, I kept the homography that produced the most inliers and used it to warp the entire image, as in Part A.
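
A sketch of that loop, reusing the compute_homography function from the Part A sketch, with epsilon = 10 and 1,000 iterations as described:

    import numpy as np

    def ransac_homography(pts1, pts2, n_iters=1000, eps=10):
        """Keep the homography whose random 4-point sample explains the most matches."""
        best_H, best_count = None, -1
        pts1_h = np.hstack([pts1, np.ones((len(pts1), 1))])   # homogeneous coordinates
        for _ in range(n_iters):
            idx = np.random.choice(len(pts1), 4, replace=False)
            H = compute_homography(pts1[idx], pts2[idx])      # exact fit to 4 random matches
            proj = (H @ pts1_h.T).T
            proj = proj[:, :2] / proj[:, 2:3]                 # back from homogeneous coordinates
            inliers = np.sum((proj - pts2) ** 2, axis=1) < eps
            if inliers.sum() > best_count:
                best_H, best_count = H, inliers.sum()
        return best_H         # (a common refinement is to refit H on all inliers of the best sample)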


Original hand-matched Cloyne

Automatically aligned Cloyne! Notice how there is basically no ghosting, because I apparently suck at picking points. Good thing Computer Vision doesn't.

Original hand-matched Airbnb

Automatically aligned Airbnb! There is still ghosting on the lamp because it was really close to the camera and I suck at taking mosaic-friendly photos, but the chair looks much better.

Original hand-matched Freedom Park

Automatically aligned park in Charlotte! Notice how there is no ghosting on the branch. Fun fact: before I slightly downsampled the original images, I had ~32,000 Harris corners to run through ANMS (my bottleneck), which means the code would have taken 66 hours to finish. (With 5,000 corners it takes 2 minutes.)

Bell & Whistle: Greening London

I found some photos from one of the many days I went to London when I studied abroad in Cambridge two summers ago. London has a lot of billboards/signs/adverts, but also a lot of parks. I put some green imagery on these billboards.


Fleet Street

Notice the half advert.

Flower gardens in Regents' Park

Fleet Street Gardens?

Regent Street Fashion Show?

But it would be cuter with a duck.

Duck in Regents' Park

Duck edited

In Photoshop so it looks more like a bad LCD screen.

Pay attention to the duck!

Bell & Whistle: Rotational Invariance

My descriptors are invariant to rotation, as you can tell from this mosaic of my sister's desk, which I constructed by rotating my camera between shots.


Bell & Whistle: Multi-Perspective Panoramas

Inspired by the lecture in class, and since I'm home for Thanksgiving, I created my own multi-perspective panorama/Hockney image of my kitten, Emmet. I stitched together 10 pictures; you can see the originals here.


Bell & Whistle: Data Driven Panoramas

Inspired by the data-driven lectures, I mined Flickr for some photographs of Half Dome. The program managed to correctly detect and stitch together three photographs from different times, to a cool effect.


Original here
Original here
Original here

Time changing Half Dome.

Cropped

Takeaways

While I didn't think Part A was super cool, I enjoyed Part B more, especially since the automatically aligned mosaics were so much better than my hand-defined ones. It made me trust in computer vision, and realize that a lot of feature detection/matching is really just a bit of clever math.