It has been long time since I blogged about Lyli development. This is mostly because the development slowed down considerably due to lack of time/interest (seriously, who would want to code anything after spending 8-9 hours coding at work). Most of the time there was no real development, only minor tweaks and code reorganizations, except for one thing: the camera calibration. This is something I was really excited about, as this is one of places where there is a space for improvement compared to the Lytro Desktop. Or at least the version 3, which is the latest version to work on my old trusty notebook which still has dual-boot to Windows.
Why Does It Matter?
Usually when we talk about camera calibration, we mean a process of finding a transformation that corrects the deficiencies in the camera optical system. Lytro has an additional specific that makes calibration easier and more complicated at the same time. That specific is the separation of the image pixels into small clusters by the microlens array, one for each lens.
While the camera calibration can be seen as a purely optional step with ordinary cameras that only helps the image quality, it is an absolute necessity with Lytro. The reason is the said microlens array, as we need to know its layout before any image can be processed.
The upside of the microlens array presence is that it allows us to calibrate for the lens distortions without having to shoot a specific calibration pattern. Well, this is not entirely true, as we still need to detect the microlens array, meaning we have to use an image where it can be detected reliably.
Camera Metadata and Calibration
The most obvious way to obtain the microlens layout is to hardcode the layout and read the variable parameters from the metadata stored with every image. These metadata are in JSON format stored in a TXT file accompanying each RAW image (LFP files are basically the RAW + TXT glued into a single file).
The interesting portion of metadata reads:
This specifies the rotation of the microlens array , lens pitch and some offset that is likely to be offset of the array against the sensor. It even stores a “config” which I expect to be a reference to a hard-coded array layout to use. Knowing that the lenses are stored in a hex grid in combination with this knowledge should offer enough information to be able to reconstruct the whole microlens array.
So why not just stop here? Well, here’s the thing. First a mandatory picture;
Did you notice anything about the lens grid in the image above? Even a quick glance at a RAW image reveals that the structure of microlenses is not uniform across the image. Some of the rows has larger space in between. That means using a simple hard-coded hex grid would lead to increasing errors as the distance from the upper left corner increases.
The solution is to use a non-uniform grid storing all grid coordinates. I suppose that’s what the “config” in the metadata is used for – they know about these shortcomings and the options selects the exact layout to use. But first, we need to detect the exact layout. While I could do that once, and hard-wire it into Lyli, I decided to always calibrate the camera. This way it can take any flaws introduced during production of that specific camera into consideration.
At this point, the lens calibration becomes vital to the process. While we already accepted the fact the grid is non-uniform, it doesn’t have to too non-uniform. The lens distortions, and especially the barrel distortion, results in a grid where the lines are slightly curved. When that is corrected, only the spacing becomes non-uniform.
Constructing the Lens Grid
Now it is the time to praise the decision to use OpenCV to represent image data. Otherwise I would have probably just left off, leaving everything indefinitely, as implementing tons of simple algorithms needed for the intermediate steps is soo boring.
In the following sections I’m going describe how the lens grid is detected and constructed. Everything is shown step by step, lots of pictures included.
The first step is to preprocess the image to extract microlenses for easier detection. I used an image of white screen as an input, similar to the calibration images stored in the camera (there’s no specific reason why I didn’t use the latter, as they are pretty much the same). The main point of using a purely white image is that the lenses are well distinguishable in such image and that the contrast is even across the image. So without further ado, lets get going!
First the image is converted to grayscale:
Then a Laplacian operator with 3×3 kernel is applied to detect edges between lenses and the image is thresholded using the images mean value. I actually tried more sophisticated approach of computing threshold using cumulative histogram, but it was not worth the work.
As the image contains small specks, we need to get rid of them. For that I came up with a simple morphological operator using the following structuring element:
After applying the morphological operator, a threshold on value 95 is applied. The idea here is simple – if the element at the center does not have at least three white neighbors, it gets value < 95 and it is removed by thresholding:
The filtered lenses are still often connected though. To avoid that, I use dilation two times and then one erosion, both using rectangular 3×3 structuring element. And here’s the result, every lens has its own separate image, represented by a white dot. At this point, I also invert the value of the image.
The resulting image is a 1-bit image that will be used as a mask for finding the center of each lens.
Finding Lens Centers
Now that we have the mask separating lenses from each other, we can continue with a detection of each lens center. To find it, a centroid of each lens image is computed. To achieve that, we use the dots as a mask for computing centroids in the grayscale image, ie. for each white pixel in a dot, we take the corresponding pixel in the grayscale image to compute the centroid.
The dots themselves are detected by line scanning. When the scanline hits the the topmost left pixel of a dot, the whole dot is discovered and the centroid is computed. To find all pixels of the dot, I came up with an algorithm similar to common fill algorithms such as bucket fill that works on monotone polygons. This is sufficient, as the dots are monotone with respect to the vertical axis y. The algorithm is iterative and it is simple to implement.
NOTE: for the future reference, I will use x as the horizontal axis and y as the vertical axis
- Start at the topmost left pixel (green highlight) and scan to the right until the last pixel of the object is reached. Store the y-position of the leftmost and rightmost pixel. This interval will be used when processing the next line.
The pixels processed in the previous step are highlighted using gray color. The interval discovered in the previous step is highlighted in red. The algorithm moves to the next row, starting at the stored y-position of the left pixel (green). If the pixel at this position is part of the object, scan to the left to find the leftmost pixel and then to the right to find the rightmost pixel. Again, store the interval.
repeat, we start at the green pixel. This time the interval shrinks on the right side.
If the starting pixel is outside the object, we scan to the right until we find a pixel belonging to the object or until we hit the right border of the interval. In this case, the algorithm hits the object while being in the interval, so we process it, updating the interval again.
The algorithm stops when no pixel from the object is found within the specified interval.
The following image shows the raw image with centers depicted as black pixels.
Refining the Centroids
It is possible that the use of a 1-bit mask reduced the precision of the centroid computation. The main reason is that some of the pixels that would otherwise be part of the lens image may have been omitted because the thresholding removed them. Taking that into consideration, I introduced a refining step. This is the most computationally intensive step, as it processes each lens several times with subpixel precision.
The starting estimate of a lens center is a centroid computed in the previous step. A new estimate is computed as a centroid of a circular neighborhood of the current estimate. The algorithm begins by using a neighborhood of a three-pixel radius. This is repeated using a 1px larger neighborhood in each iteration until we obtain the estimate using a neighborhood with radius of six pixels.
The centroid is computed at subpixel precision, meaning that the position of neighboring pixels usually does not match the pixels in the original image. For that reason the value of neighboring pixels is interpolated. I chose to use bilinear interpolation, as it is both simple and fast.
And now a mandatory image showing the position of the initial estimate and the refined positions. The positions are rounded to the nearest pixel. The red pixels are the initial estimates, the blue pixels correspond to the refined positions. If both were to be shown on the same pixel, the pixel is black.
Connecting the Dots
The lens centers we obtained in the last step can be used to reconstruct the spatial information of the lens grid. The algorithm is essentially a sweep algorithm that adds the currently processed center to the closest line executed twice to create both vertical and horizontal lines, thus creating a grid. Although I describe it separately, the sweep nature of the algorithm makes it possible to combine it with the the lens center detection and refining into one step. In the actual implementation, as soon as a center is detected, the center is refined and added to the closest horizontal line.
The algorithm has two modes of operation. In the first one, new lines are created and points are added to these lines. The other one only adds points to the existing lines. The reason to separate the process into two steps is to avoid discontinuity of lines. It could happen that when an errorneous center is processed a line is interrupted in the middle of the image and a new line is created, breaking process for the following points close to this line.
The Algorithm #1
The algorithm sweeps a line across the image from left to right (well, in reality I transposed the image, so that I can sweep from top to bottom).
- In the first step, only a limited amount of columns is processed. The point of this step is to create list of horizontal lines. The algorithm sweeps an imaginary test line across the image from left to right. When the test line hits a first lens center it creates a new object representing a line (in fact the line is just a list of centers) and stores it in a map of lines. This map stores the y-position of the last detected center in each line (ie. the current rightmost point in the line) and maps it to the corresponding line. When a next center is hit by the test line, the map is checked for a line whose last point has the closest y position to the currently processed center. If the distance from the closest line is exceeds 3px (selected empirically) a new line is created. Otherwise the point is added to the closest line and the yposition in the map record is updated to correspond to the newly added point.
The procedure is illustrated in following images. The black crosses mark the lens centers, red line is the limit for the first step. Green dashed line is the test line. Blue lines are the detected lines of centers.
- The second step is nearly the same as the first step, with the only difference that if a point is too far from any line, it is ignored.
The first image shows detection of a point that is close enough to a line to be considered part of the line. The second image shows detection of an outlier.
Now we have first part of the spatial information reconstructed – the lens centers are connected into horizontal lines. The next step is to reconstruct the vertical lines.
The Algorithm #2
The vertical line detection is the same as the algorithm #1 for horizontal lines with two subtle differences. First, the x coordinates of centers in odd and even horizontal lines are displaced, so it is necessary to solve odd and even lines separately. The next difference is that we no longer process the centers from the image directly, but we rather use the horizontal lines as the source of points.
- Order the horizontal lines according to their y-position. Select six lines at the 1/4 of the horizontal line count. I chose to use these lines deliberately, as they should not suffer too much from any lens distortions. I addition, as the quality of lens center detection and line reconstruction depends a lot on the quality of the preprocessing, these lines are expected to have good quality because the threshold should work well at these coordinates.
Use the selected lines to construct vertical lines in a similar fashion to the first step of the first algorithm run.
Process the rest of the centers and add them to the corresponding lines. We need to be careful here, as the lines were not created starting at the image border, but instead in a 1/4 of the image height. We can continue to sweep from those 6 lines to bottom as we did in #1. However, to process the top quarter of the image, we have to update the map of lines to use the first point of each line to find the closest line. Also, the points are added to the beginning of the line rather than to the end.
Lastly, the points that are not in both vertical line and a horizontal line are removed.
Finally, we have a grid of points, where each intersection represents a lens center.
Now as for the results – first, a grid overlaid on the raw image:
A grid of the whole image (warning, big image):
There is a handy function calibrateCamera in OpenCV, that given a list of object points and a list of image points, computes the coefficients required to remove the lens distortions. In the OpenCV terminology, the object points are points with distortion and the image points are the desired point positions with distortions removed.
The grid of points now becomes very helpful. The grid points can be used directly as the object points. To obtain image points, I just used the middle third of the horizontal lines to average x-position of points on each vertical line, obtaining “average” vertical lines. And then the same thing to get “average” horizontal lines. Again, the decision which lines to use was a deliberate selection to avoid distortions. The points of the average lines are then fed into the calibrateCamera as the image points.
Following image illustrates the creation of desired vertical lines. A limited amount of horizontal lines that show the least distortion is selected (red). The points on these lines (red circles) are used to compute the “optimal” vertical lines (green).
Voilà, we have parameters required to remove lens distortions. And this is how the grid looks after lens distortions removed (warning, big image):
 “Reverse Engineering the Lytro .LFP File Format”