First Big Milestone – First Image Produced

Just minutes ago I reached the first big milestone in Lyli development: I was able to produce my first low-resolution image from the RAW data.

After ditching the OpenCV calibration and implementing my own calibration routines instead, I was able to get calibration data that is good enough for further tests. I will probably write about that at some point, but now for the good news!

Today I tried to use the calibration data to undistort the image and then to construct a low-resolution image by taking the color in the center of each lens. This is basically equivalent to showing the picture at the lens focal plane. The implementation is still kind of stupid and makes a lot of assumptions, but it works. And now the first image:
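In a nutshell, the rendering step amounts to something like the following sketch (this is not the actual Lyli code; it assumes a demosaiced RAW image and, for simplicity, a rectangular grid of lens centers obtained from the calibration):

#include <opencv2/opencv.hpp>
#include <vector>

// Build a low-resolution image by sampling the color at each lens center.
// The grid layout and names are simplified for illustration.
cv::Mat renderLowRes(const cv::Mat &raw,                                    // demosaiced RAW image (CV_8UC3)
                     const std::vector<std::vector<cv::Point2f>> &centers)  // rows of lens centers
{
    int rows = static_cast<int>(centers.size());
    int cols = static_cast<int>(centers.front().size());
    cv::Mat lowRes(rows, cols, CV_8UC3);

    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            // take the pixel closest to the lens center
            cv::Point p(cvRound(centers[y][x].x), cvRound(centers[y][x].y));
            lowRes.at<cv::Vec3b>(y, x) = raw.at<cv::Vec3b>(p);
        }
    }
    return lowRes;
}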


Status Report #2 – New Approach to Preprocessing

All gifts have been evenly distributed among the relatives, so I got some time to write a new blog post. I really wish I could say I got the camera calibration done, but unfortunately that’s not the case. However, there is still some interesting stuff going on.

I wanted to have the camera calibration done already, but then an unexpected challenge appeared. In one of my earlier reports I mentioned that vignetting is a potential troublemaker. I hoped the problems caused by vignetting would be avoided by the preprocessing I was doing earlier. It turns out I was wrong. I had been testing the preprocessing on a few randomly picked calibration images, but when I executed the calibration routine on all the calibration images provided by the Lytro camera, about a third of them failed miserably. And all of the failures were due to vignetting breaking the lens detection in the image corners.

The first thought was to remove the vignetting but keep the algorithm. But how? The first idea that came to my mind was to use the Discrete Fourier Transform (DFT), partially because I had already considered using the DFT to find the grid itself, as the grid is nicely periodic. I later rejected that as too complicated (read: “I don’t remember much, so I would have to relearn it”) due to the changes in spacing every few lines.

I don’t know why, but a quick Google search didn’t return any results on the topic of using the DFT to remove vignetting. I’m actually quite puzzled by that, as the Fourier transform seems like an obvious solution to me. Let me show you a picture:


In the XY plane is an image with vignetting. On top of it is a plot of the function z = sin(sqrt(x^2 + y^2)), which is something easily modeled. It can be seen that the vignetting nicely follows the function above. In the frequency domain, we can easily use a function similar to the one above to remove the vignetting; we just need to figure out which coefficients need to be changed and how.

After a few more experiments I found out something that I had never realized before, but which is quite obvious. For the lens preprocessing, I’m interested only in the lenses and not in any large-scale objects. So I may as well remove all low frequencies. And it works!

This is not entirely different from the previous approach. Basically, what I was doing before was to filter out all frequencies except the highest ones using an edge detection algorithm; the rest was only cleaning up the result of the edge detection. By removing only the lowest frequencies instead, I avoid the sensitivity to tiny one-pixel changes.

The removal of very low frequencies works so well that I could ditch all the complicated preprocessing code described in one of the previous posts and replace it with a two-step process (a rough sketch follows the list):

  1. Remove very low frequencies

  2. Thresholding
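Here is how the two steps could look using OpenCV's cv::dft (this is not the actual Lyli code; the cut-off radius of 8 is a made-up value that would have to be tuned, and the threshold reuses the image mean as before):

#include <algorithm>
#include <opencv2/opencv.hpp>

cv::Mat preprocess(const cv::Mat &gray)  // grayscale calibration image, CV_8UC1
{
    // step 1: remove the very low frequencies
    cv::Mat f;
    gray.convertTo(f, CV_32F);
    cv::dft(f, f, cv::DFT_COMPLEX_OUTPUT);

    // the unshifted DFT stores low frequencies near the corners,
    // so zero out a small block around each corner (the radius is a guess)
    const int radius = 8;
    for (int y = 0; y < f.rows; ++y) {
        for (int x = 0; x < f.cols; ++x) {
            int dy = std::min(y, f.rows - y);
            int dx = std::min(x, f.cols - x);
            if (dx < radius && dy < radius)
                f.at<cv::Vec2f>(y, x) = cv::Vec2f(0.0f, 0.0f);
        }
    }
    cv::dft(f, f, cv::DFT_INVERSE | cv::DFT_REAL_OUTPUT | cv::DFT_SCALE);

    // step 2: thresholding
    cv::Mat result;
    cv::normalize(f, f, 0, 255, cv::NORM_MINMAX);
    f.convertTo(result, CV_8UC1);
    cv::threshold(result, result, cv::mean(result)[0], 255, cv::THRESH_BINARY);
    return result;
}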

To improve the results even more, I tweaked the code a bit so that not only the vertical lines but also the horizontal lines are detected using pixels from the image center, which is less affected by lens deficiencies.

Experiment #1 – Status Report #1

It’s time for the first status report on how my experiment is going. It is actually going pretty well. So far, I added an option to download all calibration files to the CLI utility and implemented an interface to the JSON metadata that will be required to properly select which files to use for calibration.

Adding the option to download calibration images was really simple, as I already had most of the code in place. What made this especially easy was the fact that I already had a file system abstraction that provides a list of files, where each file is represented by an object with a handy function to write it out to an output stream. That means adding support for downloading the calibration files was just a matter of using the correct list and downloading all files from that list 😉

What was way more fun was creating an interface to the JSON metadata that uses C++ objects. Every image has its metadata stored as JSON in a separate .TXT file. As the automatic calibration requires some of these metadata, it is useful to have a nice interface.

I had already decided to use JsonCpp to read the JSON data. I liked its API, and it has a good user base, too. While the API is nice, it requires addressing the exact position where the value is stored. It would be more useful to have the information accessible through C++ objects and member getters. That would also make it more future-proof: in case a different format pops up, I only have to change the implementation of the classes while the public interface remains the same.
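To illustrate the difference (this is just a sketch, not the actual Lyli code): reading a single value directly with JsonCpp means spelling out the whole path every time, while the generated interface hides it behind a getter such as metadata.getImage().getWidth().

#include <fstream>
#include <string>
#include <json/json.h>

// direct JsonCpp access: the full path has to be repeated for every value
int readImageWidth(const std::string &path)
{
    std::ifstream file(path);
    Json::Value root;
    file >> root;
    return root["master"]["picture"]["frameArray"][0]
               ["frame"]["metadata"]["image"]["width"].asInt();
}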

Well, that’s all nice, but how to implement it? Writing the classes manually is long and tedious work. Instead, I took the path of automatically generating the interfaces from a simple declarative language. Unfortunately, I didn’t find any nice generator that could do that for C++, so I had to write one myself. I took this as an opportunity to re-learn the basics of Python and wrote it entirely in Python.

To separate the actual JSON structure from the class structure, I designed a simple (meaning good enough for the job) declarative description of the interface that includes the addresses from which the values are read.

Take this short example, which is a reduced version of an actual file I used to describe the image metadata:

class master/picture/frameArray[0]/frame/metadata/image {
    int width
    int height
    class color {
        float gamma
    }
}

The first token on a line is the type of the object; here it is class, int, and float. The type defines the data type storing the value, so that the value can be properly converted to the correct type when it is read from the JSON. The class type is somewhat special, as it defines a C++ class that provides getters for the members it contains.

The second token is the path in the JSON file where the object can be found. Currently, the last portion of the path is used as the name of the object. The paths are always relative to the enclosing class.

The above description translates into the following C++ interface:

class Image {
    int getWidth() const;
    int getHeight() const;
    class Color {
        float getGamma() const;
    };
    Color getColor() const;
};
Image getImage() const;


The generator produces a ready-to-use header and source file for this interface. The generator itself is based on a simple recursive-descent parser that builds an AST. The code generation uses the visitor pattern: one visitor traverses the AST to generate the header, another one to generate the source file.
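For illustration, the generated code for the example above might look roughly like this (a sketch only – the actual generator output differs in details; the Json::Value member is my guess at how the values get passed around):

#include <json/json.h>

class Image {
public:
    explicit Image(const Json::Value &value) : m_value(value) {}

    int getWidth() const  { return m_value["width"].asInt(); }
    int getHeight() const { return m_value["height"].asInt(); }

    class Color {
    public:
        explicit Color(const Json::Value &value) : m_value(value) {}
        float getGamma() const { return m_value["gamma"].asFloat(); }
    private:
        Json::Value m_value;
    };

    Color getColor() const { return Color(m_value["color"]); }

private:
    Json::Value m_value;
};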

The generator can be found in the tools subdirectory in Lyli sources. It is executed as follows:

python3 "Lyli::Image::Metadata" metadata.txt

This will generate the C++ interface from the description in the metadata.txt file. The interface is provided by a Metadata class defined in the Lyli::Image namespace. You can pick the metadata.txt file from the Lyli sources. The generated sources are here: header / source.

Experiment #1 – Integrating Camera Calibration

The next thing on my roadmap is to integrate the camera calibration into both the GUI application and the CLI utility.

The plan is for the CLI tool to provide two new options: first, an option to download all calibration data into a selected directory; second, an option taking a directory with calibration images that would go through the images and generate a camera profile from the calibration results.

The GUI application should be more automated than that. My idea is that when a new, unknown camera is connected, the GUI would ask for confirmation of automatic calibration. If the user accepts it, the application will automatically download the required calibration files, run the calibration and store the camera profile in a common path for later reuse.

Apart from the obvious UI part, this will require some other important things to be implemented.

To select which images to use for the calibration, it will be necessary to implement an interface to the JSON metadata supplied with the RAW images. I actually already wrote the interface, but I didn’t write the backend code that would fill it with data from the JSON. I began writing that code, but I soon gave up; I don’t think this is the right way. Instead, I will probably create a simple declarative description of the interface and a generator to do this boring work for me.

The next thing to decide is how to store the camera profile. I haven’t thought this out yet, but it’s probably going to be some XML-based format (because there are good libraries for processing XML). It will probably have to store profiles for various camera settings – at least the focal length, as it significantly affects the camera distortions.

An Experiment

It took me a few days to implement camera calibration. It took me several weeks to start writing about it. And it took me a few more weeks to finish the writing. Because of that, I decided to conduct an experiment on myself: I will write about what I’m about to implement before actually implementing it.

The idea is stupid enough that it may actually work. I believe it will remove the attitude of “it’s in the code, so why bother writing it down in a human-readable form.” I also believe it will help me think my ideas over better.

Camera Calibration

It has been a long time since I blogged about Lyli development. This is mostly because the development slowed down considerably due to lack of time/interest (seriously, who would want to code anything after spending 8–9 hours coding at work). Most of the time there was no real development, only minor tweaks and code reorganizations, except for one thing: the camera calibration. This is something I was really excited about, as it is one of the places where there is room for improvement compared to Lytro Desktop – or at least version 3, which is the latest version that works on my old trusty notebook that still dual-boots to Windows.

Why Does It Matter?

Usually when we talk about camera calibration, we mean the process of finding a transformation that corrects the deficiencies in the camera’s optical system. Lytro has an additional peculiarity that makes calibration both easier and more complicated at the same time: the microlens array, which separates the image pixels into small clusters, one for each lens.

While with ordinary cameras the calibration can be seen as a purely optional step that only helps the image quality, with Lytro it is an absolute necessity. The reason is the aforementioned microlens array, as we need to know its layout before any image can be processed.

The upside of the microlens array’s presence is that it allows us to calibrate for the lens distortions without having to shoot a specific calibration pattern. Well, this is not entirely true, as we still need to detect the microlens array, meaning we have to use an image where it can be detected reliably.

Camera Metadata and Calibration

The most obvious way to obtain the microlens layout is to hardcode the layout and read the variable parameters from the metadata stored with every image. The metadata is in JSON format, stored in a TXT file accompanying each RAW image (LFP files are basically the RAW + TXT glued into a single file).

The interesting portion of metadata reads:

"mla": {
    "tiling": "hexUniformRowMajor",
    "lensPitch": 0.00001399999999999999911182158,
    "rotation": 0.002147877356037497520446777344,
    "defectArray": [],
    "config": "com.lytro.mla.11",
    "scaleFactor": {
        "x": 1.0,
        "y": 1.00021874904632568359375
    },
    "sensorOffset": {
        "x": 0.000001216680407524108664674145,
        "y": -0.000001459128141403198419823184,
        "z": 0.000025
    }
}
This specifies the rotation of the microlens array [1], the lens pitch, and some offset that is likely the offset of the array against the sensor. It even stores a “config”, which I expect to be a reference to a hard-coded array layout to use. Combined with the knowledge that the lenses are arranged in a hex grid, this should offer enough information to reconstruct the whole microlens array.

So why not just stop here? Well, here’s the thing. First, a mandatory picture:


Did you notice anything about the lens grid in the image above? Even a quick glance at a RAW image reveals that the structure of the microlenses is not uniform across the image. Some of the rows have a larger space in between. That means using a simple hard-coded hex grid would lead to increasing errors as the distance from the upper left corner increases.

The solution is to use a non-uniform grid storing all grid coordinates. I suppose that’s what the “config” in the metadata is used for – they know about these shortcomings and the option selects the exact layout to use. But first, we need to detect the exact layout. While I could do that once and hard-wire it into Lyli, I decided to always calibrate the camera. This way it can take into consideration any flaws introduced during the production of that specific camera.

At this point, the lens calibration becomes vital to the process. While we have already accepted the fact that the grid is non-uniform, it doesn’t have to be too non-uniform. The lens distortions, and especially the barrel distortion, result in a grid where the lines are slightly curved. When that is corrected, only the spacing remains non-uniform.

Constructing the Lens Grid

Now it is time to praise the decision to use OpenCV to represent the image data. Otherwise I would have probably just left everything aside indefinitely, as implementing tons of simple algorithms needed for the intermediate steps is soo boring.

In the following sections I’m going to describe how the lens grid is detected and constructed. Everything is shown step by step, lots of pictures included.

Extracting Microlenses

The first step is to preprocess the image to extract the microlenses for easier detection. I used an image of a white screen as the input, similar to the calibration images stored in the camera (there’s no specific reason why I didn’t use the latter, as they are pretty much the same). The main point of using a purely white image is that the lenses are well distinguishable in such an image and that the contrast is even across the image. So without further ado, let’s get going!

First the image is converted to grayscale:


Then a Laplacian operator with a 3×3 kernel is applied to detect the edges between lenses, and the image is thresholded using the image’s mean value. I actually tried a more sophisticated approach of computing the threshold using a cumulative histogram, but it was not worth the work.


As the image contains small specks, we need to get rid of them. For that I came up with a simple morphological operator using the following structuring element:

0.125 0.125 0.125
0.125 0.0 0.125
0.125 0.125 0.125

After applying the morphological operator, a threshold at the value 95 is applied. The idea here is simple – if the element at the center does not have at least three white neighbors, it gets a value < 95 and is removed by the thresholding:


The filtered lenses are still often connected, though. To fix that, I apply dilation twice and then one erosion, both using a rectangular 3×3 structuring element. And here’s the result: every lens has its own separate image, represented by a white dot. At this point, I also invert the values of the image.


The resulting image is a 1-bit image that will be used as a mask for finding the center of each lens.
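The whole preprocessing described above can be condensed into a short OpenCV sketch (again, not the actual Lyli code – the “morphological operator” is expressed as a convolution with the kernel above):

#include <opencv2/opencv.hpp>

cv::Mat extractLensMask(const cv::Mat &input)  // color calibration image
{
    cv::Mat gray, edges, mask;
    cv::cvtColor(input, gray, cv::COLOR_BGR2GRAY);

    // Laplacian with a 3x3 kernel to detect the edges between lenses,
    // thresholded using the mean value
    cv::Laplacian(gray, edges, CV_8U, 3);
    cv::threshold(edges, mask, cv::mean(edges)[0], 255, cv::THRESH_BINARY);

    // remove specks: every neighbor contributes 0.125, the center 0, so a
    // pixel with fewer than three white neighbors ends up below 95
    cv::Mat kernel = (cv::Mat_<float>(3, 3) <<
        0.125f, 0.125f, 0.125f,
        0.125f, 0.0f,   0.125f,
        0.125f, 0.125f, 0.125f);
    cv::Mat filtered;
    cv::filter2D(mask, filtered, CV_8U, kernel);
    cv::threshold(filtered, mask, 95, 255, cv::THRESH_BINARY);

    // separate lenses that are still connected and invert, so that every
    // lens becomes a white dot usable as a mask
    cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::dilate(mask, mask, element, cv::Point(-1, -1), 2);
    cv::erode(mask, mask, element, cv::Point(-1, -1), 1);
    cv::bitwise_not(mask, mask);
    return mask;
}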

Finding Lens Centers

Now that we have the mask separating the lenses from each other, we can continue with the detection of each lens center. To find it, a centroid of each lens image is computed. To achieve that, we use the dots as a mask for computing centroids in the grayscale image, i.e. for each white pixel in a dot, we take the corresponding pixel in the grayscale image to compute the centroid.

The dots themselves are detected by line scanning. When the scanline hits the topmost left pixel of a dot, the whole dot is discovered and its centroid is computed. To find all pixels of the dot, I came up with an algorithm similar to common fill algorithms such as bucket fill that works on monotone polygons. This is sufficient, as the dots are monotone with respect to the vertical axis y. The algorithm is iterative and simple to implement (a sketch follows the step-by-step description below).

NOTE: for future reference, I will use x as the horizontal axis and y as the vertical axis.

The Algorithm

  1. Start at the topmost left pixel (green highlight) and scan to the right until the last pixel of the object is reached. Store the x-positions of the leftmost and rightmost pixels. This interval will be used when processing the next line.

  2. The pixels processed in the previous step are highlighted in gray. The interval discovered in the previous step is highlighted in red. The algorithm moves to the next row, starting at the stored x-position of the left pixel (green). If the pixel at this position is part of the object, scan to the left to find the leftmost pixel and then to the right to find the rightmost pixel. Again, store the interval.

  3. Repeat: we start again at the green pixel. This time the interval shrinks on the right side.

  4. If the starting pixel is outside the object, we scan to the right until we find a pixel belonging to the object or until we hit the right border of the interval. In this case, the algorithm hits the object while still inside the interval, so we process it, updating the interval again.

  5. The algorithm stops when no pixel from the object is found within the specified interval.
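A rough sketch of the fill, assuming an 8-bit mask with white dots and the grayscale image used for weighting (not the actual Lyli implementation; it also marks visited pixels directly in the mask so a dot is processed only once):

#include <opencv2/opencv.hpp>

// Computes the centroid of the dot whose topmost left pixel is (x0, y0),
// weighted by the grayscale values under the mask.
cv::Point2f dotCentroid(cv::Mat &mask, const cv::Mat &gray, int x0, int y0)
{
    double sumX = 0.0, sumY = 0.0, sumW = 0.0;
    int left = x0, right = x0;  // interval from the previous row

    for (int y = y0; y < mask.rows; ++y) {
        // find a pixel of the object within the interval from the previous row
        int x = left;
        while (x <= right && mask.at<uchar>(y, x) == 0)
            ++x;
        if (x > right)
            break;  // nothing found within the interval -> the dot ends here

        // expand to the leftmost and rightmost pixel of the object in this row
        int l = x, r = x;
        while (l > 0 && mask.at<uchar>(y, l - 1) != 0)
            --l;
        while (r + 1 < mask.cols && mask.at<uchar>(y, r + 1) != 0)
            ++r;

        // accumulate the centroid and mark the pixels as processed
        for (int i = l; i <= r; ++i) {
            double w = gray.at<uchar>(y, i);
            sumX += w * i;
            sumY += w * y;
            sumW += w;
            mask.at<uchar>(y, i) = 0;
        }
        left = l;
        right = r;
    }
    return cv::Point2f(static_cast<float>(sumX / sumW),
                       static_cast<float>(sumY / sumW));
}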

The following image shows the raw image with centers depicted as black pixels.


Refining the Centroids

It is possible that the use of a 1-bit mask reduces the precision of the centroid computation. The main reason is that some of the pixels that would otherwise be part of the lens image may have been omitted because the thresholding removed them. Taking that into consideration, I introduced a refinement step. This is the most computationally intensive step, as it processes each lens several times with subpixel precision.

The Algorithm

The starting estimate of a lens center is the centroid computed in the previous step. A new estimate is computed as the centroid of a circular neighborhood of the current estimate. The algorithm begins with a neighborhood of a three-pixel radius, and this is repeated with a 1 px larger neighborhood in each iteration until we obtain the estimate using a neighborhood with a radius of six pixels.

The centroid is computed with subpixel precision, meaning that the positions of the neighborhood samples usually do not match the pixel grid of the original image. For that reason the values of the neighboring pixels are interpolated. I chose bilinear interpolation, as it is both simple and fast.
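Put together, the refinement might look something like this sketch (not the actual Lyli code; the 0.5 px sampling step inside the neighborhood is my own choice, and the code assumes the center lies far enough from the image border):

#include <opencv2/opencv.hpp>

// bilinear interpolation of a grayscale image at a subpixel position
static float sampleBilinear(const cv::Mat &gray, float x, float y)
{
    int x0 = static_cast<int>(x), y0 = static_cast<int>(y);
    float fx = x - x0, fy = y - y0;
    float v00 = gray.at<uchar>(y0, x0),     v10 = gray.at<uchar>(y0, x0 + 1);
    float v01 = gray.at<uchar>(y0 + 1, x0), v11 = gray.at<uchar>(y0 + 1, x0 + 1);
    return (1 - fy) * ((1 - fx) * v00 + fx * v10)
         + fy       * ((1 - fx) * v01 + fx * v11);
}

// refine a center estimate using circular neighborhoods of radius 3 to 6 px
cv::Point2f refineCenter(const cv::Mat &gray, cv::Point2f estimate)
{
    for (int radius = 3; radius <= 6; ++radius) {
        double sumX = 0.0, sumY = 0.0, sumW = 0.0;
        for (float dy = -radius; dy <= radius; dy += 0.5f) {
            for (float dx = -radius; dx <= radius; dx += 0.5f) {
                if (dx * dx + dy * dy > radius * radius)
                    continue;  // outside the circular neighborhood
                float x = estimate.x + dx;
                float y = estimate.y + dy;
                float w = sampleBilinear(gray, x, y);
                sumX += w * x;
                sumY += w * y;
                sumW += w;
            }
        }
        estimate = cv::Point2f(static_cast<float>(sumX / sumW),
                               static_cast<float>(sumY / sumW));
    }
    return estimate;
}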

And now a mandatory image showing the positions of the initial estimates and the refined positions. The positions are rounded to the nearest pixel. The red pixels are the initial estimates, the blue pixels correspond to the refined positions. Where both fall on the same pixel, the pixel is shown in black.


Connecting the Dots

The lens centers we obtained in the last step can be used to reconstruct the spatial information of the lens grid. The algorithm is essentially a sweep algorithm that adds the currently processed center to the closest line; it is executed twice to create both the horizontal and the vertical lines, thus creating a grid. Although I describe it separately, the sweep nature of the algorithm makes it possible to combine it with the lens center detection and refinement into one step. In the actual implementation, as soon as a center is detected, the center is refined and added to the closest horizontal line.

The algorithm has two modes of operation. In the first one, new lines are created and points are added to these lines. The other one only adds points to the existing lines. The reason to separate the process into two steps is to avoid discontinuity of lines. Otherwise it could happen that when an erroneous center is processed, a line is interrupted in the middle of the image and a new line is created, breaking the processing of the following points close to this line.

The Algorithm #1

The algorithm sweeps a line across the image from left to right (well, in reality I transposed the image, so that I can sweep from top to bottom).

  1. In the first step, only a limited number of columns is processed. The point of this step is to create the list of horizontal lines. The algorithm sweeps an imaginary test line across the image from left to right. When the test line hits the first lens center, it creates a new object representing a line (in fact the line is just a list of centers) and stores it in a map of lines. This map stores the y-position of the last detected center in each line (i.e. the current rightmost point in the line) and maps it to the corresponding line. When the next center is hit by the test line, the map is searched for the line whose last point has the closest y-position to the currently processed center. If the distance from the closest line exceeds 3 px (selected empirically), a new line is created. Otherwise the point is added to the closest line and the y-position in the map record is updated to correspond to the newly added point. A rough sketch of this bookkeeping is shown after the description of the second step.

The procedure is illustrated in the following images. The black crosses mark the lens centers, the red line is the limit for the first step, the green dashed line is the test line, and the blue lines are the detected lines of centers.


  2. The second step is nearly the same as the first step, with the only difference being that if a point is too far from any line, it is ignored.

The first image shows detection of a point that is close enough to a line to be considered part of the line. The second image shows detection of an outlier.
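The bookkeeping for both steps can be sketched roughly as follows (not the actual Lyli code; the map is keyed by the y-position of the last point of each line, and the 3 px limit is the empirical value mentioned above):

#include <cmath>
#include <iterator>
#include <map>
#include <vector>
#include <opencv2/opencv.hpp>

using Line = std::vector<cv::Point2f>;

// Adds a center to the line whose last point has the closest y-position.
// In the first step new lines may be created; in the second step a center
// that is too far from every line is simply ignored.
void addToLines(std::map<float, Line> &lines, const cv::Point2f &center,
                bool allowNewLines, float maxDistance = 3.0f)
{
    // find the line whose last y-position is closest to the center
    auto it = lines.lower_bound(center.y);
    if (it != lines.begin()) {
        auto prev = std::prev(it);
        if (it == lines.end() ||
            std::abs(prev->first - center.y) < std::abs(it->first - center.y))
            it = prev;
    }

    if (it != lines.end() && std::abs(it->first - center.y) <= maxDistance) {
        // append to the closest line and re-key it with the new y-position
        Line line = std::move(it->second);
        line.push_back(center);
        lines.erase(it);
        lines.emplace(center.y, std::move(line));
    } else if (allowNewLines) {
        // too far from any existing line -> start a new one (first step only)
        lines.emplace(center.y, Line{center});
    }
}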


Now we have the first part of the spatial information reconstructed – the lens centers are connected into horizontal lines. The next step is to reconstruct the vertical lines.

The Algorithm #2

The vertical line detection is the same as algorithm #1 for the horizontal lines, with two subtle differences. First, the x-coordinates of the centers in odd and even horizontal lines are displaced, so it is necessary to handle odd and even lines separately. Second, we no longer process the centers from the image directly, but rather use the horizontal lines as the source of points.

  1. Order the horizontal lines according to their y-position. Select six lines at about one quarter of the horizontal line count. I chose these lines deliberately, as they should not suffer too much from any lens distortions. In addition, as the quality of the lens center detection and line reconstruction depends a lot on the quality of the preprocessing, these lines are expected to be of good quality because the threshold should work well at these coordinates.

  2. Use the selected lines to construct vertical lines in a similar fashion to the first step of the first algorithm run.

  3. Process the rest of the centers and add them to the corresponding lines. We need to be careful here, as the lines were not created starting at the image border, but instead at about 1/4 of the image height. We can continue to sweep from those six lines to the bottom as we did in #1. However, to process the top quarter of the image, we have to update the map of lines to use the first point of each line to find the closest line. Also, the points are added to the beginning of the line rather than to the end.

Lastly, the points that are not part of both a vertical and a horizontal line are removed.

Finally, we have a grid of points, where each intersection represents a lens center.

Now as for the results – first, a grid overlaid on the raw image:


A grid of the whole image (warning, big image):


Lens Calibration

There is a handy function in OpenCV, calibrateCamera, that computes the coefficients required to remove the lens distortions given two lists of corresponding points: the detected point positions as they appear in the distorted image, and the desired point positions with the distortions removed.

The grid of points now becomes very helpful, as the detected grid points provide the distorted positions directly. To obtain the desired undistorted positions, I used the middle third of the horizontal lines to average the x-positions of the points on each vertical line, obtaining “average” vertical lines, and then did the same thing to get “average” horizontal lines. Again, which lines to use was a deliberate selection to avoid distortions. The points of the average lines are then fed into calibrateCamera together with the detected points.
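One way the two point sets could be handed over to OpenCV is sketched below (not the actual Lyli code; calibrateCamera expects the reference grid as planar 3D object points and the detected centers as 2D image points, so the averaged grid gets a zero z coordinate):

#include <vector>
#include <opencv2/opencv.hpp>

void calibrateLens(const std::vector<cv::Point2f> &detected,  // detected lens centers
                   const std::vector<cv::Point2f> &ideal,     // averaged, undistorted positions
                   cv::Size imageSize,
                   cv::Mat &cameraMatrix, cv::Mat &distCoeffs)
{
    // the averaged grid lies in a plane, so it becomes 3D points with z = 0
    std::vector<cv::Point3f> objectPoints;
    objectPoints.reserve(ideal.size());
    for (const auto &p : ideal)
        objectPoints.emplace_back(p.x, p.y, 0.0f);

    std::vector<std::vector<cv::Point3f>> objectPointsList{objectPoints};
    std::vector<std::vector<cv::Point2f>> imagePointsList{detected};

    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPointsList, imagePointsList, imageSize,
                        cameraMatrix, distCoeffs, rvecs, tvecs);
}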

The following image illustrates the creation of the desired vertical lines. A limited number of horizontal lines that show the least distortion is selected (red). The points on these lines (red circles) are used to compute the “optimal” vertical lines (green).


Voilà, we have the parameters required to remove the lens distortions. And this is how the grid looks after the lens distortions are removed (warning, big image):



[1] “Reverse Engineering the Lytro .LFP File Format”

Interesting Lytro resources

Recently I was contacted by Jan Kučera (fun fact – apparently he is a student at a university in the city I recently moved to), asking whether I had seen his pages about Lytro – the LYTRO meltdown web. I wish I had known about his pages when I got my camera, as that would have lifted the burden of actually having to decipher the protocol just to download a few files. For that reason I decided to compile a list of interesting resources about Lytro internals for everyone to find.

So here’s the list (it may be extended over time):

LYTRO meltdown – by far the most comprehensive resource on Lytro internals I know of, including a detailed description of the protocol, various files and the hardware. Recommended reading.

LightField Forum – a great starting point for anyone interested in lightfield photography, including Lytro.

Todor Georgiev‘s web – one of the few people who, together with Yu, Z., Yu, J., and Lumsdaine, tackled the problem of demosaicing with lightfield cameras. This is a very important problem that is currently holding me back from further development. Another fun fact – one of the figures is a photograph of one of my university professors.

eclecticc – especially the post Reverse Engineering the Lytro .LFP File Format helped me with decoding the Lytro RAW files. The author is also the creator of lfptools – a collection of Python scripts to read and view Lytro LFP files. I haven’t tested them, as Lyli doesn’t support LFP output yet.

Lytro Academic Papers – a list of various papers on lightfield photography, including Ren Ng’s thesis.