Increasing Light Sensitivity Of The Ikea Molgan Light

Recently I bought a Molgan light from Ikea for my bathroom. It’s a nice little battery-powered light with a motion sensor. It provides just the right amount of light for going to the bathroom without being woken up completely. However, there was one thing that had been bugging me ever since I got it. Despite having a light sensor to avoid turning on when it’s not dark, it has been consistently activating even when the bathroom light was on.

Apparently, what I would consider a slight shadow Ikea considers dark. What probably doesn’t help either is the fact that I’m using LEDs in the bathroom. After some measuring (more on that later) it seems the spectrum of the LEDs might not match the Molgan’s sensitivity all that well. Thus began my endeavor to increase the light sensitivity. The good thing is that the light is quite cheap, so in case it breaks it doesn’t cost much to get another one.

Preparations.

Opening the light is straightforward. There are no screws holding it together. Instead, the upper translucent cover can be removed using a knife and a little bit of prying. The cover is kind of tucked in with a bit of glue (the kind of glue that is used on duct tape – sticky and rubbery, not a hard glue). It’s best to slowly move the cover by working the knife around it until you get enough space to grab it with your nails. That way you decrease the risk of damaging the cover.

Almost open…

After the cover is removed, we can see the PCB with LEDs, a motion sensor and a light sensor. There is no access to the components’ terminals from the visible side of the PCB. To get to them, one has to desolder the two connections to the battery holder. Apart from the two soldered contacts there is nothing else keeping the plastic housing and the PCB together.

The insides. The soldering isn’t very clean.

At this point I didn’t know what the light sensor was, because I knew only about photoresistors and this obviously wasn’t one. Later I learned that the part is called a phototransistor and its use is basically the same as a photoresistor’s. Despite not knowing what kind of electronic part the light sensor was, I couldn’t be stopped from measuring its resistance depending on the light. Fortunately a phototransistor behaves the same way as a photoresistor, so this approach worked. Measuring the resistance is straightforward. I used a small multimeter for that. It’s important not to forget to put the light cover over the light sensor during the measurements.

The futile attempt to measure the switching resistance while connected to a home-made laboratory power source.

First I measured the resistance in the bathroom where the light is located. I noticed an interesting thing – though the bathroom with the lights on is brighter than the shaded hallway, the resistance was about the same (maybe even a bit higher) in the bathroom. Probably because the spectrum of the LED lights doesn’t match the sensitivity of the light sensor well. The measured resistance in the bathroom was about 490 kΩ.

Next I needed to find the switching resistance. This took me more time than I anticipated. My original intention was to hook up the PCB to a laboratory power source and then just try various covers to find out when it switches. Unfortunately I was not able to cover it well enough to make it switch reproducibly. In the end I put everything together (except for the cover) and just moved with the light from light to shade until I found a place where it started switching on. I remembered the position, tore the light apart again and measured the sensor resistance when moving around this position. At that place the resistance quickly jumped from about 100 kΩ to 200 kΩ. I’m pretty sure the switching point was closer to 200 kΩ.

From the measurements it can be seen that I needed the light sensor to have a resistance somewhere between 100 and 200 kΩ in conditions where it measures 490 kΩ unmodified. To reduce the resistance, we need to add a resistor in parallel. Now it’s time for elementary-school physics and the formula for resistors in parallel.

The formula for two resistors is:
\frac{1}{R} = \frac{1}{R_1} + \frac{1}{R_2}

Let’s say the desired resistance is somewhere in-between:
R = 150 k\Omega

The resistance of the phototransistor is given. Let’s round it up a bit to give us some headroom:
R_1 = 500 k\Omega

Solving for R_2:

\frac{1}{R_2} = \frac{1}{R} - \frac{1}{R_1} \Rightarrow R_2 = \frac{R \cdot R_1}{R_1 - R} = \frac{150 \cdot 500}{500 - 150} k\Omega \approx 214 k\Omega

Which is pretty close to a 220 kΩ resistor. The final step is to solder the selected 220 kΩ resistor in parallel with the phototransistor and put everything back together.
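As a sanity check, a 220 kΩ resistor in parallel with the 490 kΩ measured in the lit bathroom gives:

\frac{1}{R} = \frac{1}{490 k\Omega} + \frac{1}{220 k\Omega} \Rightarrow R \approx 152 k\Omega

which falls nicely inside the 100–200 kΩ target range from the measurements.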

The final modification.

Now the Molgan never lights up when the bathroom light is on, while it still reliably turns on when the lights are off.

I’m actually very happy with the Molgan after this simple hack. I don’t get woken up in case I have to go to the bathroom at night. I also don’t have to recharge the batteries all the time. Before the hack, I had to charge them almost as often as my phone.

First Big Milestone – First Image Produced

Just minutes ago I reached the first big milestone in Lyli development. I was able to produce my first low-resolution image from the RAW data.

After ditching the OpenCV calibration and implementing my own calibration routines instead, I was able to get calibration data that are good enough for further tests. I will probably write about that at some point, but now for the good news!

Today I tried to use the calibration data to undistort the image and then to construct a low-resolution image by taking the color in the center of each lens. This is basically the equivalent of showing the picture at the lens focal plane. The implementation is still rather naive and makes a lot of assumptions, but it works. And now, the first image:

preview
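To illustrate the idea (this is not the actual Lyli code), a minimal sketch of such a preview could look like the following, assuming a grid of lens centers from the calibration step and an image that has already been demosaiced to an 8-bit BGR cv::Mat:

#include <cstddef>
#include <vector>
#include <opencv2/core.hpp>

// Builds a preview with one output pixel per lens by sampling the color at
// each lens center. Assumes a rectangular grid (every row has the same length).
cv::Mat renderPreview(const cv::Mat& image,
                      const std::vector<std::vector<cv::Point2f>>& lensGrid) {
    cv::Mat preview(static_cast<int>(lensGrid.size()),
                    static_cast<int>(lensGrid.front().size()),
                    CV_8UC3);
    for (std::size_t row = 0; row < lensGrid.size(); ++row) {
        for (std::size_t col = 0; col < lensGrid[row].size(); ++col) {
            const cv::Point2f& center = lensGrid[row][col];
            preview.at<cv::Vec3b>(static_cast<int>(row), static_cast<int>(col)) =
                image.at<cv::Vec3b>(cvRound(center.y), cvRound(center.x));
        }
    }
    return preview;
}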

Status Report #2 – New Approach to Preprocessing

All the gifts have been distributed among the relatives, so I got some time to write a new blog post. I really wish I could say I got the camera calibration done, but unfortunately that’s not the case. However, there is still some interesting stuff going on.

I wanted to have the camera calibration done already, but then an unexpected challenge appeared. In one of my earlier reports I mentioned that vignetting is a potential trouble maker. I hoped the problems caused by vignetting would be avoided by the preprocessing I was doing earlier. It turns out I was wrong. I was testing the preprocessing on randomly picked calibration images, but when I executed the calibration routine on all the calibration images provided by the Lytro camera, about a third of them failed miserably. And all of the failures were due to vignetting breaking the lens detection in the image corners.

The first thought was to remove the vignetting but keep the algorithm – but how? The first idea that came to my mind was to use the Discrete Fourier Transform (DFT). Partially because I had already considered using DFT to find the grid itself, as the grid is nicely periodic, which I later rejected as too complicated (read: “I don’t remember much, so I would have to relearn it”) due to the changes in spacing every few lines.

I don’t know why, but a quick Google search didn’t return any results on the topic of using DFT to remove vignetting. I’m actually quite puzzled by that, as the Fourier transform seems like an obvious solution to me. Let me show you a picture:

3d_sin_vignetting

In the XY plane is an image with vignetting. On top of it is a plot of the function z = sin(sqrt(x^2+y^2)), which is something easily modeled. It can be seen that the vignetting nicely follows the function above. In the frequency domain, we can easily use a function similar to the one above to remove the vignetting. We just need to figure out which coefficients need to be changed and how.

After a few more experiments I found out something that I had never realized before, but which is quite obvious. For the lens preprocessing, I’m interested only in the lenses and not in any large-scale objects. So I may as well remove all low frequencies. And it works!

This is not entirely different from the previous approach. Basically, what I was doing before was filtering out all frequencies except the highest ones using an edge detection algorithm. The rest was only cleaning up the result of the edge detection. By removing only the lowest frequencies, I avoided the sensitivity to tiny 1-pixel changes.

The removal of very low frequencies works so well that I could ditch all the complicated preprocessing code described in one of the previous posts and replace it with a two-step process (a sketch follows the list):

  1. Remove very low frequencies

  2. Thresholding
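A rough sketch of these two steps using OpenCV follows; the cutoff radius and the use of Otsu thresholding are my assumptions for the sketch, not necessarily what Lyli does:

#include <algorithm>
#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

// Removes the lowest frequencies in the DFT domain and thresholds the result,
// producing a binary mask where the lens structure is preserved but the
// large-scale vignetting is gone.
cv::Mat preprocess(const cv::Mat& gray, int cutoff = 8) {
    cv::Mat f;
    gray.convertTo(f, CV_32F);

    // Forward DFT with complex output.
    cv::Mat spectrum;
    cv::dft(f, spectrum, cv::DFT_COMPLEX_OUTPUT);

    // Zero out the lowest-frequency coefficients. With the default layout the
    // low frequencies sit in the corners of the spectrum.
    for (int y = 0; y < spectrum.rows; ++y) {
        for (int x = 0; x < spectrum.cols; ++x) {
            const int fy = std::min(y, spectrum.rows - y);
            const int fx = std::min(x, spectrum.cols - x);
            if (fy < cutoff && fx < cutoff) {
                spectrum.at<cv::Vec2f>(y, x) = cv::Vec2f(0.0f, 0.0f);
            }
        }
    }

    // Back to the spatial domain.
    cv::Mat filtered;
    cv::dft(spectrum, filtered, cv::DFT_INVERSE | cv::DFT_REAL_OUTPUT | cv::DFT_SCALE);

    // Step 2: thresholding.
    cv::normalize(filtered, filtered, 0, 255, cv::NORM_MINMAX);
    filtered.convertTo(filtered, CV_8U);
    cv::Mat mask;
    cv::threshold(filtered, mask, 0, 255, cv::THRESH_BINARY | cv::THRESH_OTSU);
    return mask;
}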

To improve the results even more, I tweaked the code a bit so that not only the vertical lines but also the horizontal lines are detected using pixels from the image center, which is less affected by lens deficiencies.

Experiment #1 – Status Report #1

It’s time for the first status report on how my experiment is going. It is actually going pretty well. So far, I added an option to download all calibration files to the CLI utility and implemented an interface to the JSON metadata that will be required to properly select which files to use for calibration.

Adding the option to download calibration images was really simple, as I already had most of the code in place. What made this especially easy was the fact that I already had a file system abstraction that provides a list of files, where each file is represented by an object with a handy function to write it out to an output stream. That means adding support for downloading calibration files was just a matter of using the correct list and downloading all files from it 😉

What was way more fun was creating an interface to the JSON metadata that uses C++ objects. Every image has its metadata stored as JSON in a separate .TXT file. As the automatic calibration requires obtaining some of this metadata, it is useful to have a nice interface.

I had already decided to use JsonCpp to read the JSON data. I liked its API, and it has a good user base, too. While the API is nice, it requires addressing the exact position of the value in the JSON tree. It would be more useful to have the information accessible through C++ objects and member getters. That would also make it more future-proof: in case a different format pops up, I only have to change the implementation of the classes while the public interface remains the same.

Well, that’s all nice, but how to implement it? Writing the classes manually is long and tedious work. Instead, I took the path of automatically generating the interfaces from a simple declarative language. Unfortunately, I didn’t find any nice generator that could do that for C++, so I had to write one myself. I took this as an opportunity to re-learn the basics of Python and wrote it entirely in Python.

To separate the actual JSON structure from the class structure, I designed a simple (meaning good-enough for the job) declarative description of the interface that includes the addresses where to read the values from.

Take this short example, which is a reduced version of an actual file I used to describe the image metadata.

class master/picture/frameArray[0]/frame/metadata/image {
    int width
    int height
    class color {
        float gamma
    }
}

The first token on a line is the type of the object. Here it is class, int and float. The type defines the data type storing the value, so the value can be properly converted to the correct type when it is read from the JSON. The class type is somewhat special, as it defines a C++ class that provides getters for the members it contains.

The second token is a path in the JSON file where the object can be found. Currently, the last portion of the path is used as the name of the object. The paths are always relative to the enclosing class.

The above description translates into the following C++ interface:

class Image {
    int getWidth() const;
    int getHeight() const;
    class Color {
        float getGamma() const;
    };
    Color getColor() const;
};
Image getImage() const;

Neat.
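To give an idea of what the generated code does, here is a hand-written sketch of how such getters might be implemented on top of JsonCpp (the real generated code differs, and the header location may vary between distributions):

#include <json/json.h>

class Image {
public:
    explicit Image(const Json::Value& root)
        // The path comes from the declarative description above.
        : m_value(root["master"]["picture"]["frameArray"][0u]["frame"]["metadata"]["image"]) {}

    int getWidth() const  { return m_value["width"].asInt(); }
    int getHeight() const { return m_value["height"].asInt(); }

    class Color {
    public:
        explicit Color(const Json::Value& value) : m_value(value) {}
        float getGamma() const { return m_value["gamma"].asFloat(); }
    private:
        Json::Value m_value;
    };
    Color getColor() const { return Color(m_value["color"]); }

private:
    Json::Value m_value;
};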

The generator produces a ready-to-use header and source file for this interface. The generator itself is based on a simple recursive-descent parser. The parser generates an AST. The code generation is based on the visitor pattern: one visitor traverses the AST to generate the header, a different visitor generates the source file.

The generator can be found in the tools subdirectory in Lyli sources. It is executed as follows:

python3 pycppjson.py "Lyli::Image::Metadata" metadata.txt

This will generate the C++ interface from the description in the metadata.txt file. The interface is provided by a Metadata class defined in the Lyli::Image namespace. You can pick the metadata.txt file from the Lyli sources. The generated sources are here: header / source.

Experiment #1 – Integrating Camera Calibration

The next thing on my roadmap is to integrate the camera calibration into both the GUI application and the CLI utility.

The plan is that the CLI tool will provide two new options. First, an option to download all calibration data into a selected directory. Second, an option that takes a directory with calibration images, goes through the images and generates a camera profile from the calibration results.

The GUI application should be more automated than that. My idea is that when a new, unknown camera is connected, the GUI would ask for confirmation of automatic calibration. If the user accepts, the application will automatically download the required calibration files, run the calibration and store the camera profile in a common path for later reuse.

Apart from the obvious UI part, this will require some other important things to be implemented.

To select which images to use for the calibration, it will be necessary to implement an interface to the JSON metadata supplied with the RAW images. I actually already wrote the interface, but I didn’t write the backend code that would fill it with data from the JSON. I began writing that code, but I soon gave up. I don’t think this is the right way. Instead, I will probably create a simple declarative description of the interface and a generator to do this boring work for me.

The next thing to do is to decide how to store the camera profile. I haven’t thought this out yet, but it’s probably going to be some XML-based format (because there are good libraries for processing XML). It will probably have to store profiles for various camera settings – at least the focal length, as it significantly affects the camera distortions.

An Experiment

It took me a few days to implement camera calibration. It took me several weeks to start writing about it. And it took me a few more weeks to finish the writing. Because of that I decided to conduct an experiment upon myself. I will write what I’m about to implement before actually implementing it.

The idea is stupid enough that it may actually work. I believe it will remove the attitude of “it’s in the code, so why bother writing it down in a human-readable form.” I also believe it will help me think my ideas through.

Camera Calibration

It has been a long time since I blogged about Lyli development. This is mostly because the development slowed down considerably due to a lack of time/interest (seriously, who would want to code anything after spending 8–9 hours coding at work). Most of the time there was no real development, only minor tweaks and code reorganizations, except for one thing: the camera calibration. This is something I was really excited about, as this is one of the places where there is space for improvement compared to Lytro Desktop. Or at least compared to version 3, which is the latest version that works on my old trusty notebook that still dual-boots to Windows.

Why Does It Matter?

Usually when we talk about camera calibration, we mean the process of finding a transformation that corrects the deficiencies of the camera’s optical system. Lytro has an additional specific that makes calibration easier and more complicated at the same time: the microlens array separates the image pixels into small clusters, one for each lens.

While camera calibration can be seen as a purely optional step with ordinary cameras that only improves the image quality, it is an absolute necessity with Lytro. The reason is the aforementioned microlens array, as we need to know its layout before any image can be processed.

The upside of the microlens array presence is that it allows us to calibrate for the lens distortions without having to shoot a specific calibration pattern. Well, this is not entirely true, as we still need to detect the microlens array, meaning we have to use an image where it can be detected reliably.

Camera Metadata and Calibration

The most obvious way to obtain the microlens layout is to hardcode the layout and read the variable parameters from the metadata stored with every image. The metadata are stored as JSON in a TXT file accompanying each RAW image (LFP files are basically the RAW + TXT glued into a single file).

The interesting portion of metadata reads:

...
"mla": {
    "tiling": "hexUniformRowMajor",
    "lensPitch": 0.00001399999999999999911182158,
    "rotation": 0.002147877356037497520446777344,
    "defectArray": [],
    "config": "com.lytro.mla.11",
    "scaleFactor": {
        "x": 1.0,
        "y": 1.00021874904632568359375
    },
    "sensorOffset": {
        "x": 0.000001216680407524108664674145,
        "y": -0.000001459128141403198419823184,
        "z": 0.000025
    }
}
...

This specifies the rotation of the microlens array [1], the lens pitch and some offset that is likely the offset of the array relative to the sensor. It even stores a “config”, which I expect to be a reference to a hard-coded array layout to use. Knowing that the lenses are arranged in a hex grid, this should offer enough information to reconstruct the whole microlens array.

So why not just stop here? Well, here’s the thing. First, a mandatory picture:

raw

Did you notice anything about the lens grid in the image above? Even a quick glance at a RAW image reveals that the structure of the microlenses is not uniform across the image. Some of the rows have a larger space in between. That means using a simple hard-coded hex grid would lead to increasing errors as the distance from the upper left corner increases.

The solution is to use a non-uniform grid storing all grid coordinates. I suppose that’s what the “config” in the metadata is used for – they know about these shortcomings and the option selects the exact layout to use. But first, we need to detect the exact layout. While I could do that once and hard-wire it into Lyli, I decided to always calibrate the camera. This way it can take into consideration any flaws introduced during the production of that specific camera.

At this point, the lens calibration becomes vital to the process. While we have already accepted the fact that the grid is non-uniform, it doesn’t have to be too non-uniform. The lens distortions, and especially the barrel distortion, result in a grid where the lines are slightly curved. When that is corrected, only the spacing remains non-uniform.

Constructing the Lens Grid

Now it is time to praise the decision to use OpenCV to represent image data. Otherwise I would have probably just given up and left everything unfinished indefinitely, as implementing the tons of simple algorithms needed for the intermediate steps is so boring.

In the following sections I’m going to describe how the lens grid is detected and constructed. Everything is shown step by step, lots of pictures included.

Extracting Microlenses

The first step is to preprocess the image to extract the microlenses for easier detection. I used an image of a white screen as an input, similar to the calibration images stored in the camera (there’s no specific reason why I didn’t use the latter, as they are pretty much the same). The main point of using a purely white image is that the lenses are well distinguishable in such an image and the contrast is even across the image. So without further ado, let’s get going!

First the image is converted to grayscale:

preprocess-step1

Then a Laplacian operator with a 3×3 kernel is applied to detect the edges between lenses, and the image is thresholded using the image’s mean value. I actually tried a more sophisticated approach of computing the threshold using a cumulative histogram, but it was not worth the work.

preprocess-step2

As the image contains small specks, we need to get rid of them. For that I came up with a simple morphological operator using the following structuring element:

0.125 0.125 0.125
0.125 0.0 0.125
0.125 0.125 0.125

After applying the morphological operator, a threshold on the value 95 is applied. The idea here is simple – if the element at the center does not have at least three white neighbors, it gets a value < 95 and is removed by the thresholding:

preprocess-step3

The filtered lenses are still often connected, though. To avoid that, I apply dilation two times and then one erosion, both using a rectangular 3×3 structuring element. And here’s the result: every lens has its own separate image, represented by a white dot. At this point, I also invert the value of the image.

preprocess-step4

The resulting image is a 1-bit image that will be used as a mask for finding the center of each lens.
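For reference, a condensed OpenCV sketch of the steps above (simplified – the real Lyli code differs in details such as border handling and bit depths):

#include <opencv2/core.hpp>
#include <opencv2/imgproc.hpp>

cv::Mat extractLensMask(const cv::Mat& input) {
    // 1. Convert to grayscale.
    cv::Mat gray;
    cv::cvtColor(input, gray, cv::COLOR_BGR2GRAY);

    // 2. Laplacian with a 3x3 kernel to get the edges between lenses.
    cv::Mat edges;
    cv::Laplacian(gray, edges, CV_8U, 3);

    // 3. Threshold at the image mean.
    cv::Mat thresholded;
    cv::threshold(edges, thresholded, cv::mean(edges)[0], 255, cv::THRESH_BINARY);

    // 4. Speck removal: convolve with the structuring element from the text and
    //    keep only pixels that had at least three white neighbours (value >= 95).
    const cv::Mat kernel = (cv::Mat_<float>(3, 3) <<
        0.125f, 0.125f, 0.125f,
        0.125f, 0.0f,   0.125f,
        0.125f, 0.125f, 0.125f);
    cv::Mat filtered;
    cv::filter2D(thresholded, filtered, -1, kernel);
    cv::threshold(filtered, filtered, 95, 255, cv::THRESH_BINARY);

    // 5. Two dilations followed by one erosion, rectangular 3x3 element.
    const cv::Mat element = cv::getStructuringElement(cv::MORPH_RECT, cv::Size(3, 3));
    cv::dilate(filtered, filtered, element, cv::Point(-1, -1), 2);
    cv::erode(filtered, filtered, element);

    // 6. Invert so that each lens becomes a separate white dot.
    cv::bitwise_not(filtered, filtered);
    return filtered;
}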

Finding Lens Centers

Now that we have the mask separating the lenses from each other, we can continue with the detection of each lens center. To find it, a centroid of each lens image is computed. To achieve that, we use the dots as a mask for computing centroids in the grayscale image, i.e. for each white pixel in a dot, we take the corresponding pixel in the grayscale image to compute the centroid.

The dots themselves are detected by line scanning. When the scanline hits the topmost-left pixel of a dot, the whole dot is discovered and the centroid is computed. To find all pixels of the dot, I came up with an algorithm similar to common fill algorithms such as bucket fill that works on monotone polygons. This is sufficient, as the dots are monotone with respect to the vertical axis y. The algorithm is iterative and simple to implement (a code sketch follows the step-by-step description below).

NOTE: for future reference, I will use x as the horizontal axis and y as the vertical axis.

The Algorithm

  1. Start at the topmost-left pixel (green highlight) and scan to the right until the last pixel of the object is reached. Store the x-position of the leftmost and rightmost pixel. This interval will be used when processing the next line.
    centroid-detection-step1

  2. The pixels processed in the previous step are highlighted using gray color. The interval discovered in the previous step is highlighted in red. The algorithm moves to the next row, starting at the stored x-position of the leftmost pixel (green). If the pixel at this position is part of the object, scan to the left to find the leftmost pixel and then to the right to find the rightmost pixel. Again, store the interval.
    centroid-detection-step2

  3. Repeat: we start at the green pixel. This time the interval shrinks on the right side.
    centroid-detection-step3

  4. If the starting pixel is outside the object, we scan to the right until we find a pixel belonging to the object or until we hit the right border of the interval. In this case, the algorithm hits the object while being in the interval, so we process it, updating the interval again.
    centroid-detection-step4

  5. The algorithm stops when no pixel from the object is found within the specified interval.
    centroid-detection-step5
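A simplified sketch of the fill (not the actual Lyli code): it processes a single dot given its topmost-left pixel in the 1-bit mask and accumulates the centroid weighted by the corresponding grayscale values, as described above:

#include <opencv2/core.hpp>

// mask: the 1-bit lens mask, gray: the grayscale image used for weighting,
// start: the topmost-left pixel of the dot found by the line scan.
cv::Point2f computeCentroid(const cv::Mat& mask, const cv::Mat& gray, cv::Point start) {
    double sumX = 0.0, sumY = 0.0, sumW = 0.0;
    int left = start.x;
    int right = start.x;

    for (int y = start.y; y < mask.rows; ++y) {
        // Look for an object pixel within the interval from the previous row.
        int x = left;
        while (x <= right && mask.at<uchar>(y, x) == 0) {
            ++x;
        }
        if (x > right) {
            break;  // no object pixel in the interval - the dot ends here
        }
        // Expand to the leftmost and rightmost object pixel of this row.
        int newLeft = x;
        while (newLeft > 0 && mask.at<uchar>(y, newLeft - 1) != 0) {
            --newLeft;
        }
        int newRight = x;
        while (newRight + 1 < mask.cols && mask.at<uchar>(y, newRight + 1) != 0) {
            ++newRight;
        }
        // Accumulate the centroid, weighting each pixel by its grayscale value.
        for (int i = newLeft; i <= newRight; ++i) {
            const double w = gray.at<uchar>(y, i);
            sumX += w * i;
            sumY += w * y;
            sumW += w;
        }
        // The interval for the next row.
        left = newLeft;
        right = newRight;
    }
    return cv::Point2f(static_cast<float>(sumX / sumW), static_cast<float>(sumY / sumW));
}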

The following image shows the raw image with centers depicted as black pixels.

centers-detected

Refining the Centroids

It is possible that the use of a 1-bit mask reduced the precision of the centroid computation. The main reason is that some of the pixels that would otherwise be part of the lens image may have been omitted because the thresholding removed them. Taking that into consideration, I introduced a refining step. This is the most computationally intensive step, as it processes each lens several times with subpixel precision.

The Algorithm

The starting estimate of a lens center is the centroid computed in the previous step. A new estimate is computed as the centroid of a circular neighborhood of the current estimate. The algorithm begins with a neighborhood of a three-pixel radius. This is repeated with a 1 px larger neighborhood in each iteration until we obtain the estimate using a neighborhood with a radius of six pixels.

The centroid is computed at subpixel precision, meaning that the position of neighboring pixels usually does not match the pixels in the original image. For that reason the value of neighboring pixels is interpolated. I chose to use bilinear interpolation, as it is both simple and fast.
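A rough sketch of the refinement under the same assumptions (8-bit grayscale image, centers far enough from the image border that bounds checking can be skipped; the square sampling grid clipped to a circle is my simplification):

#include <cmath>
#include <opencv2/core.hpp>

// Bilinear sample of a single-channel 8-bit image at a subpixel position.
static float sampleBilinear(const cv::Mat& gray, float x, float y) {
    const int x0 = static_cast<int>(std::floor(x));
    const int y0 = static_cast<int>(std::floor(y));
    const float fx = x - x0;
    const float fy = y - y0;
    auto px = [&](int yy, int xx) { return static_cast<float>(gray.at<uchar>(yy, xx)); };
    return (1 - fx) * (1 - fy) * px(y0, x0)     + fx * (1 - fy) * px(y0, x0 + 1)
         + (1 - fx) * fy       * px(y0 + 1, x0) + fx * fy       * px(y0 + 1, x0 + 1);
}

// Re-estimates the center as the centroid of circular neighborhoods with
// radii growing from 3 to 6 pixels.
cv::Point2f refineCenter(const cv::Mat& gray, cv::Point2f estimate) {
    for (int radius = 3; radius <= 6; ++radius) {
        double sumX = 0.0, sumY = 0.0, sumW = 0.0;
        for (float dy = -radius; dy <= radius; dy += 1.0f) {
            for (float dx = -radius; dx <= radius; dx += 1.0f) {
                if (dx * dx + dy * dy > radius * radius) {
                    continue;  // outside the circular neighborhood
                }
                const float x = estimate.x + dx;
                const float y = estimate.y + dy;
                const float w = sampleBilinear(gray, x, y);
                sumX += w * x;
                sumY += w * y;
                sumW += w;
            }
        }
        estimate = cv::Point2f(static_cast<float>(sumX / sumW), static_cast<float>(sumY / sumW));
    }
    return estimate;
}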

And now a mandatory image showing the positions of the initial estimates and the refined positions. The positions are rounded to the nearest pixel. The red pixels are the initial estimates, the blue pixels correspond to the refined positions. If both fall on the same pixel, the pixel is black.

centers-refined

Connecting the Dots

The lens centers we obtained in the last step can be used to reconstruct the spatial information of the lens grid. The algorithm is essentially a sweep algorithm that adds the currently processed center to the closest line; it is executed twice to create both the vertical and the horizontal lines, thus creating a grid. Although I describe it separately, the sweep nature of the algorithm makes it possible to combine it with the lens center detection and refining into one step. In the actual implementation, as soon as a center is detected, the center is refined and added to the closest horizontal line.

The algorithm has two modes of operation. In the first one, new lines are created and points are added to these lines. The other one only adds points to the existing lines. The reason to separate the process into two steps is to avoid discontinuity of lines. It could happen that when an erroneous center is processed, a line is interrupted in the middle of the image and a new line is created, breaking the processing of the following points close to this line.

The Algorithm #1

The algorithm sweeps a line across the image from left to right (well, in reality I transposed the image, so that I can sweep from top to bottom).

  1. In the first step, only a limited number of columns is processed. The point of this step is to create the list of horizontal lines. The algorithm sweeps an imaginary test line across the image from left to right. When the test line hits the first lens center, it creates a new object representing a line (in fact the line is just a list of centers) and stores it in a map of lines. This map stores the y-position of the last detected center in each line (i.e. the current rightmost point in the line) and maps it to the corresponding line. When the next center is hit by the test line, the map is checked for the line whose last point has the closest y-position to the currently processed center. If the distance from the closest line exceeds 3 px (selected empirically), a new line is created. Otherwise the point is added to the closest line and the y-position in the map record is updated to correspond to the newly added point (a code sketch of this bookkeeping follows the illustrations).

The procedure is illustrated in the following images. The black crosses mark the lens centers, the red line is the limit for the first step, the green dashed line is the test line, and the blue lines are the detected lines of centers.

line-detection-1-step1
line-detection-1-step2
line-detection-1-step3
line-detection-1-step4

  2. The second step is nearly the same as the first step, with the only difference being that if a point is too far from any line, it is ignored.

The first image shows detection of a point that is close enough to a line to be considered part of the line. The second image shows detection of an outlier.

line-detection-2-step1
line-detection-2-step2
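The bookkeeping shared by both steps can be condensed into something like the following sketch (Line, the threshold handling and the re-keying are simplified; this is not the actual Lyli code):

#include <cmath>
#include <iterator>
#include <map>
#include <vector>
#include <opencv2/core.hpp>

using Line = std::vector<cv::Point2f>;

constexpr float MAX_DISTANCE = 3.0f;  // empirically selected threshold

// lines: map from the y-position of the last added point to the line itself.
// allowNewLines: true in the first step, false in the second step.
void addToClosestLine(std::map<float, Line>& lines, const cv::Point2f& center,
                      bool allowNewLines) {
    // Find the line whose last point has the closest y-position.
    auto closest = lines.lower_bound(center.y);
    if (closest != lines.begin()) {
        auto prev = std::prev(closest);
        if (closest == lines.end() ||
            std::abs(prev->first - center.y) < std::abs(closest->first - center.y)) {
            closest = prev;
        }
    }

    if (closest == lines.end() || std::abs(closest->first - center.y) > MAX_DISTANCE) {
        // Too far from any existing line: start a new one (step 1) or ignore (step 2).
        if (allowNewLines) {
            lines.emplace(center.y, Line{center});
        }
        return;
    }

    // Add the point and re-key the line under the new rightmost y-position.
    Line line = std::move(closest->second);
    line.push_back(center);
    lines.erase(closest);
    lines.emplace(center.y, std::move(line));
}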

Now we have the first part of the spatial information reconstructed – the lens centers are connected into horizontal lines. The next step is to reconstruct the vertical lines.

The Algorithm #2

The vertical line detection is the same as algorithm #1 for the horizontal lines, with two subtle differences. First, the x-coordinates of the centers in odd and even horizontal lines are displaced, so it is necessary to process odd and even lines separately. The other difference is that we no longer process the centers from the image directly, but rather use the horizontal lines as the source of points.

  1. Order the horizontal lines according to their y-position. Select six lines at 1/4 of the horizontal line count. I chose these lines deliberately, as they should not suffer too much from any lens distortions. In addition, as the quality of the lens center detection and line reconstruction depends a lot on the quality of the preprocessing, these lines are expected to be of good quality because the threshold should work well at these coordinates.

  2. Use the selected lines to construct vertical lines in a similar fashion to the first step of the first algorithm run.

  3. Process the rest of the centers and add them to the corresponding lines. We need to be careful here, as the lines were not created starting at the image border, but instead at 1/4 of the image height. We can continue to sweep from those six lines to the bottom as we did in #1. However, to process the top quarter of the image, we have to update the map of lines to use the first point of each line to find the closest line. Also, the points are added to the beginning of the line rather than to the end.

Lastly, the points that are not part of both a vertical line and a horizontal line are removed.

Finally, we have a grid of points, where each intersection represents a lens center.

Now as for the results – first, a grid overlaid on the raw image:

lines-distorted-overlay

A grid of the whole image (warning, big image):

lines-distorted

Lens Calibration

There is a handy function calibrateCamera in OpenCV that, given a list of object points and a list of image points, computes the coefficients required to remove the lens distortions. In the OpenCV terminology, the object points are the points with distortion and the image points are the desired point positions with the distortions removed.

The grid of points now becomes very helpful. The grid points can be used directly as the object points. To obtain the image points, I used the middle third of the horizontal lines to average the x-positions of the points on each vertical line, obtaining “average” vertical lines. And then the same thing to get “average” horizontal lines. Again, the decision which lines to use was a deliberate selection to avoid distortions. The points of the average lines are then fed into calibrateCamera as the image points.
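Roughly, the call looks like the following sketch, which follows the object/image point convention described above; the variable and function names are illustrative, not Lyli’s actual API:

#include <vector>
#include <opencv2/core.hpp>
#include <opencv2/calib3d.hpp>

// detectedCenters: the (distorted) grid points, idealCenters: the matching
// points of the "average" lines. The whole grid is passed as a single view.
void calibrate(const std::vector<cv::Point2f>& detectedCenters,
               const std::vector<cv::Point2f>& idealCenters,
               cv::Size imageSize,
               cv::Mat& cameraMatrix, cv::Mat& distCoeffs) {
    // calibrateCamera expects 3D object points; the grid lies in the z = 0 plane.
    std::vector<cv::Point3f> objectPoints;
    objectPoints.reserve(detectedCenters.size());
    for (const auto& p : detectedCenters) {
        objectPoints.emplace_back(p.x, p.y, 0.0f);
    }

    std::vector<std::vector<cv::Point3f>> objectPointsList{objectPoints};
    std::vector<std::vector<cv::Point2f>> imagePointsList{idealCenters};

    std::vector<cv::Mat> rvecs, tvecs;
    cv::calibrateCamera(objectPointsList, imagePointsList, imageSize,
                        cameraMatrix, distCoeffs, rvecs, tvecs);
}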

The following image illustrates the creation of the desired vertical lines. A limited number of horizontal lines that show the least distortion is selected (red). The points on these lines (red circles) are used to compute the “optimal” vertical lines (green).

lines-average

Voilà, we have the parameters required to remove the lens distortions. And this is how the grid looks after the lens distortions are removed (warning, big image):

lines-undistorted

References

[1] “Reverse Engineering the Lytro .LFP File Format”

Interesting Lytro resources

Recently I was contacted by Jan Kučera (fun fact – apparently he is a student at a university in the city I recently moved to), asking whether I had seen his pages about Lytro – the LYTRO meltdown web. I wish I had known about his pages when I got my camera, as that would have lifted the burden of actually having to decipher the protocol just to download a few files. For that reason I decided to compile a list of interesting resources about Lytro internals for everyone to find.

So here’s the list (it may be extended over time):

LYTRO meltdown – by far the most comprehensive resource on Lytro internals I know of, including a detailed description of the protocol, various files and the hardware. Recommended reading.

LightField Forum – a great starting point for anyone interested in lightfield photography, including Lytro.

Todor Georgiev‘s web – one of the few people who, together with Yu, Z., Yu, J., and Lumsdaine, tackled the problem of demosaicing with lightfield cameras. This is a very important problem currently holding me back from further development. Another fun fact – one of the figures is a photograph of one of my university professors.

eclecticc – especially the post Reverse Engineering the Lytro .LFP File Format helped me with decoding the Lytro RAW files. The author is also the creator of lfptools.

code.behnam.es – a collection of Python scripts to read and view Lytro LFP files. I’ve not tested them, as Lyli doesn’t support LFP output yet.

Lytro Academic Papers – a list of various papers on lightfield photography, including Ren Ng’s thesis.

Lytro Protocol

To make the camera useful it is first necessary to be able to download images from it. With the Lytro camera this is not so simple, because it doesn’t use any standard protocol. TL;DR: skip to the Protocol Description.

At first sight it may look like downloading pictures from the camera will be a piece of cake. When the camera is connected to a computer, the system immediately detects it and shows a device. But it’s detected as a CD-ROM containing a single HTML file that refers to the Lytro website to download Lytro Desktop, the application that actually allows downloading the pictures from the camera to the computer.

So let’s proceed to examining the protocol used by Lytro Desktop to download images. Fortunately I already had some experience with reverse-engineering a USB protocol from working on eilin, so this was relatively easy.

First we need to capture the USB traffic during the download of images. There is a multitude of tools to achieve that, and I tried many of them. There are two that always led to good results. My favorite option is installing the application inside a virtual machine with Windows and then capturing the traffic using Wireshark and the Linux usbmon module. What’s great about this is that it doesn’t require the installation of any additional software, and the virtual machine can be thrown away at any time. The other option involves installing USBPcap in Windows and using that to capture the data. The data captured by USBPcap can be easily loaded into Wireshark for examination. For examining the protocol of the Lytro camera I had to pick USBPcap because of the Lytro Desktop requirements (64-bit Windows 7 and basic 3D acceleration).

Reverse-engineering the Protocol

First I tried to determine whether it uses some known protocol. This part alone took me most of the time I spent working on the protocol. Based on the USB interface descriptor it should be bInterfaceClass 0x08 (Mass Storage) and bInterfaceSubClass 0x05. This subclass is now obsolete, but it once meant SFF-8070i, which was apparently used for USB floppy drives. But this isn’t one. Next I searched for some of the SCSI commands used. Based on that, Lytro doesn’t use any known protocol. Or at least my google-fu didn’t return any useful results. That meant I had to reverse-engineer the protocol myself. Yay! 😦

This is how the capture looks in Wireshark after applying some filters so only the communication with the camera is shown:

Wireshark with captured data

Doesn’t look very clear, does it? It looks like some state-based protocol, but it’s not obvious how the transitions between states work. So let’s employ yEd to visualize the state changes:

Capture visualised using yEd

I know the image looks ugly, but the succession of steps that happens after connecting the camera and using Lytro Desktop to download images is much clearer now. Using the capture as a reference, it can be deduced that the 0xC2 00 01 ... commands select the file to download, then a 0xC6 ... command is called and finally a 0xC4 ... command is called to obtain the file data (with RAW data, 0xC4 is called repeatedly). There are some commands that stand out by not following this succession of steps completely. One of them is actually essential for being able to download images: the 0xC2 00 02 command, which returns the list of files with some metadata.

At that point, the 0xC6 command was still a bit of an enigma, because it didn’t seem to have any effect on what was sent next. When I started to become desperate in figuring out what it does, I decided to try to match its reply against anything else. And voilà, it was the response length. The one thing I don’t understand here is why the protocol provides a way to query the response length, but doesn’t use it when requesting the data.

This is all that is needed to download images from Lytro! There are some more commands in the capture that I don’t understand; however, as they are not required for downloading images, I didn’t want to spend time trying to decode them.

Implementation

The Lytro camera turns out to be very unfriendly when it comes to experimenting with its protocol. Even a minor mistake when sending commands results in a camera freeze. This can be fixed either by disconnecting and reconnecting the cable, or in software by clearing the halt on the endpoint and then doing a full USB reset.

Protocol Description

In this section I will provide a description of selected commands used to communicate with the Lytro camera. There are more commands in the Lytro protocol; however, I took the time to identify only the ones that are required to download files from the camera. I suppose the unidentified commands may be used to query additional camera information, as Lytro Desktop showed some.

The protocol uses standard USB mass storage bulk transfer with custom 12-byte SCSI commands. This means the commands are in the command block of the mass storage Command Block Wrapper and the status is returned using the Command Status Wrapper.

Command Block Wrapper (CBW)

Signature (4 B) | Tag (4 B) | DataTransferLength (4 B) | Flags (1 B) | Reserved (4 b) / LUN (4 b) | Reserved (3 b) / Length (5 b) | CB (12 B)

Where (see the struct sketch after the list):

  • Signature – signature identifying packet as CBW, string “USBC”
  • Tag – generated id of the command
  • DataTransferLength – expected length of the data returned by the device
  • Flags – the most significant bit of the byte is the direction (0 – host to device, 1 – device to host)
  • LUN – Logical Unit Number, always zero for Lytro commands
  • Length – length of the command block, always 12 for Lytro commands
  • CB – command block, i.e. the actual SCSI command
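As a sketch, the wrapper can be represented by a packed struct like the one below; the field layout follows the standard mass storage Bulk-Only Transport spec, where the CB field is 16 bytes on the wire and Lytro fills the first 12 of them:

#include <cstdint>

#pragma pack(push, 1)
struct CommandBlockWrapper {
    std::uint8_t  signature[4];        // "USBC"
    std::uint32_t tag;                 // generated id of the command
    std::uint32_t dataTransferLength;  // expected length of the returned data
    std::uint8_t  flags;               // MSB: 0 = host to device, 1 = device to host
    std::uint8_t  lun;                 // upper 4 bits reserved, lower 4 bits LUN (0)
    std::uint8_t  cbLength;            // upper 3 bits reserved, command length (12)
    std::uint8_t  cb[16];              // the SCSI command block, e.g. 0xc2 00 02 ...
};
#pragma pack(pop)

static_assert(sizeof(CommandBlockWrapper) == 31, "CBW must be 31 bytes on the wire");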

Command Status Wrapper (CSW)

dCSWSignature (4 B) | dCSWTag (4 B) | dCSWDataResidue (4 B) | bCSWStatus (1 B)

Where:

  • dCSWSignature – signature identifying the packet as a CSW, the string “USBS”
  • dCSWTag – tag of the corresponding CBW
  • dCSWDataResidue – difference between the requested amount of data and the returned amount
  • bCSWStatus – status, 0 is success, anything else is essentially a failure.

Command List

SCSI Inquiry

The device replies to a standard SCSI inquiry; there are many sources describing it, such as [1], [2].

Query File List

dCBWDataTransferLength 0
bmCBWFlags 0
CBWCB 0xc2 00 02 00 00 00 00 00 00 00 00 00

Returns:

??? (84 B), followed by file descriptions (124 B each)
Where the file description is:
??? (20 B) | file id (4 B) | ??? (24 B) | SHA1 (48 B) | time

Select File

dCBWDataTransferLength file name length
bmCBWFlags 0
CBWCB 0xc2 00 01 00 00 00 00 00 00 00 00 00

followed by a bulk transfer containing the file name.

Return Selected File Size

dCBWDataTransferLength 65536
bmCBWFlags 0x80
CBWCB 0xc6 00 00 00 00 00 00 00 00 00 00 00

Returns:

file size in bytes

Read Selected File Data

dCBWDataTransferLength 65536
bmCBWFlags 0x80
CBWCB 0xc4 00 01 00 00 **XX** 00 00 00 00 00 00

Where XX is an iterator starting from 0.

Returns:

Data as bytes. The end of data can be determined by reading less than 65536 bytes.

Interesting Files

The camera provides access to more than just the RAW image data. In fact, each picture spans three files. There are also other files, but they are not very important. To download a file, the file has to be selected using the Select File command, followed by the Read Selected File Data command.

Image data

I:\DCIM\100PHOTO\IMG_XXXX.RAW

  • XXXX is zero-padded id of the image
  • contains the RAW image data (without any metadata)
  • the data are stored as 12-bit values in big-endian order (see the decoding sketch after this list)
  • the image uses bayer filter
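A minimal decoding sketch, assuming two big-endian 12-bit samples packed into three consecutive bytes (an illustration only – verify against real files):

#include <cstddef>
#include <cstdint>
#include <vector>

// Unpacks pairs of big-endian 12-bit samples stored in three bytes each.
std::vector<std::uint16_t> unpack12(const std::vector<std::uint8_t>& raw) {
    std::vector<std::uint16_t> out;
    out.reserve(raw.size() * 2 / 3);
    for (std::size_t i = 0; i + 2 < raw.size(); i += 3) {
        const std::uint16_t b0 = raw[i];
        const std::uint16_t b1 = raw[i + 1];
        const std::uint16_t b2 = raw[i + 2];
        out.push_back(static_cast<std::uint16_t>((b0 << 4) | (b1 >> 4)));    // first sample
        out.push_back(static_cast<std::uint16_t>(((b1 & 0x0f) << 8) | b2));  // second sample
    }
    return out;
}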

I:\DCIM\100PHOTO\IMG_XXXX.TXT

  • XXXX is zero-padded id of the image
  • accompanies the .RAW file
  • contains detailed information about the camera and the picture itself
  • JSON format

I:\DCIM\100PHOTO\IMG_XXXX.128

  • XXXX is zero-padded id of the image
  • image thumbnail
  • stored as 16bit 128×128 grayscale image

Other files

A:\FIRMWARE.TXT

  • contains information about camera firmware
  • JSON format

A:\VCM.TXT

  • some internal info about firmware versions
  • JSON format

Calibration data

  • according to [3], the camera also provides calibration data
  • I haven’t researched how to obtain them yet, but I may look into them in the future in order to support automatic calibration.

Downloading Files

It should be noted that the camera is very sensitive to the order of the commands and to their correctness.
If the commands are sent in the wrong order or in the wrong format, the device halts, which can be fixed only by clearing the halt on the endpoint and doing a full USB reset, or by physically reconnecting the device.

The general order of commands is:

  1. select file (SCSI command + bulk transfer containing the file name)
  2. get the request status
  3. get the expected length (optional)
  4. read data for as long as the device keeps returning some

References

[1] TLDP SCSI Programming HOWTO

[2] SCSI Commands Reference Manual – Seagate

[3] Todor Georgiev. Keynote presentation at Electronic Imaging, Feb 05 2013.

Thoughts On Publishing Open Source Software

Earlier today I read an article about Xiaomi breaking the GPL license. This made me think about why I publish all the software I write in my spare time as open source.

People may have many reasons to release their software as open source. Many do it for ideological reasons. But I think it all comes down to the sense of appreciation and the feeling of giving something back. It’s nice to have a popular application. But it’s an even greater feeling if someone likes your code so much that they decide to use it. The other reason – the feeling of giving something back – is probably deeply rooted in our society, which is based on the exchange of goods. If you give me something, I will give you something in exchange. And vice versa.

Lots of people may not realize it, but without open source, computing wouldn’t be what it is today. And I don’t mean only the popular projects such as Mozilla Firefox, LibreOffice or Linux. The hacker culture that began forming sometime in the 60s in academia (MIT), where computer enthusiasts exchanged their knowledge and built upon the knowledge of others, allowed for greater development in computing and produced many great programmers and computer scientists. The spirit of open source is constantly floating around. Publishing code as open source may not help e.g. the GCC developers directly. But it may help someone else to create something awesome that even the godly GCC developers will enjoy 🙂

I publish most of my code under the LGPL and possibly the GPL. Although the Free Software Foundation tries to convince people that the LGPL is something inferior to the GPL by calling it the “Lesser GPL” (it used to be the Library General Public License), I like the fact that it gives the users even more freedom. If you wanted to use GPL software in your application, you would have to publish it under the GPL too, no matter how much of the code is yours. But with the LGPL you can publish it under any license you want. There’s one thing, though – I want you to give me something back in exchange for my code. If you don’t just use it but improve it, I want everyone (including the selfish me) to have access to these improvements.