
Image sensors: Theory

When analysing or just using image sensors, knowing a little bit of theory makes everything easier.

The model described here is simplified to point out how the key parameters of image sensors relate to each other. Partly it uses physical laws (e.g. photon noise) and partly just formulas describing how current sensors work (e.g. sensor gain); new sensor technologies could change the latter. So, how is light converted into a digital image, and what influences the image quality?

Pixel value

Pixel values do not directly correspond to a number of photons, but to the number of electrons (often denoted $\e^{-}$) moved by them. The quantum efficiency gives the fraction of photons that move an electron (often expressed as a percentage). It depends on the wavelength, and image sensor data sheets usually contain a diagram showing the relationship and quote the maximum quantum efficiency and its wavelength.

There are two common units for pixel values: ADU (A/D converter unit) and DN (data number). The former made sense when the pixel value was the raw A/D conversion result, but modern sensors use post-processing, hence the latter is more common and that's why I use it here as well.

$$\text"electrons" = \text"photons" · \text"quantum efficiency"$$

The pixel value depends on three values:

$\text"DN"_{\text"max"}$
The maximum A/D value to convert the number of electrons into a digital value
$\text"full well capacity"$
The maximum number of electrons a pixel can hold
$\text"gain"_{\text"sensor"}$
The pre A/D converter amplifier

In the literature, the combination of these values is usually called conversion gain or just gain, and sometimes it refers to electrons and sometimes to brightness in DN. Do not confuse the conversion gain with the image sensor gain.

$$\text"gain"_\text"conversion" = \text"full well capacity" / { \text"gain"_\text"sensor" · \text"DN"_{\text"max"} }$$

$$\text"brightness" = \text"electrons" / \text"gain"_\text"conversion"$$

The conversion gain where one DN equals one electron is called unity gain. Neglecting noise, image sensors can't measure light more precisely than that. Unity gain does not refer to an image sensor gain factor of $1.0$ or ISO 100, but means a 1:1 mapping between electrons and brightness values. It is typically reached at rather high image sensor gain settings or ISO numbers.
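To make these relations concrete, here is a minimal sketch in Python. All parameter values (quantum efficiency, full well capacity, bit depth, gain setting, photon count) are hypothetical examples, not data from a real sensor.

# Minimal sketch of the pixel value model; all values are hypothetical.
quantum_efficiency = 0.6      # fraction of photons that move an electron
full_well_capacity = 10_000   # e-, maximum electrons a pixel can hold
dn_max = 4095                 # maximum A/D value (12 bit)
gain_sensor = 2.0             # pre A/D converter amplifier setting

# e- per DN at this sensor gain setting
gain_conversion = full_well_capacity / (gain_sensor * dn_max)

photons = 5_000
electrons = photons * quantum_efficiency
brightness = min(electrons / gain_conversion, dn_max)  # clipped at DN_max

print(f"conversion gain: {gain_conversion:.2f} e-/DN")  # about 1.22
print(f"brightness:      {brightness:.0f} DN")          # about 2457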

In the real world, the pixel value also contains variable and fixed noise.

Photon noise

There is no such thing as a constant amount of light or a constant flow of photons. By nature, their arrival shows variance. Try counting passing cars on a street: The measured average rate will be more precise with longer intervals, that is, the differences in car counts will not grow as fast as the interval length, but they do grow. Photon noise is just that difference, because the brightness of a pixel is not the average count, but the actual count. Photon noise is also called shot noise.

Physics dictates that the number of photons arriving in a certain time follows a Poisson distribution. That means the square of the standard deviation is equal to the mean number of photons, or put the other way round, the photon noise is the square root of the number of photons:

$$\text"noise"_{\text"photons"} = √{\text"photons"}$$

How does that translate to the photon noise measured in electrons? Thinning a Poisson distribution, that is keeping each event only with a fixed probability, again results in a Poisson distribution. The quantum efficiency thins the stream of arriving photons in exactly that way, so the number of moved electrons is also described by a Poisson distribution:

$$\text"noise"_{\text"electrons"} = √{\text"electrons"}$$

The same is valid for the noise measured in DN, but a DN is not as discrete as it looks at first, because the image sensor gain can scale it. The discrete nature of electrons is the true limit, so it makes sense to express the noise in electrons:

$$\text"noise"_{\text"brightness"} = { √{\text"electrons"} } / \text"gain"_\text"conversion"$$

Conclusion 1: Doubling the exposure means twice as many photons and $√2$ times as much photon noise, so the image is not twice as good. Using a four times longer exposure results in an SNR (pixel value divided by noise value) that is twice as good.

Conclusion 2: In terms of cars on a street: A more efficient imager does not miss as many cars while counting, so it shows a better ratio of pixel values and photon noise. A small lens opening is like having less traffic: The counting errors increase and the measured rate is less constant. Since imagers show a different sensitivity for different wavelengths, noise will be different for different colours.

Conclusion 3: As today's colour imagers are monochrome imagers with colour filters, colour imagers will always show more noise, because the filters reduce the amount of light reaching the pixels. There is a tradeoff between brightness noise and colour noise: More transmissive filters result in less brightness noise, but more colour noise, because photons may pass the wrong filter.

Conclusion 4: If you do not exceed the maximum pixel capacity, a higher image sensor gain or ISO setting will not decrease the SNR. Only less light does that.

Conclusion 5: Taking images with any camera allows you to determine which image sensor gain value reaches unity gain and, knowing that, the full well capacity in electrons, even if it is not published by the manufacturer.

Determining readout noise and conversion gain

Besides shot noise, there is a fixed amount of noise per image taken (read noise, sometimes called temporal dark noise or just temporal noise). In theory, all you need is to take a single shot at zero exposure time without any light and measure the noise. But how can you be certain the black level is set correctly to show you all the noise and that the image sensor behaves perfectly linearly? Real cameras do not allow zero exposure time. There is a better way.

Gaussian noise sources are added this way:

$${\text"noise"_\text"total"} ^2 = { \text"noise"_\text"readout" } ^2 + { \text"noise"_\text"brightness" } ^2$$

To measure the readout noise showing in images, the brightness noise needs to be converted:

$$\table , { \text"noise"_\text"brightness" } ^2; =, ( { √{\text"electrons"} } / \text"gain"_\text"conversion" ) ^2; =, \text"electrons" / { \text"gain"_\text"conversion" · \text"gain"_\text"conversion" } ; =, { 1 / \text"gain"_\text"conversion" } · \text"brightness";$$

Now the total noise reads:

$${\text"noise"_\text"total"} ^2 = { \text"noise"_\text"readout" } ^2 + { 1 / \text"gain"_\text"conversion" } \text"brightness" $$

That means ${\text"noise"_\text"total"} ^2$ depends on the image brightness in a linear way with ${ \text"noise"_\text"readout" } ^2$ being the offset and $1 / \text"gain"_\text"conversion"$ the slope.

By measuring the total noise of two images of different brightness, both $\text"gain"_\text"conversion"$ and $\text"noise"_\text"readout"$ can be determined. It is better to take an exposure series, though, to be certain that the noise does not differ much between different images.
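A minimal sketch of that fit, assuming the mean brightness and total noise of each image in the series have already been measured in DN (how to measure the noise of a single image follows below). The measurement values are invented for illustration.

import numpy as np

# Hypothetical measurements from an exposure series, both in DN.
brightness = np.array([50.0, 120.0, 300.0, 700.0, 1500.0])
noise_total = np.array([4.1, 5.6, 8.1, 12.0, 17.4])

# noise_total^2 = noise_readout^2 + brightness / gain_conversion
# is linear in brightness: fit slope and offset.
slope, offset = np.polyfit(brightness, noise_total ** 2, 1)

gain_conversion = 1.0 / slope        # e- per DN
noise_readout = np.sqrt(offset)      # DN

print(f"conversion gain: {gain_conversion:.1f} e-/DN")  # about 5.1
print(f"read noise:      {noise_readout:.1f} DN")       # about 2.6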

One question remains: How can the noise of one image be measured? The difference of two images taken under identical conditions contains the noise of two equally noisy images, not of one image:

$${ \text"noise"_\text"1" } ^2 + { \text"noise"_\text"2" } ^2 = {\text"noise"_\text"sum"} ^2$$ $$2 · { \text"noise"_\text"1" } ^2 = {\text"noise"_\text"sum"} ^2$$ $$√{2} · \text"noise"_\text"1" = {\text"noise"_\text"sum"}$$ $$\text"noise"_\text"1" = {\text"noise"_\text"sum"} / √{2}$$

Divide the standard deviation of the image difference by the square root of 2 and you have the noise of one image.
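A sketch of that measurement, assuming two flat frames img_a and img_b (placeholder names) taken under identical conditions; the subtraction also cancels any fixed pattern, leaving the temporal noise of both images.

import numpy as np

def image_noise(img_a, img_b):
    """Temporal noise of a single image, estimated from the difference
    of two images taken under identical conditions."""
    diff = img_a.astype(np.float64) - img_b.astype(np.float64)
    return diff.std() / np.sqrt(2.0)

def image_brightness(img_a, img_b):
    """Mean brightness of the image pair."""
    return (img_a.mean() + img_b.mean()) / 2.0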

There is one caveat, though: The measured noise also contains quantization noise. It is typically assumed to be uncorrelated with the quantized signal and added as the square root of the sum of squares, but if the quantized signal has a standard deviation below 1/2 LSB, this assumption no longer holds and using the quantization standard deviation of $\text"LSB" / √{12}$ fails badly. Should the read noise be below 1/2 DN, converting it to electrons will give wrong results. See the excellent article >Sub LSB Quantization for details.
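Under that usual assumption, a measured noise value can be corrected by subtracting $\text"LSB" / √{12}$ in quadrature. A small sketch (function name hypothetical), with a guard for the sub-LSB case where the assumption breaks down:

import numpy as np

def remove_quantization_noise(noise_measured_dn, lsb=1.0):
    """Subtract the usual quantization noise estimate (LSB/sqrt(12))
    in quadrature from a measured noise value in DN."""
    if noise_measured_dn < 0.5 * lsb:
        # Below roughly 1/2 LSB the uncorrelated-noise assumption fails.
        raise ValueError("noise below 1/2 LSB: correction not valid")
    quantization_noise = lsb / np.sqrt(12.0)
    return np.sqrt(noise_measured_dn ** 2 - quantization_noise ** 2)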

The above was done at one fixed $\text"gain"_\text"conversion"$. How the read noise changes with the image sensor gain depends on the sensor and on how its gain operates, and there is no other way than repeating the above for different gain values to find out. Although it may appear logical that signal amplification leads to noise amplification, the opposite is true for many cameras, whereas others show a read noise that is mostly independent of gain.

Determining the full well capacity

Now that we know the $\text"gain"_\text"conversion"$, the full well capacity can be determined:

$$\text"gain"_\text"conversion" = \text"full well capacity" / { \text"gain"_\text"sensor" · \text"DN"_{\text"max"} }$$ $$\text"gain"_\text"conversion" · { \text"gain"_\text"sensor" · \text"DN"_{\text"max"} } = \text"full well capacity"$$

Note: $\text"gain"_\text"sensor"$ appears as a divisor in $\text"gain"_\text"conversion"$, so different image sensor gain settings do not change the product, although it may look different at first glance.
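Putting numbers to it, with a hypothetical conversion gain measured at sensor gain 1.0 on a 12 bit sensor:

# Hypothetical example: conversion gain measured at sensor gain 1.0.
gain_conversion = 5.0   # e-/DN, e.g. from the exposure series fit above
gain_sensor = 1.0       # setting used while measuring
dn_max = 4095           # 12 bit output

full_well_capacity = gain_conversion * gain_sensor * dn_max
print(f"full well capacity: {full_well_capacity:.0f} e-")  # 20475
# Measuring at a different sensor gain changes gain_conversion by the
# inverse factor, so the product stays the same.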

At unity gain, the usable full well capacity (the number of electrons that saturates the A/D converter) and $\text"DN"_\text"max"$ are equal, so with the sensor gain expressed relative to its unity gain setting, the above formula simplifies to:

$$\text"gain"_\text"conversion" = 1 / \text"gain"_\text"sensor"$$

To reach unity gain, multiply the used $\text"gain"_\text"sensor"$ setting by the measured $\text"gain"_\text"conversion"$.

In case of bright images, the readout noise can be mostly neglected and there is a quick way to estimate the $\text"gain"_\text"conversion"$ and at which $\text"gain"_\text"sensor"$/ISO value you reach unity gain:

$$\text"noise"_\text"brightness" = √{\text"electrons"} / \text"gain"_\text"conversion"$$ $$\table {\text"noise"_\text"brightness"}^2, =, ( √{\text"electrons"} / \text"gain"_\text"conversion" ) ^2; , =, \text"electrons" / { \text"gain"_\text"conversion" }^2$$ $${\text"noise"_\text"brightness"}^2 · \text"gain"_\text"conversion" / \text"electrons" = 1 / \text"gain"_\text"conversion"$$ $${ 1 / {\text"noise"_\text"brightness"}^2 } · \text"electrons" / \text"gain"_\text"conversion" = \text"gain"_\text"conversion"$$ $${ \text"brightness" } / {\text"noise"_\text"brightness"}^2 = \text"gain"_\text"conversion"$$

Readout noise and dynamic range

The dynamic range is the ratio between the smallest and the largest detectable brightness. Although it sounds like this is identical to how many different brightness values can be detected, it is not. The smallest detectable brightness is assumed to be identical to the readout noise, although the readout noise is not a fixed interval, but the standard deviation of a Gaussian distribution.

The upper limit is clearly the full well capacity; at the lower end, until sensors can count individual electrons, the sensor noise or the pixel data format is the limit.

$$\table \text"dynamic range", =, { \text"full well capacity" \:\:[\e^{-}] } / {\text"noise"_\text"readout" \:\:[\e^{-}]}; , =, { \text"DN"_{\text"max"} \: [\text"DN"]} / {\text"noise"_\text"readout" \: [\text"DN"]}$$

The dynamic range is either given as a ratio with the read noise normalized to $1$, e.g. $400 : 1$ (normalized from $256 : 0.64$), or as decibels:

$$\text"dynamic range" \:\:[\text"dB"] = 20 · log_10(\text"dynamic range" \:\:[\text"ratio"])$$

Note: If you have a readout noise less than $1.0 \:[\text"DN"]$, then it is lower than the digital value can represent. This happens easily when using an image format with 8 bit pixel depth. In that case, some image information is lost and the true dynamic range is $256 : 1 = 48.2 \:\:[\text"dB"]$.

Depending on what is measured, the dynamic range may differ: Image sensor pixel, A/D resolution, both combined, etc.

Besides ratio and decibel, the dynamic range is sometimes given in f-stops: The aperture is the opening of an optical system, given either as a diameter or as a fraction of the focal length. Because the area of a circle is proportional to $\text"diameter"^2$, the area and hence the amount of light doubles if the diameter is increased by a factor of $√{2}$.

That's the reason for the seemingly odd f-stops f/1.4, f/2, f/2.8, f/4 and so on: All growing by the factor $√{2}$, all halving the light compared to the previous f-stop.

Since f-stops are a relative measure of light, the dynamic range can be specified in f-stops: $8$ f-stops are $256 : 1$. An imager with that dynamic range that shows a white image will show a black image when the light is reduced by $8$ f-stops, because the image then reaches the level of the read noise.
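A small helper that converts between the three ways of stating the dynamic range (ratio, decibels, f-stops), using the $256 : 0.64$ example from above:

import math

def dynamic_range(full_well, read_noise):
    """Dynamic range as ratio, decibels and f-stops (same unit for
    both arguments, e.g. electrons or DN)."""
    ratio = full_well / read_noise
    db = 20.0 * math.log10(ratio)
    stops = math.log2(ratio)
    return ratio, db, stops

print(dynamic_range(256, 0.64))   # (400.0, 52.0 dB, 8.6 f-stops)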

Finally, a word of caution on using a single value: The dynamic range does not say anything about the dominant noise source. Column or row noise looks very different from Gaussian noise. The dynamic range ignores photon noise. The actually achievable SNR for bright images will be dominated by photon noise, not readout noise.

Gain/ISO and dynamic range

The image sensor gain, or ISO on a digital camera, amplifies the pixel value. Since the maximum pixel value stays the same, it reduces the usable full well capacity. In case gain does amplify noise as well, introducing image sensor gain $2.0$ into the above example yields:

$$256 / { 2.0 · 0.64 } = 200 : 1$$

The histogram will still use all $256$ values, but noise is doubled. Since the 8 bit dynamic range was $256 : 1$, you don't lose as much as expected. The step from gain $2.0$ to gain $4.0$ will be worse.

In case gain does not amplify noise as well, you may end up with the same dynamic range. It may even be true that gain reduces the readout noise:

$$256 / { 0.5 } = 512 : 1$$

Conclusion: The best SNR for any imager gain is reached by using enough exposure to go close to the maximum pixel capacity (exposing to the right, ETTR). Otherwise dynamic range is wasted. If the exposure is limited, increasing gain to use the remaining pixel range has no negative effect and may even be beneficial.

It is often said that a high sensor gain value causes a noisy image. In fact, it is lack of photons causing that. There is no substitute for enough photons.

Photon noise and gamma encoding

The absolute noise grows with increasing brightness, which means a difference of one DN means less for brighter pixels. That reduces the number of distinguishable brightness levels for photon-based images.

If photon noise is not interesting, and it isn't to most people, it saves storage to encode the brightness with a variable resolution that emphasizes dark values at the cost of bright values and shows this characteristic:

$$\text"SNR"(\text"brightness") = √{\text"brightness"} = \text"brightness"^{1/2}$$

That characteristic corresponds to the gamma encoding function with gamma value $2$: Encoding uses the exponent $1/\text"gamma"$, decoding uses the exponent $\text"gamma"$ itself.
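A minimal sketch of that encoding and decoding for gamma value 2, on brightness values normalized to the range 0 to 1:

import numpy as np

def gamma_encode(linear, gamma=2.0):
    """Encode linear brightness in [0, 1] with the exponent 1/gamma."""
    return np.clip(linear, 0.0, 1.0) ** (1.0 / gamma)

def gamma_decode(encoded, gamma=2.0):
    """Decode back to linear brightness with the exponent gamma."""
    return np.clip(encoded, 0.0, 1.0) ** gamma

x = np.linspace(0.0, 1.0, 5)
print(gamma_encode(x))                 # dark values get more code range
print(gamma_decode(gamma_encode(x)))   # the round trip returns the input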

The actual SNR is further determined by a linear part that depends both on the sensor read noise, which only shows in the dark range, and on its quantum efficiency. This leads to the SNR function starting out rather linearly in darkness and converging to the square root after leaving the range dominated by read noise.

Common gamma functions use values between 1.8 and 2.4, and more sophisticated functions contain a linear part (BT.709 and sRGB). These functions are modelled to better approximate human vision, which evidently uses a space-efficient brightness encoding. So does TV, because wireless broadcast has limited bandwidth. A linear image always looks too dark and appears to have too strong a contrast to humans. The characteristic property of raising low brightness values and lowering high brightness values is why changing the gamma value is a popular way to change the image contrast.

Since the gamma function offers a noise-oriented compression, almost all image formats do not store linear data, but gamma-encoded data. That is very useful, e.g. for comparing contrast ratios, because otherwise the photon noise would be in the way and the comparison would have to incorporate the brightness. On the other hand, some image processing algorithms depend on one or the other representation, and it is common that the required representation is not specified. The FFT, for example, makes more sense on linear data, because the square root of a sinusoidal signal is no longer a single frequency, but contains additional higher-order components.
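A tiny illustration of why the representation matters: averaging two neighbouring pixels (as a downscaling step would) gives a different result on gamma-encoded and on linear data. The gamma-2 encoding from above is assumed; the pixel values are made up.

import numpy as np

# Two neighbouring pixels, gamma-2 encoded: one dark, one bright.
encoded = np.array([0.2, 0.9])

# Naive: average the encoded values directly.
naive = encoded.mean()            # 0.55

# Correct for physical light: decode to linear, average, re-encode.
linear = encoded ** 2.0
correct = np.sqrt(linear.mean())  # about 0.65

print(naive, correct)   # the naive average comes out too dark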

What happens if algorithms run on the wrong brightness representation? Since the square root is not an extreme deviation from linearity, the errors from running algorithms on the wrong representation are not obvious at first glance. Together with the often missing specification, this is a common source of mistakes.