Spectrometer wavelength calibration

There are many DIY spectrometers, but after construction, they all hit the same issue: Wavelength calibration. You need a reference source and software to establish the mapping of pixel positions to wavelengths, suitable file formats to store it all and a workflow flexible enough to deal with the wide variety of equipment and data.

This is a walkthrough, like a cheat sheet. I will show how to calibrate the spectrum of a gas discharge lamp using current free software.

Introduction

The hardware part is actually easy: The most reliable, precise and well available reference source are DIY gas discharge lamps. Forget LEDs, flourescent lamps, or some laser pointers. Gas discharge lamps deliver many well defined spectral lines that are extremely narrow and have a great SNR.

The issue is software: Spectrometry is one of those things that looks real easy to begin with and gets more complicated as you learn. It makes sense to start with what the pros use from day 1. There are various mature packages with their maintenance end already in sight. It looks like the community favorite is AstroPy and things based on it, in particular specutils and specreduce.

It’s Python, so expect APIs to change under your feet and examples to lag behind. Very terse documentation won’t bother you with many of them, though. The reward is software for common best practice workflows that keeps working until you update anything.

Set up the environment

Create a project directory, change inside and set up a virtual Python environment. The examples assume bash:

$ python -m venv env
$ source env/bin/activate

Create requirements_unversioned.txt:

astropy
matplotlib
specreduce

And populate the virtual environment with those packages:

(env) $ pip install -r requirements_unversioned.txt

At the time of writing this, I used astropy 7.0.1, matplotlib 3.10.1 and specreduce 1.5.1.

Reducing the gas discharge lamp 2D spectrum

No matter how the image containing the spectrum was obtained, it has to be converted to FITS format, because that is the standard format for many things in astronomy, and any spectrometer should deliver data that way.

It is popular to acquire images with the spectrum spreading horizontally. That is called the dispersion axis. The vertical direction is the cross dispersion axis. Some functions have optional parameters to change them. Further, it is popular to have lower wavelengths to the left and higher wavelengths to the right. The spectrum of a Neon/Xenon lamp might look like this, if you adjust brightness and contrast to see the weak lines:

The shape and background may be very different, though, depending on the spectrometer. That is called a 2D spectrum. Its flux is given in DN, the background is not subtracted and the wavelength is in pixel positions with no calibration information. The process of converting it to a 1D spectrum is called reduction, because that’s what it is.

The 2D spectrum is best read not using astropy.io.fits, because that is kind of the low level FITS access, but using astropy.nddata.CCDData, which provides those units, and it is good practice to use units everywhere.

from astropy.nddata import CCDData

image_data = CCDData.read("nexe.fits")

Now the trace is determined, that is a line through the center of the spectrum. In this case, it is a straight line, thus called a flat trace, but specreduce contains other tracers as well. I wrote a small function that determines its position pos and some other values automatically:

import specreduce.tracing

pos, ap, bg_separation, bg_width = dispersion_range(image_data.data)
trace = specreduce.tracing.FlatTrace(image_data, pos)

You may notice that the quantities of image_data are stripped off using the property .data. As it turns out, polymorphism has its limits and np.mean cannot deal with units. Expect that to occur every now and then.

Now the background needs to be determined. Here it is just a simple region above and below the spectrum, called a two sided background, but it may be quite different, so again there are other background extractors, too.

import specreduce.background

bg = specreduce.background.Background.two_sided(image_data, trace,
  separation=bg_separation, width=bg_width)

sub_image_data = image_data - bg

That looks pretty, but is actually pretty misleading. It looks like the background image is being subtracted from the main image, which makes you wonder what the semantics may be.

Actually the background is an array of the column averages of the background regions. The Background class overloads the minus operator from the right to define what the subtraction of the background from an image does, just for being able to write one single pretty misleading line of code.

Now that the background was subtracted, the 1D spectrum can be extracted along the trace. Here a fixed width of equal weighted pixels is used, called a boxcar. The spectrum is the sum of all pixels in a column. Again, there are other extractors for other spectrum types.

extract = specreduce.extract.BoxcarExtract(sub_image_data, trace, width=ap)
spectrum = extract.spectrum

Why does that work without importing specreduce.extract? Well, it does for me.

And there you have it, a 1D spectrum. In order to know later how it was obtained, a FITS header with the history can be added to the meta data:

import astropy.io.fits

spectrum.meta['header'] = astropy.io.fits.Header()
spectrum.meta['header']['HISTORY']
  = "specreduce.tracing.FlatTrace pos={:d}".format(pos)
spectrum.meta['header']['HISTORY']
  = "specreduce.background.Background.two_sided "
    "separation={:.1f}, width={:.1f}".format(bg_separation, bg_width)
spectrum.meta['header']['HISTORY']
  = "specreduce.extract.BoxcarExtract width={:d}".format(ap)

Yes, that looks wrong. The FITS header class looks like a dictionary, but it isn’t. It knows HISTORY is a keyword that can occur multiple times, so it appends it each time you set it.

Viewing it is fairly easy:

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_title('Neon/Xenon spectrum')
ax.set_xlabel('Wavelength [pix]')
ax.set_ylabel('Flux [adu]') 
ax.plot(spectrum.spectral_axis, spectrum.flux)
plt.show(block=True)

Finding the discharge lines

A simple maximum finder would be easy, but also suspectible to noise, in case the spectrum is super-sampled. In case the peaks are symmetric, and that depends on the spectrometer, a gauss fit is a popular method that delivers the peaks with subpixel resolution.

These peaks look symmetric. Next we need a noise floor estimation for thresholding. The start of the spectrum is dark. Instead of populating a spectrum property, the function returns a new spectrum, which has that property added. The line finder prints a warning and I do not know why, but it is no continuous spectrum for sure and I disable the warning. Finally the line finder needs to know the FWHM of the spectrometer. It should agree with the FWHM seen in the spectrum. Here I ask for peaks above 20 standard deviations, which is very safe, but I have enough signal for that. The centroids of the fits are the actual peaks.

from specreduce.line_matching import find_arc_lines
from specutils.manipulation import noise_region_uncertainty
from specutils.spectra import SpectralRegion
from specutils import conf

noise_region = SpectralRegion(0*u.pix, 500*u.pix)
spectrum2 = noise_region_uncertainty(spectrum, noise_region)

conf.do_continuum_function_check = False
lines = find_arc_lines(spectrum2, fwhm=7.8*u.pix, noise_factor=20)

peaks_pix = lines['centroid']

Matching discharge lines to reference values

Neither specreduce nor specutils offer a function for that and it is not a trivial problem to automate. There is RASCAL, but after trying for a day, I just gave up to get it working. Dynamic time warping sounds promising, but there is some non-linearity that cannot be neglected and given a missing intensity calibration, only peaks can be matched. The problem is not as easy as it may look at first.

I match peaks manually using the event picker in matplotlib. The Neon/Xenon spectrum is pretty easy to recognize once you look at it for a while and the process goes quite fast. That may be different for other calibration sources.

Calibrating a 1D spectrum

Now we have all we need to calibrate the spectrum using a polynomial fit of the positions and their wavelengths.

The documentation left me in the dark why things are done that way, but this works for me. First a calibration object is created. That does not yet create the fit! The second line does that and it is essential not to add parens as for a function, because this is a @cached_property. The call apply_to_spectrum is a misnomer, because it does not apply the calibration to the spectrum! Instead it creates a new calibrated spectrum from the old one, but it does not copy the header, which loses e.g. all HISTORY information.

calibration.residuals contains the residuals, which are expected to look pretty random with a maximum given by the position quantization noise.

It makes sense to record the used calibration values and the model as HISTORY keywords. Since they are really quantities, this saves the unit as well. Finally, the result is saved. It is common convention to prevent people from overwriting FITS files. The cfitsio library uses an exclamation mark as first filename character to signal that you do want to possibly overwrite a file. In specutils an optional variable is needed to allow overwriting a FITS file.

from astropy.modeling.polynomial import Polynomial1D
from specreduce import WavelengthCalibration1D

calibration = WavelengthCalibration1D(spec, line_pixels=line_pixels,
                                      line_wavelengths=line_wavelengths,
                                      input_model=Polynomial1D(degree=2))

fit = calibration.fitted_model

cal_spec = calibration.apply_to_spectrum(spec) 
cal_spec.meta['header'] = spec.meta['header']

for peak in range(len(line_pixels)):
    cal_spec.meta['header']['HISTORY'] = \
      "identify {:.2f} = {:.2f}".format(line_pixels[peak],line_wavelengths[peak])
 
cal_spec.meta['header']['HISTORY'] = \
  "specreduce.WavelengthCalibration1D(input_model=Polynomial1D(degree=2))"

cal_spec.write('nexe-1d-cal.fits', overwrite=True)

Retrieving the calibration information

The naive expectation is that the calibration information is saved as WCS model and the data stays as it is when using cal_spec.write. You could just copy the WCS model to another spectrum to calibrate it.

I don’t know why, but that is not the case. Instead the primary HDU is empty and a secondary HDR using tabular data consisting of wavelength and flux is saved, which loses the polynome. My solution is to retrieve the identified lines and then proceed with the calibration. Although the unit is saved for single elements, the used data structure is an array of values with a single quantity, not an array of quantities, and the unit must be added to the array.

from specutils import Spectrum
import astropy.units as u

spec = Spectrum.read(spec2d_file)
        
history = spec.meta['header']['HISTORY']
 
matched_position = []
matched_wavelength = []
        
for line in history:
    if line.startswith('identify'):
        items = line.split()
        if matched_position == []:
            position_u = u.Unit(items[2])
            wavelength_u = u.Unit(items[5])
        matched_position.append(float(items[1]))
        matched_wavelength.append(float(items[4]))
 
matched_position = matched_position * position_u
matched_wavelength = matched_wavelength * wavelength_u

Summary

So that’s how to perform wavelength calibration with current free software. I am not pleased how much time it took to solve all the small issues that prevented my code from working.