Redshift Estimation for BOSS quasar spectra
This code repository contains code to completely reproduce the quasar redshift estimates reported in
Leah Fauber, Ming-Feng Ho, Simeon Bird, Christian R. Shelton, Roman Garnett, Ishita Korde Automated Measurement of Quasar Redshift with a Gaussian Process. arXiv:2006.07343 [astro-ph.GA],
including all intermediate data products including the Gaussian process null model described therein. The provided parameters should exactly reproduce the catalog in that work; however, you may feel free to modify these choices as you see fit.
The pipeline has multiple stages, outlined and documented below.
Loading catalogs and downloading spectra
The first step of the process is to load the DR12Q quasar catalog and available DLA catalogs, extract some basic data such as redshift, coordinates, etc., and apply some basic filtering to the spectra:
- spectra with z < 2.15 are filtered
- spectra identified in a visual survey to have broad absorption line (BAL) features are filtered
Relevant parameters in set_parameters
that can be tweaked if desired:
% preprocessing parameters
z_qso_cut = 2.15; % filter out QSOs with z less than this threshold
This process proceeds in three steps, alternating between the shell and MATLAB.
First we download the raw catalog data:
# in shell
cd data/scripts
./download_catalogs.sh
Then we load these catalogs into MATLAB:
% in MATLAB
set_parameters;
build_catalogs;
The build_catalogs
script will produce a file called file_list
in
the data/dr12q/spectra
directory containing relative paths to
yet-unfiltered SDSS spectra for download. The file_list
output is
available in this repository in the same location. The next step is to
download the coadded "speclite" SDSS spectra for these observations
(warning: total download is 35 GB). The download_spectra.sh
script
requires wget
to be available. On OS X systems, this may be
installed easily with Homebrew.
# in shell
cd data/scripts
./download_spectra.sh
download_spectra.sh
will download the observational data for the yet
unfiltered lines of sight to the data/dr12q/spectra
directory.
Loading and preprocessing spectra
Now we load these data. Spectra with fewer than 400 nonmasked pixels are filtered.
Relevant parameters in set_parameters
that can be tweaked if
desired:
% preprocessing parameters
min_num_pixels = 400; % minimum number of non-masked pixels
When ready, the MATLAB code to preload the spectra is:
set_parameters;
release = 'dr12q';
file_loader = @(plate, mjd, fiber_id) ...
(read_spec(sprintf('%s/%i/spec-%i-%i-%04i.fits', ...
spectra_directory(release), ...
plate, ...
plate, ...
mjd, ...
fiber_id)));
preload_qsos;
The result will be a completed catalog data file,
data/[release]/processed/zqso_only_catalog.mat
, with complete filtering
information and a file containing preloaded and preprocessed data for
the 158821 nonfiltered spectra,
data/[release]/processed/preloaded_zqso_only_qsos.mat
.
Building GP models for quasar spectra
Now we build our models, including our Gaussian process null model for quasar emission spectra.
To build the null model for quasar emission spectra, we need to indicate a set of spectra to use for training. Here we select all spectra in DR9 and not removed by our filtering steps.
These particular choices may be accomplished with:
training_release = 'dr12q';
train_ind = ...
[' catalog.in_dr9 & ' ...
'(catalog.filter_flags == 0) ' ];
After specifying the spectra to use in training_release
and
train_ind
, we call learn_qso_model
to learn the model.
To learn the model, you will need the MATLAB toolbox
minFunc
available from Mark Schmidt.
You should cd to the directory where you installed minFunc to and run:
addpath(genpath(pwd));
mexAll;
to initialize minFunc before the first time you use it.
Relevant parameters in set_parameters
that can be tweaked if
desired:
% null model parameters
min_lambda = 910; % range of rest wavelengths to Å
max_lambda = 3000; % model
dlambda = 0.25; % separation of wavelength grid Å
k = 20; % rank of non-diagonal contribution
max_noise_variance = 4^2; % maximum pixel noise allowed during model training
% optimization parameters
minFunc_options = ... % optimization options for model fitting
struct('MaxIter', 4000, ...
'MaxFunEvals', 8000);
When ready, the MATLAB code to learn the null quasar emission model is:
training_set_name = 'dr9q_minus_concordance';
learn_qso_model;
The learned qso model is stored in
data/[training_release]/processed/learned_zqso_only_model_outdata_normout_[training_set_name]_norm_[normalization_min_lambda]-[normalization_max_lambda].mat
.
Processing spectra for Redshift Estimation
Finally, we may use our built model to compute the posterior probability of the quasar redshift as described in the paper above.
We must specify a few things first. First, we must specify which quasar emission model to use; to select the one learned above, we may use
% specify the learned quasar model to use
training_release = 'dr12q';
training_set_name = 'dr9q_minus_concordance';
(the code will attempt to load the model from a file called
data/[training_release]/processed/learned_zqso_only_model_outdata_normout_[training_set_name]_norm_[normalization_min_lambda]-[normalization_max_lambda].mat
.)
Next, we must specify which spectra to process. Here we use all DR12Q spectra that were not filtered:
% specify the spectra to process
release = 'dr12q';
test_set_name = 'dr12q';
test_ind = '(catalog.filter_flags == 0)';
When ready, the selected spectra can be processed with process_qsos
.
Relevant parameters in set_parameters
that can be tweaked if
desired:
num_zqso_samples = 10000; % number of parameter samples
This script will write the results in
data/[release]/processed_qsos_[test_set_name].mat
.
The complete code for processing the spectra in MATLAB is:
% produce catalog
set_parameters;
% specify the learned quasar model to use
training_release = 'dr12q';
training_set_name = 'dr9q_minus_concordance';
% specify the spectra to process
release = 'dr12q';
test_set_name = 'dr12q';
test_ind = '(catalog.filter_flags == 0)';
% process the spectra
process_qsos;
Finally, we may create an ASCII catalog of the results if desired with
generate_ascii_catalog
, e.g.:
set_parameters;
training_release = 'dr12q';
release = 'dr12q';
test_set_name = 'dr12q';
generate_ascii_catalog;