A Python 3 script for baseline correction, smoothing, processing and plotting of Raman spectra. Data must be in the format wavenumber [space] intensity. The baseline correction uses the asymmetrically reweighted penalized least squares smoothing algorithm (arPLS). The Whittaker filter is (by default) applied for smoothing. Optionally, the Savitzky–Golay filter can be used. Data of the processed spectra can be saved as "csv"-like data files in the format wavenumber [delimiter] intensity. An overlay spectrum (normalized and not normalized) and a normalized stacked spectrum of all processed spectra can be plotted. Plots can be saved as PNG bitmap files and as PDF.
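The input format described above (two whitespace-separated columns) can be read with numpy. The following is only a minimal sketch and not the script's own loader; the function name `load_spectrum` and the demo values are made up for illustration.

```python
import io

import numpy as np


def load_spectrum(source):
    """Read 'wavenumber [space] intensity' data (hypothetical helper).

    `source` may be a filename or an open file-like object.
    """
    data = np.loadtxt(source)  # whitespace is the default delimiter
    return data[:, 0], data[:, 1]


# Demo with an in-memory file so the example is self-contained.
demo = io.StringIO("100.0 12.5\n101.0 13.0\n102.0 12.8\n")
wavenumber, intensity = load_spectrum(demo)
```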
If you use the arPLS algorithm to process your spectra, please cite:
"Baseline correction using asymmetrically reweighted penalized least squares smoothing"
Sung-June Baek, Aaron Park, Young-Jin Ahn, Jaebum Choo, Analyst 2015, 140, 250-257
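For orientation, the arPLS iteration from the cited paper can be sketched as follows. This is an illustrative sketch and not necessarily how raman-tl.py implements it; the function name `arpls_baseline` and the `ratio` and `max_iter` stopping parameters are assumptions.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve
from scipy.special import expit


def arpls_baseline(y, lam=1000.0, ratio=1e-6, max_iter=50):
    """Estimate a baseline with arPLS (sketch; names are assumptions)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference matrix; lam * D'D penalizes baseline curvature.
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    H = lam * (D.T @ D)
    w = np.ones(n)
    z = y.copy()
    for _ in range(max_iter):
        W = sparse.diags(w)
        # Weighted penalized least squares: (W + H) z = W y
        z = spsolve((W + H).tocsc(), w * y)
        d = y - z
        d_neg = d[d < 0]
        if d_neg.size < 2:
            break
        m, s = d_neg.mean(), d_neg.std()
        if s == 0:
            break
        # Logistic reweighting: points well above the baseline (peaks)
        # get weights close to 0, points at or below it close to 1.
        w_new = expit(-2.0 * (d - (2.0 * s - m)) / s)
        if np.linalg.norm(w - w_new) / np.linalg.norm(w) < ratio:
            break
        w = w_new
    return z
```

The baseline-corrected spectrum is then `y - arpls_baseline(y, lam)`.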
The Whittaker algorithm (sometimes also referred to as Whittaker-Eilers smoother) is adapted from:
"A perfect smoother"
Paul H. C. Eilers, Anal. Chem. 2003, 75, 3631-3636
which is based on:
"On a new method of graduation"
E. T. Whittaker, Proceedings of the Edinburgh Mathematical Society 1922, 41, 63-75
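The Whittaker smoother minimizes the sum of squared residuals plus lambda times the sum of squared differences of the smoothed curve, which leads to a sparse linear system. A minimal sketch, assuming second-order differences (the function name `whittaker_smooth` is hypothetical and the script may differ in detail):

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve


def whittaker_smooth(y, lam=1.0):
    """Whittaker smoothing with a second-order penalty (sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-order difference matrix D, shape (n - 2, n).
    D = sparse.diags([1.0, -2.0, 1.0], [0, 1, 2], shape=(n - 2, n))
    # Solve (I + lam * D'D) z = y for the smoothed signal z.
    A = sparse.eye(n) + lam * (D.T @ D)
    return spsolve(A.tocsc(), y)
```

A larger `lam` gives a smoother curve; a very small `lam` returns the data almost unchanged.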
numpy, scipy, matplotlib
Start the script with:
python3 raman-tl.py filename
to open a single file.
Start the script with:
python3 raman-tl.py *.txt
to process all files with the extension .txt in the folder.
Under Windows, you have to open PowerShell first and start the script with:
python raman-tl.py (Get-ChildItem *.txt -Name)
to process all files with the extension .txt in the folder.
In all cases, a file summary.pdf will be created which contains the following plots:
On the first page (from top to bottom):
- raw spectrum with baseline plot (red)
- baseline corrected spectrum
- smoothed / filtered spectrum with peak annotation
On the following page(s):
- smoothed / filtered spectrum with peak annotation
- not normalized and normalized overlay spectra and normalized stacked spectra if the -o option was invoked
- filename, required: filename(s), input file(s) in the format wavenumber [space] intensity
- -l N, optional: the lambda parameter for the arPLS algorithm (default is N = 1000)
- -p N:M, optional: invokes the Savitzky–Golay filter; N:M are the window length and polynomial order of the Savitzky–Golay filter
- -w N, optional: the lambda parameter for the Whittaker filter (default is N = 1)
- -xmin N, optional: start spectra at N wave numbers
- -xmax N, optional: end spectra at N wave numbers
- -t N, optional: threshold for peak detection, with N being the intensity (default is 5% of the maximum intensity)
- -m N, optional: multiply intensities by N (default is N = 1)
- -a N, optional: add N to or subtract N from wave numbers (default is N = 0)
- -i N, optional: add N to or subtract N from intensities (default is N = 0)
- -o, optional: show the normalized and not normalized overlay spectrum and the normalized stacked spectrum
- -n, optional: do not save summary.pdf
- -s p,d, optional: save P(NG) and / or D(ATA) files. The filenames are filename.png and / or filename-mod.dat for the single spectra. Data files are in the format wavenumber [delimiter] intensity. The delimiter can be set in the script. The default delimiter is [space]. summary.png, overlay.png, overlay-normalized.png, stack-normalized.png bitmaps will be saved as well, overlay and stacked spectra only if the -o option has been invoked.
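To illustrate the -p N:M option, here is a minimal sketch of how such an argument could be parsed, checked (odd window length, window length greater than polynomial order) and passed to scipy's Savitzky–Golay filter. The helper names `parse_window_order` and `smooth_savgol` are hypothetical; the script's own parsing may look different.

```python
import numpy as np
from scipy.signal import savgol_filter


def parse_window_order(arg):
    """Split an 'N:M' string into (window length, polynomial order)."""
    window_text, order_text = arg.split(":")
    window, order = int(window_text), int(order_text)
    if window % 2 == 0:
        raise ValueError("window length must be an odd number")
    if window <= order:
        raise ValueError("window length must be greater than polynomial order")
    return window, order


def smooth_savgol(intensity, arg):
    """Apply a Savitzky-Golay filter configured by an 'N:M' string."""
    window, order = parse_window_order(arg)
    return savgol_filter(np.asarray(intensity, dtype=float), window, order)
```

For example, `smooth_savgol(intensity, "7:4")` would correspond to -p7:4.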
- The safe values for the arPLS parameter lambda start from 1000. Smaller values will give sharper peaks, but broader peaks become part of the baseline. Check the red baseline curve on the summary page.
- There is no way to turn off smoothing directly, but with two Savitzky–Golay parameters close together, e.g. -p3:2, or a small Whittaker parameter, e.g. -w0.01, filtering is ineffective.
- The window length for the Savitzky–Golay filter must be an odd number and the window length must be greater than the polynomial order.
- Polynomial-based filters, such as the Savitzky–Golay filter, sometimes tend to overshoot into negative regions, especially with sharp signals in the Raman spectrum. Reducing the filtering (see above) is one way to solve this problem.
- xmin and / or xmax values outside the experimental wave number range will result in errors or strange outputs.
- -a changes the range for xmin and xmax.
- -i and -m change the range for -t.
- The .dat file contains the data of the processed spectrum in the given range as it is shown in the plot for the single spectrum.
- The -o option invokes the overlay plots (normalized and not normalized) and the normalized stacked plot of all processed spectra. Normalized means that the intensities are divided by the maximum intensity in the given intensity range. The maximum intensity becomes unity. The peak detection threshold for the normalized spectrum is 0.05 (can be changed in the script: normalized_height).
- The delimiter in the .dat file can be changed in the script: dat_delimiter = " " or dat_delimiter = " ; " for example.
- The files summary.pdf, summary.png, overlay.png, overlay-normalized.png, stack-normalized.png will be overwritten every time the script is started (with respective options) in the same directory. Single spectra with the same filenames will be overwritten as well. Rename them if you want to keep them.
- Some of the peaks that are close together are not annotated. To change this, one can reduce the peak_distance in the script, which is by default peak_distance = 8.
- Peak annotations can be overprinted by other peak annotations in the overlay spectrum. There is no workaround for this. If annotations are in the same position, one can uncomment the instruction under #no dupes in the script; then only one annotation is displayed.
- The legend obscures part of the spectrum. If this is a problem, one can change the position of the legend in the script or prevent the legend from being printed over the spectrum (try changing head_space_y_o_s in the script for the overlay and stacked spectra).
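The interplay of normalization, the peak threshold and the peak distance described in the notes above can be sketched with scipy's `find_peaks`. This is an assumption about how such a step could look, not the script's actual code; the function names are hypothetical, while the values 0.05 and 8 are the defaults mentioned above.

```python
import numpy as np
from scipy.signal import find_peaks


def normalize(intensity):
    """Divide by the maximum intensity so that the maximum becomes unity."""
    intensity = np.asarray(intensity, dtype=float)
    return intensity / intensity.max()


def find_annotated_peaks(wavenumber, intensity, height, distance=8):
    """Return positions and heights of peaks that would be annotated.

    `height` is the intensity threshold; `distance` is the minimum
    spacing between peaks in data points (smaller peaks that are too
    close to a larger one are dropped).
    """
    wavenumber = np.asarray(wavenumber, dtype=float)
    intensity = np.asarray(intensity, dtype=float)
    idx, _ = find_peaks(intensity, height=height, distance=distance)
    return wavenumber[idx], intensity[idx]
```

With `height=0.05` on a normalized spectrum, peaks below 5% of the maximum are ignored; lowering `distance` lets closely spaced peaks be annotated as well.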
Remember, under Windows you have to open PowerShell first and start the script with:
python raman-tl.py (Get-ChildItem *.txt -Name)
to open more than one file at once.
python3 raman-tl.py s*.txt
Process all files starting with s and the extension .txt.
Summary:
Single spectra:
python3 raman-tl.py sample-A.txt -xmin 600 -xmax 800 -spd
Process spectrum sample-A.txt in the range from xmin = 600 to xmax = 800 cm⁻¹ and save the PNG and DATA files (-spd).
Summary:
Single spectrum:
python3 raman-tl.py sample-A.txt -l10000 -p7:4 -xmin 600 -xmax 800 -t50 -spd
Process spectrum sample-A.txt with lambda = 10000 (baseline parameter), window length = 7 and polynomial order = 4 (smoothing parameters) in the range from xmin = 600 to xmax = 800 cm⁻¹, annotate peaks with intensities equal to or greater than t = 50 and save the PNG and DATA files (-spd).
Summary:
Single spectra:
python3 raman-tl.py sample-A.txt sample-B.txt -o -xmin 200 -xmax 1100 -sp
Process spectra sample-A.txt and sample-B.txt in the range from xmin = 200 to xmax = 1100 cm⁻¹, plot the overlay and stacked spectra (-o) and save the PNG files (-sp).
Overlay spectrum (not normalized):
Overlay spectrum (normalized):
Stacked spectrum (normalized):