Copyright 2022 Matthew Digman

Produced under NASA LISA Preparatory Science Grant 80NSSC19K0320

==========License=================
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
GNU General Public License for more details.

You should have received a copy of the GNU General Public License
along with with program; see the file COPYING. If not, write to the
Free Software Foundation, Inc., 59 Temple Place, Suite 330, Boston,
MA  02111-1307  USA

=============Overview=============
This module implements the fast forward and inverse WDM wavelet transforms in python from both the time and frequency domains. 
The frequency domain transforms are inherently much faster and more accurate. 
The wavelet domain->frequency domain and frequency domain->wavelet domain transforms
are nearly exact numerical inverses of each other with parameters similar to the test parameters used on gaussian random noise. 

==========Data Structure==========
Currently, we assume both Nt and Nf are even. If either is not even, the results may not be accurate, though they could be extended to general shapes in the future.
Because the highest and lowest frequency bins each need only half the coefficients as the other bins, we can pack all the coefficients into 
the bottom frequency pixel, and retain an Nt by Nf matrix, alternating the lowest and highest frequencies at even and odd time pixels. This packing is why we require the number of time pixels to be even.


===========Running================
We provide both command line and python interfaces to the code. The command line interfaces read and save results from .dat files.
The 6 python interfaces to the code can be found in wavelet_transforms.py
and are named: inverse_wavelet_freq,inverse_wavelet_freq_time,inverse_wavelet_time,transform_wavelet_time,transform_wavelet_freq,transform_wavelet_freq_time
The interfaces are invoked as follows:

Forward Transforms:

frequency domain->wavelet domain via frequency domain transform:
Python:
signal_wavelet_out = transform_wavelet_freq(signal_freq_in,Nf,Nt,dt)
Command line:
python forward_wavelet_freq_harness.py filename_freq_in filename_wavelet_out dt Nt Nf

time domain->wavelet domain via time domain transform:
Python:
signal_wavelet_out = transform_wavelet_time(signal_time_in,Nf,Nt,dt,mult=mult)
Command line:
python forward_wavelet_time_harness.py filename_time_in filename_out dt Nt Nf mult

time domain->wavelet domain via fft and frequency domain transform
Python:
signal_wavelet_out = transform_wavelet_freq_time(signal_time_in,Nf,Nt,dt)
Command line:
python forward_wavelet_freq_time_harness.py filename_time_in filename_wavelet_out dt Nt Nf

Inverse Transforms:

wavelet domain->frequency domain via frequency domain transform:
Python:
signal_freq_out = inverse_wavelet_freq(signal_wavelet_in,Nf,Nt,dt)
Command line:
python inverse_wavelet_freq_harness.py filename_wavelet_in filename_freq_out dt

wavelet domain->time domain via time domain transform:
Python:
signal_time_out = inverse_wavelet_time(signal_wavelet_in,Nf,Nt,dt,mult=mult)
Command line:
python inverse_wavelet_time_harness.py filename_wavelet_in filename_time_out dt mult

wavelet domain->time domain via frequency domain transform and fft:
Python:
signal_time_out = inverse_wavelet_freq_time(signal_wavelet_in,Nf,Nt,dt)
Command line:
python inverse_wavelet_freq_time_harness.py filename_wavelet_in filename_time_out dt


===============Timing===============

The fast transforms are quite efficient. On a single core of an AMD Ryzen Threadripper 3970X
with the MKL FFT libraries installed, a version of time_tests.py with approximately 8 years 
of gaussian random noise data sampled at 5s intervals gives the following timings:

dt = 5.0 Nt = 12288 Nf = 4096 mult = 16
begin loading data files
generated data in              4.2397728s
got time->freq in              0.2338865s    1.0000000 X fft time
got freq->time in              0.2127792s    0.9097540 X fft time
got wavelet->freq in           3.2477098s   13.8858352 X fft time
got wavelet->time in           5.5590221s   23.7680302 X fft time
got wavelet->freq->time in     3.6975480s   15.8091531 X fft time
got freq->wavelet in           1.6941847s    7.2436183 X fft time
got time->wavelet in          15.2326701s   65.1284624 X fft time
got time->freq->wavelet in     2.1416471s    9.1567782 X fft time

The timing results will depend heavily on the fft library used. 
Currently, fft_funcs.py supports either numpy or mkl-fft fft libraries. In our testing, mkl-fft is consistently faster.
For example, here is a comparison of a run on a single core of the same machine with 6 months of data at 30s intervals with numpy fft:

mkl fft not available trying numpy
dt = 30.0 Nt = 512 Nf = 1024 mult = 16
begin loading data files
generated data in              0.6920439s
got time->freq in              0.0095489s    1.0000000 X fft time
got freq->time in              0.0112231s    1.1753315 X fft time
got wavelet->freq in           0.0182579s    1.9120501 X fft time
got wavelet->time in           0.0586002s    6.1368782 X fft time
got wavelet->freq->time in     0.0329091s    3.4463912 X fft time
got freq->wavelet in           0.0187142s    1.9598336 X fft time
got time->wavelet in           0.2210319s   23.1474656 X fft time
got time->freq->wavelet in     0.0257588s    2.6975836 X fft time

vs with mkl fft:

dt = 30.0 Nt = 512 Nf = 1024 mult = 16
begin loading data files
generated data in              0.5529449s
got time->freq in              0.0011396s    1.0000000 X fft time
got freq->time in              0.0013177s    1.1563102 X fft time
got wavelet->freq in           0.0086046s    7.5505823 X fft time
got wavelet->time in           0.0286762s   25.1634493 X fft time
got wavelet->freq->time in     0.0191764s   16.8273333 X fft time
got freq->wavelet in           0.0093265s    8.1840517 X fft time
got time->wavelet in           0.1526154s  133.9205840 X fft time
got time->freq->wavelet in     0.0153940s   13.5082907 X fft time

all of the transforms are faster with mkl fft, though the improvement does not scale linearly with the time a single fft of the full dataset takes.
The different timings will be highly architecture dependent. 
Note that with mult=16, the wavelet->time and time->wavelet transform are approximate, 
yet the time->freq->wavelet transform is approximately an order of magnitude faster than the time->wavelet transform, with a similar but smaller
improvement in the wavelet->freq->time vs wavelet->time transform. 
To achieve the same accuracy as the wavelet->freq and freq->wavelet transforms, we would need to set mult=Nt/2 (256 in this case),
in which case we would get timings:

dt = 30.0 Nt = 512 Nf = 1024 mult = 256
begin loading data files
generated data in              0.6983505s
got time->freq in              0.0011767s    1.0000000 X fft time
got freq->time in              0.0015124s    1.2853283 X fft time
got wavelet->freq in           0.0114840s    9.7597607 X fft time
got wavelet->time in           0.5447834s  462.9863181 X fft time
got wavelet->freq->time in     0.0165543s   14.0687378 X fft time
got freq->wavelet in           0.0127060s   10.7982746 X fft time
got time->wavelet in           2.4145518s 2052.0162917 X fft time
got time->freq->wavelet in     0.0167831s   14.2631639 X fft time

which is approximately two orders of magnitude  slower for the wavelet->time and time->wavelet transforms compared to using an fft.
The remaing transform timings differ only due to random timing variation.
Therefore, for many use cases it is likely preferable to use the time->freq->wavelet and wavelet->freq->time versions of the transforms.

Although numba's just-in-time compilation must occur at runtime, all of the modules compile reasonably fast 
and compilation time is likely not a significant limiting factor for most applications