benchFFT: A Makefile repository from arpith20

----------------------------------------------------------------------------
			       WARNING

THIS BENCHMARK IS PROVIDED FOR YOUR INFORMATION ONLY.  DO NOT EXPECT
IT TO RUN OUT OF THE BOX.  WE DID NOT WRITE MOST OF THE INCLUDED
PROGRAMS.  IT IS NOT OUR FAULT IF NO TWO C++ COMPILERS ACCEPT THE SAME
LANGUAGE.  USE AT YOUR OWN RISK.

----------------------------------------------------------------------------

			       benchFFT

benchFFT is a program to benchmark FFT software, assembled by Matteo
Frigo and Steven G. Johnson at the Massachusetts Institute of
Technology.  You can contact us at benchfft@fftw.org, or visit the
benchFFT home page for benchmark results and other information:

		http://www.fftw.org/benchfft

The benchmark incorporates a large number of publicly available FFT
implementations, in both C and Fortran, and measures their performance
and accuracy over a range of transform sizes. It benchmarks both real
and complex transforms in one, two, and three dimensions.

The FFT implementations in the benchmark (found in the benchees/
subdirectory) were written by various authors over a period of more
than 35 years. Except for small tweaks to get things to compile, we
used the unmodified original codes.

----------------------------------------------------------------------------
			       CONTENTS

libbench/
	-- Timing, accuracy, utilities, and other routines used
	   when benchmarking and testing each FFT code.

benchees/
	-- The routines to be benchmarked.  Each directory contains
	   the original source code, any required patches, and source
	   for one or more "doit" programs that link with libbench and
	   call the FFT routine.  (See "architecture", below.)

scripts/
	-- Scripts to run the benchmark and to process/plot the
	   resulting data.  (See "running" and "plotting" below.)

----------------------------------------------------------------------------
			LICENSE AND COPYRIGHT

Files in the libbench/ directory are Copyright (c) 2003 Matteo Frigo,
Copyright (c) 2003 Massachusetts Institute of Technology, and
distributed under the terms of the GNU General Public License.  See
file COPYING for details.

FFT routines in the benchees/ directory are copyrighted by their
respective authors.  We are distributing only those routines that are
redistributable.  Please be advised that some routines may have
licensing terms that are incompatible with the GPL, in which case you
are not allowed to distribute the executable that results from linking
libbench/ with that FFT routine.  This is not a problem in practice,
because those executables are not meant to be distributed.

The remaining files (in particular, all programs called doit*.[ch],
and all scripts in the scripts/ directory) are hereby placed in the
public domain.

----------------------------------------------------------------------------
			       RUNNING

If you don't have a Unix system (GNU/Linux is fine), you are on your own.

To run the benchmark, pick some "good" compilers and compiler flags
and set the appropriate environment variables (e.g. CC and CFLAGS).
Also set LDFLAGS and LD_LIBRARY_PATH as necessary so that any
libraries installed on your system can be found (see below).  Then
run ./configure (with --enable-single for single precision), 'make -k'
to build the benchmark, and 'make -k benchmark' to benchmark speed
and/or 'make -k accuracy' to benchmark accuracy.  (The '-k' lets make
continue past any routines that fail to compile for whatever reason.)
At the end, run 'make collect' to collect all of the benchmark output
into files `hostname`.speed, `hostname`.accuracy, etcetera.

We generally put all of the commands to run the benchmark on a
particular machine into a little shell script.  See the scripts/run.*
files for some examples.  You would then run e.g. 'sh
scripts/run.solaris' to run the benchmark and collect the data.

		  Optional FFT libraries to install:

The benchmark does not include source code for all routines; in some
cases, you are expected to install the code separately onto your
machine.  The codes that you need to install separately, if you want
to benchmark them, are:

Free software:
	FFTW 2.x and/or FFTW 3.x (double and/or single precision): www.fftw.org
	GNU Scientific Library: sources.redhat.com/gsl

Hardware/vendor-specific:
	Intel Math Kernel Library: www.intel.com/software/products/mkl
	Intel IPPS
	AMD Core Math Library
	Apple VDSP (Macintosh G4 and higher only)
	IBM ESSL (AIX only)
	sgimath (SGI/MIPS only)
	SUNPERF (SPARC only)
	DXML/CXML (Alpha only)

Proprietary software:
	Numerical Recipes: copy .c and .f files into benchees/nr
	NAG (Numerical Algorithms Group) Fortran Library
	IMSL (International Mathematical and Statistical Library)

----------------------------------------------------------------------------
			       PLOTTING

The format and meaning of the collected speed and accuracy data are
described in www.fftw.org/speed/method.html and
www.fftw.org/accuracy/method.html.  We include a few useful scripts to
help you analyze and plot this data in the scripts/ directory.  (These
scripts mostly require you to have Perl installed.)

These scripts generally take as input either the `hostname`.speed or
the `hostname`.accuracy file output by the benchmark above.

Because these are plain text files, you can extract certain
information simply via grep; e.g. 'grep dcif' to extract data for
double-precision (d) complex-data (c) in-place (i) forward transforms
(f).  We supplement this with additional scripts like 'perl
grep-rank.pl 3' to extract rank-3 (or whatever) transforms, 'perl
grep-p2.pl' to grep for power-of-two sizes, and so on.

We also include a script, grace-plot.pl, to make plots of the speed
and accuracy data, just as they are plotted on our web pages.  This
script converts the speed and accuracy data into the file format of
the free Grace plotting software:

	http://plasma-gate.weizmann.ac.il/Grace/

So, for example, to plot the power-of-two 1d single-precision
complex-transform speed data to a foo.ps PostScript file, you would do:

	egrep 'sc[io][fb]' foo.speed | perl grep-p2.pl | perl grep-rank.pl 1 \
		| perl grace-plot.pl --no-dups \
		| gracebat -pipe -printfile foo.ps

(gracebat is a batch-processing program that comes with Grace.)  The
--no-dups option selects only the "best" variant of each code (just
the best of forward/backward transforms, etcetera), as described in
www.fftw.org/speed/method.html.  For accuracy data, you should use the
--accuracy option to grace-plot.pl, and we generally also use the
--no-dups and --plot-worst options in this case.

To simplify matters somewhat, we have a standard-plots.sh script to
make a set of plots similar to the ones on our web page.  To use it,
do:

	sh standard-plots.sh foo.speed
	sh standard-plots.sh foo.accuracy

and the output will be a set of PostScript files (which, for the web,
we convert to .png via ImageMagick).

----------------------------------------------------------------------------
			     ARCHITECTURE

For each routine that we benchmark, we compile a separate "doit"
program that links to libbench.  Not only does this isolate the
routines (which may be buggy) from one another, but it also provides a
compact record of the calling sequence and data formats that have been
used in FFTs over the years.

libbench provides the main program, command-line parsing, timing,
accuracy tests, etcetera, so the doit source only needs to provide the
following routines:

can_do(problem):
	Return whether or not the routine can solve a given FFT problem
	(size, precision, dimensionality, data format, etcetera).

setup(problem):
	Perform any one-time initializations needed for the FFT routine.

doit(numiters, problem):
	Call the FFT routine numiters times on the problem (which includes
	preallocated arrays, initialized to zero for speed benchmarks).

done(problem):
	Deallocate memory, etcetera.

In order to verify the correctness of an FFT routine and to benchmark
its accuracy, however, we also need to convert the input/output data
to/from a standard format: a row-major array of complex numbers.
(This is especially important for real-data transforms, which use
widely differing formats.)  To do this, the doit source code can
optionally also provide one or more of the following routines:

copy_c2c_from(problem, in):
	Copy complex data from input array 'in' to problem's
	complex-data arrays (defaults to a memcpy).

copy_c2c_to(problem, out):
	Copy complex data from problem's complex-data arrays to output
	array 'out' (defaults to a memcpy).

copy_c2r/r2c(problem, in/out):
	Copy real parts of complex data from/to array in/out to/from
	problem's real-data arrays (defaults to copying to/from a
	contiguous row-major array of real data).

copy_c2h/h2c(problem, in/out):
	Copy hermitian-symmetry complex data from/to array in/out to/from
	problem's arrays (e.g. as output by a transform's real-input
	FFT, which can use a variety of formats).
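
To give a flavor of what such a conversion involves, the sketch below
expands one particular packed format, the "halfcomplex" ordering used
by FFTW's real-to-halfcomplex transforms, into the standard array of n
complex numbers.  The function name and signature are hypothetical and
are not the libbench interface; other codes pack their output
differently and need their own conversion.

```c
#include <complex.h>

/* Expand n reals stored in halfcomplex order
       r0, r1, ..., r(n/2), i((n+1)/2-1), ..., i2, i1
   into the n complex outputs of the DFT, using the Hermitian
   symmetry out[n-k] = conj(out[k]) of a real-input transform.
   (Hypothetical helper, for illustration only.) */
void sketch_copy_h2c(int n, const double *hc, double complex *out)
{
    if (n <= 0)
        return;
    out[0] = hc[0];                     /* DC term is purely real */
    for (int k = 1; 2 * k < n; ++k) {
        double re = hc[k];              /* real part of output k */
        double im = hc[n - k];          /* imaginary part of output k */
        out[k] = re + I * im;
        out[n - k] = re - I * im;       /* mirrored conjugate */
    }
    if (n % 2 == 0)
        out[n / 2] = hc[n / 2];         /* Nyquist term is purely real */
}
```

For example, the packed array {10, -2, -2, 2} with n = 4 expands to
10, -2+2i, -2, -2-2i, which is the DFT of the real input 1, 2, 3, 4.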

Some routines also return a normalized FFT output, whereas the
benchmark's accuracy tests expect unnormalized output, so some of the
doit programs also call the libbench unnormalize() routine to undo
this.  Note that NONE OF THESE DATA CONVERSIONS ARE PERFORMED DURING
SPEED BENCHMARKING.
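
As an illustration of what undoing a normalization means, the sketch
below assumes a routine that scales its output by 1/n (one common
convention; others, such as 1/sqrt(n), would need a different factor).
The helper name and signature are hypothetical and do not reproduce
the actual libbench unnormalize() interface.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: undo a 1/n scaling applied by the routine
   under test, so that the result matches an unnormalized DFT. */
void sketch_unnormalize(size_t n, double complex *out)
{
    for (size_t k = 0; k < n; ++k)
        out[k] *= (double)n;
}
```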

The above documentation is not intended to be comprehensive, only to
give you a flavor of the overall structure.  See the many included
benchmark doit.c files (etc.) for examples, and also read the libbench
source code.
----------------------------------------------------------------------------