Compilation for other Processors

Question

Compilation for other Processors

Closed this issue 10 years ago · 16 comments

Hi,
This is not really an issue, sorry, but I am trying to use SYLT-FFT on a Pebble smart watch, which I think is using a ARM Cortex M3 processor.
I got a lot of errors about the assembler code in SYLT-FFT, but I managed to force it to compile by adding "#undef ARMCC_VERSION" and "#undef __arm" before including fft.h.
It now compiles and does something (I need to to do some testing to see what it is doing!), but I wondered if you thought this was the right way to tell it to use C code rather than the assembler optimisations?
Also, main.c used a header file MK20D5.h - I don't know what that is, and have not managed to compile your main.c, so wondered if that was something to do with processor specific things?
(If you are interested, I am using it in an experimental epileptic seizure detector - http://openseizuredetector.org.uk, and https://github.com/OpenSeizureDetector

Thanks!

Answer 1 · 2015-01-19T09:01:55.000Z

This code was designed for Cortex-M4 and you should get errors trying to compile on M3. The fallback snippets in C are heaps slower than the intrinsics used for M4.

Also, as an important side note - this is written for compilation by GCC ARM Embedded in Launchpad and Keil C Compiler. It has not been tested or optimized for any other compiler (and optimizing for a specific compiler turned out some pretty drastic performance improvements).

You should at the very least leave in the intrinsics that are available on your platform and attempt to rewrite the missing intrinsics to use as few assembly instructions as possible unless the compiler is already optimizing the equivalent C-code to optimal assembly. It would be trivial to add preprocessor instructions to separate M3 intrinsics from M4 equivalents.

The C-code snippets you are falling back to now were designed for functionality testing on a PC and were never intended for an embedded environment. In the end I would expect a significant performance hit on the M3 as fast intrinsics are used at the core butterfly operations where even a single extra cpu cycle has quite dramatic effects.

Please do let me know how you fare though, if you ever compare against anything else.

Answer 2 · 2015-01-19T09:06:12.000Z

Oh, and main.c was, as specified in the first line, is written for benchmarking on the FRDM-K20D50M board. If you wish to benchmark on another platform, it needs to be rewritten, but this file has nothing to do with the fft library, really. It's just a trivial example on how to use it, as well as a benchmarking timer for measuring the CPU cycle count of calls to various functions.

The MK20D5.h file is a header specific to the microcontroller used on this development board.

And yes I am interested and will absolutely check out your project :)

Answer 3 · 2015-01-19T19:07:15.000Z

Thanks - I know what you mean about the assembler bits, but my first objective is to get something that runs and gives sensible results - performance improvements will have to come next, which is why dropping back to C will do me for now! I was just pleased to find a library that didn't rely on libm.

Regards

Graham.

Answer 4 · 2015-01-20T09:06:50.000Z

I understand and sincerely hope this will prove useful for your project.
If there is anything I can do to help with usage/implementation, anything that needs explaining, etc. do not hesitate to ask!
Feel free to use email as the forum if you wish - I will likely reply faster there.

Answer 5 · 2015-01-20T21:22:24.000Z

Thank you - I may well have to ask for help - I have never done much
practical digital signal processing (just a university course 30 years
ago....)

I have tried to write up the design of the sampling and analysis system -
sample rate, sample collection time, and what I think I should detect - here
https://github.com/jones139/OpenSeizureDetector/wiki/Pebble-Version. I
would appreciate any thoughts you have on that in case you can see
something I have missed.

A colleague has got me thinking about using a digital band-pass filter as
an alternative to conversion to frequency domain which I might think about
too, but haven't thought about how to make one - I need the digital
equivalent of a couple of R-C circuits.....

Regards

Graham.

On 20 January 2015 at 09:06, Thor Grimner notifications@github.com wrote:

I understand and sincerely hope this will prove useful for your project.
If there is anything I can do to help with usage/implementation, anything
that needs explaining, etc. do not hesitate to ask!
Feel free to use email as the forum if you wish - I will likely reply
faster there.

—
Reply to this email directly or view it on GitHub
#1 (comment).

Graham Jones
Hartlepool, UK.

Answer 6 · 2015-01-20T22:37:55.000Z

Could you confirm whether I am using the library correctly or not please?
What I am doing is:

Collecting a time series of data.
Putting it into a fft_complex_t array - my data is going in the .r real
component, and setting the .i imaginary component to zero.

Call fft_forward, passing the complex array as a parameter, plus a 'bits'
parameter.

I then expect the result to be in the same fft_complex_t array as used for
the input data - the .r component is the amplitude, and the .i component
the phase (which I am ignoring for my application).

I have not been able to work out what the 'bits' parameter is - my values
are all integers, so does this mean that bits is zero?

Thanks!

Graham.

On 20 January 2015 at 21:22, Graham Jones grahamjones139@gmail.com wrote:

Thank you - I may well have to ask for help - I have never done much
practical digital signal processing (just a university course 30 years
ago....)

I have tried to write up the design of the sampling and analysis system -
sample rate, sample collection time, and what I think I should detect -
here https://github.com/jones139/OpenSeizureDetector/wiki/Pebble-Version.
I would appreciate any thoughts you have on that in case you can see
something I have missed.

A colleague has got me thinking about using a digital band-pass filter as
an alternative to conversion to frequency domain which I might think about
too, but haven't thought about how to make one - I need the digital
equivalent of a couple of R-C circuits.....

Regards

Graham.

On 20 January 2015 at 09:06, Thor Grimner notifications@github.com
wrote:

I understand and sincerely hope this will prove useful for your project.
If there is anything I can do to help with usage/implementation, anything
that needs explaining, etc. do not hesitate to ask!
Feel free to use email as the forum if you wish - I will likely reply
faster there.

—
Reply to this email directly or view it on GitHub
#1 (comment).

Graham Jones
Hartlepool, UK.

Graham Jones
Hartlepool, UK.

Answer 7 · 2015-01-21T08:07:55.000Z

I have never done much practical digital signal processing

Tell me about it, I had never done DSP until finding myself in a situation needing to write this fft library that had to not only be faster, but smaller than the industry standard CMSIS DSP implementation. Those were some pretty intense weeks, given I've never studied math beyond grade school :P

Could you confirm whether I am using the library correctly or not please?

If you could link me to the file containing the source, that would be helpful.

Putting it into a fft_complex_t array - my data is going...

Yup, that is pretty close!

Put your data in the .r component of your fft_complex_t array (which needs to be sized to a power of 2), zero out the .i component.
Call fft_fft on the array (not fft_forward).
The bits parameter actually specifies the size of your array in the form size = 1 << bits.
So bits:1=size:2, bits:2=size:4, bits:3=size:8, bits:4=size:16 and so forth.
The maximum fft size is configured in config.h, SINE_BITS (requires regenerating the sine table if changed).

== following is common to all fft implementations ==
The output of any FFT however is never amplitude and phase (unless the library specifically includes such conversion, uncommon), rather a vector in the imaginary plane.
To calculate amplitude and phase the vector must be converted into this form.
For amplitude, use the Pythagorean theorem: amplitude = sqrt(r_r + i_i)
For phase, you could use atan2 if available: atan2(r, i)
Note that the lower half of the output will contain the positive frequency components, and the upper half will contain the negative components, which for real data (no .i input) will be a mirror of the positive components. Also note that the first bin (array[0]) is the DC component (frequency is 0).
== end of common ==

Once you get this working, there are ways to improve performance.
Since you are only using real input data (sampled signal into .r component with .i always zero) you are most likely not interested in the negative components and you will simply discard them.
I've included a function called fft_fftr that will be alot faster in your case.
Instead of putting data in .r with .i=0, put data into both of them interlaced.
Instead of:
cplx[0].r = input[0]; cplx[0].i = 0;
cplx[1].r = input[1]; cplx[1].i = 0;
cplx[2].r = input[2]; cplx[2].i = 0;
...etc
Do:
cplx[0].r = input[0]; cplx[0].i = input[1];
cplx[1].r = input[2]; cplx[1].i = input[3];
...etc
and then simply call fft_fftr instead.
Note that the complex array should now be declared to half the size of the input array as you will be fitting two values into each bin.

The output will be the same, except no mirrored negative frequencies and using significantly less computation time.

Again, if you run into problems - send me the source and I will have a look at it.

Answer 8 · 2015-01-21T09:39:33.000Z

A colleague has got me thinking about using a digital band-pass

Sure, this is a possibility, and it may be faster if you are only looking to check a few well defined frequency bands.

FIR filters are quite simple to design and can indeed be very fast, depending on your specifications.

However, if you are interested in the frequency content in detail, FFT is the best way to go.
Even if you are not, it may still be the most straight forward, readable, easy to understand and dynamic method as dynamic/realtime recalculation of the filter coefficients can get a bit messy even though the math is fairly basic. FFT may also be a good tool to determine the proper specifications for the digital filter if you decide this is the best way to go in the end.

See this site for examples on various filter implementations & more: http://musicdsp.org/archive.php?classid=3
You might find some food for though, useful implementations and examples there.

Answer 9 · 2015-01-25T22:38:52.000Z

Thanks for all your help. I think I am making progress using this on a
pebble smart watch - code at
https://github.com/jones139/OpenSeizureDetector/blob/master/pebble_version/pebble_sd/src/pebble_sd.c
.

I have used absolute values of acceleration (ie it does not go negative),
so I think I am calculating about double the actual frequency, which I can
correct for, and have calculated the absolute value of the complex fft
output, but am getting sensible looking values - need to do some proper
testing next week.

Cheers

Graham.

On 21 January 2015 at 09:39, Thor Grimner notifications@github.com wrote:

A colleague has got me thinking about using a digital band-pass

Sure, this is a possibility, and it may be faster if you are only looking
to check a few well defined frequency bands.

FIR filters are quite simple to design and can indeed be very fast,
depending on your specifications.

However, if you are interested in the frequency content in detail, FFT is
the best way to go.
Even if you are not, it may still be the most straight forward, readable,
easy to understand and dynamic method as dynamic/realtime recalculation of
the filter coefficients can get a bit messy even though the math is fairly
basic. FFT may also be a good tool to determine the proper specifications
for the digital filter if you decide this is the best way to go in the end.

See this site for examples on various filter implementations & more:
http://musicdsp.org/archive.php?classid=3
You might find some food for though, useful implementations and examples
there.

—
Reply to this email directly or view it on GitHub
#1 (comment).

Graham Jones
Hartlepool, UK.

Answer 10 · 2015-01-26T07:40:06.000Z

Hey again,

This looks good at first glance, might find some time today to test it out and make sure.

Using your absolute values is not a problem - this is simply a signal with a DC offset.
It will boil down into the DC bin (bin 0) during FFT, which on completion will contain a larger value.
Other bins are unaffected by input offset, and there is no shift in frequency, so you need not account for this offset anywhere.

Answer 11 · 2015-01-26T09:14:21.000Z

I did find the time and your code does work. Still, I came up with the following code to demonstrate the faster version using real data only (test_fast). This is based on your code and should hopefully be a comfortable read.

See also the sqrt part of the magnitude calculation. Without it scaling of magnitude will depend on phase, giving larger than actual values as phases that approach each of the four diagonal vectors 45, 135, 225 and 315 and will only produce accurate results at exact 90 degree multiples.

fft_t get_magnitude(fft_complex_t *c) {
  return sqrt(c->r * c->r + c->i * c->i);
}

double get_phase(fft_complex_t *c) {
  return atan2(c->r, c->i);
}

void test() {
  int i;

  // Simulate input
  unsigned harmonic = 10; // harmonic number
  for(i = 0; i < NSAMP; i++) {
    accData[i] = sin((double)i * (M_PI * 2 * harmonic / NSAMP)) * 100;
  }

  // Do FFT
  for (i = 0; i < NSAMP; i++) {
    fftdata[i].r = accData[i];
    fftdata[i].i = 0;
  }
  fft_fft(fftdata, FFT_BITS);
  // ffdata[0]                        = DC component
  // fftdata[1 to NSAMP/2 - 1)        = FFT bins of interest
  // fftdata[NSAMP/2 to end-of-array] = Mirror version (discard)
  int32_t maxVal = fftdata[1].r;
  unsigned maxLoc = 1;
  for(i = 1; i < NSAMP / 2; i++) {
    fftdata[i].r = get_magnitude(&fftdata[i]);
    if (fftdata[i].r>maxVal) {
      maxVal = fftdata[i].r;
      maxLoc = i;
    }
  }

  // Draw bins of interest
  for (i = 1; i < NSAMP / 2; i++) {
    draw_line(i, 256, i, 255 - fftdata[i].r);
  }  
}

void test_fast() {
  int i;
  // Simulate input
  unsigned harmonic = 10; // harmonic number
  for(i = 0; i < NSAMP; i++) {
    accData[i] = sin((double)i * (M_PI * 2 * harmonic / NSAMP)) * 100;
  }
  // Do FFT
  for (i = 0; i < NSAMP / 2; i++) {
    fftdata[i].r = accData[i * 2 + 0];
    fftdata[i].i = accData[i * 2 + 1];
  }
  // if accData is changed to fft_t you can do:
  // memcpy(fftdata, accData, sizeof(fftdata)); // faster copy
  // or do fft_fftr directly on accData even
  fft_fftr(fftdata, FFT_BITS - 1); // Note !! fftR !! bits - 1 !!
  int32_t maxVal = fftdata[1].r;
  unsigned maxLoc = 1;
  for(i = 1; i < NSAMP / 2; i++) {
    fftdata[i].r = get_magnitude(&fftdata[i]);
    if (fftdata[i].r>maxVal) {
      maxVal = fftdata[i].r;
      maxLoc = i;
    }
  }

  // Draw bins of interest
  for (i = 1; i < NSAMP / 2; i++) {
    draw_line(i, 512, i, 511 - fftdata[i].r);
  }  
}

Answer 12 · 2015-01-26T18:19:06.000Z

Thanks for taking the time to look at that - I'll look at your suggested
optimisations - I'll have to think about assigning a real and imaginary
component to the input data - haven't quite worked out what that is doing
yet!

I was just checking the Credits in my README file, and I have the author of
SYLT-FFT as D. Taylor - is that you or someone else (just wondered!).

Cheers

Graham.

On 26 January 2015 at 09:14, Thor Grimner notifications@github.com wrote:

Did find the time and it does work and I came up with the following code
to demonstrate the faster version using real data only (test_fast). This is
based on your code and should hopefully be a comfortable read.

fft_t get_magnitude(fft_complex_t *c) {
return sqrt(c->r * c->r + c->i * c->i);
}

double get_phase(fft_complex_t *c) {
return atan2(c->r, c->i);
}

void test() {
int i;

// Simulate input
unsigned harmonic = 10; // harmonic number
for(i = 0; i < NSAMP; i++) {
accData[i] = sin((double)i * (M_PI * 2 * harmonic / NSAMP)) * 100;
}

// Do FFT
for (i = 0; i < NSAMP; i++) {
fftdata[i].r = accData[i];
fftdata[i].i = 0;
}
fft_fft(fftdata, FFT_BITS);
// ffdata[0] = DC component
// fftdata[1 to NSAMP/2 - 1) = FFT bins of interest
// fftdata[NSAMP/2 to end-of-array] = Mirror version (discard)
int32_t maxVal = fftdata[1].r;
unsigned maxLoc = 1;
for(i = 1; i < NSAMP / 2; i++) {
fftdata[i].r = get_magnitude(&fftdata[i]);
if (fftdata[i].r>maxVal) {
maxVal = fftdata[i].r;
maxLoc = i;
}
}

// Draw bins of interest
for (i = 1; i < NSAMP / 2; i++) {
draw_line(i, 256, i, 255 - fftdata[i].r);
}

}

void test_fast() {
int i;
// Simulate input
unsigned harmonic = 10; // harmonic number
for(i = 0; i < NSAMP; i++) {
accData[i] = sin((double)i * (M_PI * 2 * harmonic / NSAMP)) * 100;
}
// Do FFT
for (i = 0; i < NSAMP / 2; i++) {
fftdata[i].r = accData[i * 2 + 0];
fftdata[i].i = accData[i * 2 + 1];
}
// if accData is changed to fft_t you can do:
// memcpy(fftdata, accData, sizeof(fftdata)); // faster copy
// or do fft_fftr directly on accData even
fft_fftr(fftdata, FFT_BITS - 1); // Note !! fftR !! bits - 1 !!
int32_t maxVal = fftdata[1].r;
unsigned maxLoc = 1;
for(i = 1; i < NSAMP / 2; i++) {
fftdata[i].r = get_magnitude(&fftdata[i]);
if (fftdata[i].r>maxVal) {
maxVal = fftdata[i].r;
maxLoc = i;
}
}

// Draw bins of interest
for (i = 1; i < NSAMP / 2; i++) {
draw_line(i, 512, i, 511 - fftdata[i].r);
}

}

—
Reply to this email directly or view it on GitHub
#1 (comment).

Graham Jones
Hartlepool, UK.

Answer 13 · 2015-01-26T19:51:26.000Z

Sure! Let me know if there is anything I can try and help explain.
Yes, that is I - Intermittent paranoia keeps me using the odd pseudonym here and there ;)

Answer 14 · 2015-03-11T20:52:43.000Z

Hi There,
Thanks for all your help getting FFT running in my pebble app - I have got
a decent prototype working now, my son is wearing it and I have the alarm
infrastructure working enough for my phone to go 'beep' if the FFT analysis
suggest he may be having a seizure - some details at
http://openseizuredetector.org.uk if you are interested.
Thanks again - I needed your FFT library to be able to make this work!

Cheers

Graham.

On 26 January 2015 at 19:51, Thor Grimner notifications@github.com wrote:

Sure! Let me know if there is anything I can try and help explain.
Yes, that is I - Intermittent paranoia keeps me using the odd pseudonym
here and there ;)

—
Reply to this email directly or view it on GitHub
#1 (comment).

Graham Jones
Hartlepool, UK.

Answer 15 · 2015-03-13T08:56:57.000Z

Thank you for writing me, this is exactly the sort of thing that makes one
happy after releasing free code.
Your project seems very useful and important and even though my
contribution is tiny, it gives me great satisfaction to have my code be
part of it.
I am also impressed by the broad spectrum of solutions, languages and
platforms you have tried.

Read through your source today (which seems very nicely written by the
way), and while I am sure performance or memory is not much of a concern in
this case you can speed up the FFT calculation significantly while also
reducing memory requirements:

for(i = 0; i < NSAMP / 2; i++) {
fftdata[i].r = accData[i * 2 + 0];
fftdata[i].i = accData[i * 2 + 1];
}
fft_fftr(fftdata, FFT_BITS - 1);

You are working with REAL-only data (.i is always 0) and this allows for
optimizing an FFT calculation in such a way that you are able to process
your REAL data using an FFT of half the size in the COMPLEX domain,
reducing computation time to less than half.
This is offset by the need to perform some conversions after the FFT, but
there is still a good net gain.

If you have no need for accData after converting, you can actually do the
real FFT version directly on accData given that it's data format is changed
to int32:

fft_fftr((fft_complex_t *)accData, FFT_BITS - 1);

After which you could still do:

maxVal = getMagnitude(((fft_complex_t *)accData)[1]);

This would save you even more memory and processing time.

This is all probably of little interest to you, but I figured you should
have the information, even if for future reference only.

What may be of greater interest to you though, is windowing of the FFT.
I believe your output will be more useful if you implement a suitable
windowing method.
Currently you are technically using a rectangular window which is rarely
optimal for any kind of continuous signal.
If you are interested, National Instruments has a nice introductory
document at http://www.ni.com/white-paper/4844/en/

Best regards,
Davey

2015-03-11 21:52 GMT+01:00 Graham Jones notifications@github.com:

Hi There,
Thanks for all your help getting FFT running in my pebble app - I have got
a decent prototype working now, my son is wearing it and I have the alarm
infrastructure working enough for my phone to go 'beep' if the FFT analysis
suggest he may be having a seizure - some details at
http://openseizuredetector.org.uk if you are interested.
Thanks again - I needed your FFT library to be able to make this work!

Cheers

Graham.

On 26 January 2015 at 19:51, Thor Grimner notifications@github.com
wrote:

Sure! Let me know if there is anything I can try and help explain.
Yes, that is I - Intermittent paranoia keeps me using the odd pseudonym
here and there ;)

—
Reply to this email directly or view it on GitHub
#1 (comment).

Graham Jones
Hartlepool, UK.

—
Reply to this email directly or view it on GitHub
#1 (comment).

By reading this e-mail, you agree to not notice any spelling or grammatical
errors that may or may not be present. Further, you agree to, unless
otherwise specified, disclose the contents of this e-mail to anyone who may
have an interest in the information. If you do not accept these terms you
are obliged to delete this e-mail and any copies thereof before even
reading this paragraph.

Answer 16 · 2015-03-14T19:22:32.000Z

Thank you for the optimisation suggestions - I am going to start on an
update to the watch application to make some other changes, so I will
incorporate your suggestions at the same time - anything to reduce
processor load will help battery life, so it is very useful. I haven't
come across windowing in the context of FFT - I will certainly look at the
page you suggest.

Thank you again for all your help - you are of course wrong to say that
your contribution to my project is small - it would not work without a FFT
calculation, so this library is fundamental to making it work!

Graham.

On 13 March 2015 at 08:56, D Taylor notifications@github.com wrote:

Thank you for writing me, this is exactly the sort of thing that makes one
happy after releasing free code.
Your project seems very useful and important and even though my
contribution is tiny, it gives me great satisfaction to have my code be
part of it.
I am also impressed by the broad spectrum of solutions, languages and
platforms you have tried.

Read through your source today (which seems very nicely written by the
way), and while I am sure performance or memory is not much of a concern in
this case you can speed up the FFT calculation significantly while also
reducing memory requirements:

for(i = 0; i < NSAMP / 2; i++) {
fftdata[i].r = accData[i * 2 + 0];
fftdata[i].i = accData[i * 2 + 1];
}
fft_fftr(fftdata, FFT_BITS - 1);

You are working with REAL-only data (.i is always 0) and this allows for
optimizing an FFT calculation in such a way that you are able to process
your REAL data using an FFT of half the size in the COMPLEX domain,
reducing computation time to less than half.
This is offset by the need to perform some conversions after the FFT, but
there is still a good net gain.

If you have no need for accData after converting, you can actually do the
real FFT version directly on accData given that it's data format is changed
to int32:

fft_fftr((fft_complex_t *)accData, FFT_BITS - 1);

After which you could still do:

maxVal = getMagnitude(((fft_complex_t *)accData)[1]);

This would save you even more memory and processing time.

This is all probably of little interest to you, but I figured you should
have the information, even if for future reference only.

What may be of greater interest to you though, is windowing of the FFT.
I believe your output will be more useful if you implement a suitable
windowing method.
Currently you are technically using a rectangular window which is rarely
optimal for any kind of continuous signal.
If you are interested, National Instruments has a nice introductory
document at http://www.ni.com/white-paper/4844/en/

Best regards,
Davey

2015-03-11 21:52 GMT+01:00 Graham Jones notifications@github.com:

Hi There,
Thanks for all your help getting FFT running in my pebble app - I have
got
a decent prototype working now, my son is wearing it and I have the alarm
infrastructure working enough for my phone to go 'beep' if the FFT
analysis
suggest he may be having a seizure - some details at
http://openseizuredetector.org.uk if you are interested.
Thanks again - I needed your FFT library to be able to make this work!

Cheers

Graham.

On 26 January 2015 at 19:51, Thor Grimner notifications@github.com
wrote:

Sure! Let me know if there is anything I can try and help explain.
Yes, that is I - Intermittent paranoia keeps me using the odd pseudonym
here and there ;)

—
Reply to this email directly or view it on GitHub
#1 (comment).

Graham Jones
Hartlepool, UK.

—
Reply to this email directly or view it on GitHub
#1 (comment).

By reading this e-mail, you agree to not notice any spelling or grammatical
errors that may or may not be present. Further, you agree to, unless
otherwise specified, disclose the contents of this e-mail to anyone who may
have an interest in the information. If you do not accept these terms you
are obliged to delete this e-mail and any copies thereof before even
reading this paragraph.

—
Reply to this email directly or view it on GitHub
#1 (comment).

Graham Jones http://google.com/+GrahamJones
Hartlepool, UK.