tabemann/zeptoforth

Code that compiles in ram does not in Flash

bmentink opened this issue · 60 comments

Hi, I have a module that compiles to RAM just fine, but it fails with a stack underflow error when compiled to flash. The code that fails is:

0 :noname     \ loop time overhead of task is ~47us
   begin
    bemf @ prev-bemf !      \ store current bemf 
    step @ neutral [@] @ 0 buffer [@] 1 buffer [@] + 2 buffer [@] + 3 / - bemf !
    \ Run the commutate State Machine,  first openloop, then closed
    commutate
    pause         \ Yield, but do this task as fast as we can 
  again ; 256 128 512 spawn motor-task !

Do I have to declare that differently for flash?

Thanks. So that code is within a module, as is another task, plus all the rest of the code.
What is the best way to set that up? Do I have to pull those tasks out of the module and run initializer on them?
How do I then set up turnkey, etc.?

I can't find any info on initializer apart from your example. What does it do?

Here is what I have currently (a lot of code has been removed from the module to keep it short):

begin-module bldc
....
...
...

  \ Motor task -- we are in this task when running
  0 :noname     \ loop time overhead of task is ~47us
   begin
    bemf @ prev-bemf !      \ store current bemf 
    step @ neutral [@] @ 0 buffer [@] 1 buffer [@] + 2 buffer [@] + 3 / - bemf !
    \ Run the commutate State Machine,  first openloop, then closed
    commutate
    pause         \ Yield, but do this task as fast as we can 
  again ; 256 128 512 spawn motor-task !

  \ Speed task, will need to make this torque control, then later PID speed.
  0 :noname     
   begin
    \ set the duty cycle from buffered value
    3 buffer [@] s>f 3,5 f* f>s pwm-duty !      \ 3.5 is dependent on operating frequency
    10,0 pwm-duty @ s>f DEMAG_DELAY f/ d- 1,0 dmax f>s com_delay !   \ calc commutation delay based on pwm-duty 
    pwm-duty @ THROTTLE_THRESHOLD < if 0 state ! then                \ reset the state machine if we throttle right off
    60_000_000,0 120deg-time @ s>f 3,0 f* f/ 4,0 f/ f>s rpm !        \  calc rpm 4-pole motor
    40 ms         \ Update speed every 40 ms
  again ; 256 128 512 spawn speed-task !

  \ Do all the init stuff and run
  : init-bldc 
    init-array
    init-gpio
    init-pwm
    motor-task @ run
    speed-task @ run
  ;

  : off
    motor-task @ stop
    0 0 0 0 0 0 writePhases
  ;

 initializer init-bldc
    
end-module> import

OK, so with initializer you don't need to include init like this:

: init init bldc::init-bldc ;

.. as described in the wiki.

So could the tail end of my code above look like this?

  \ Do all the init stuff and run
  : init-bldc 
    init-array
    init-gpio
    init-pwm
    make-motor-task
    make-speed-task
    motor-task @ run
    speed-task @ run
  ;

  : off
    motor-task @ stop
    0 0 0 0 0 0 writePhases
  ;

  initializer init-bldc
    
end-module> import


: turnkey begin 1000 ms key? until ;

compile-to-ram

Is this correct? (I have created the two tasks as you specified)

Hi, yes, all code is in flash by this point. So, just to confirm: I don't need to call init in my code above, as per the wiki description of turnkey? Or do I still have to add : init init ; just below the end-module?

Thanks

OK, so I presume that TURNKEY and INITIALIZER are two ways to get your software to boot from flash. I have tried both the TURNKEY method and INITIALIZER, and nothing happens (no program runs and no REPL).

For TURNKEY I tried this after the module declaration and import:

: init init init-bldc ;

: turnkey begin 1000 ms key? until ;

init-bldc is the main entry point for my software. Nothing worked, so I had to put zeptoforth back on the RP2040.

I then tried just declaring initializer init-bldc by itself, with no turnkey or init, and that did not work either; same result.
Am I doing this correctly, or is there something in my code that compiles but does not run in flash?

OK, since this will be hard to debug, can you suggest what sort of things don't play well with running from flash but are fine when running from RAM? I don't have any more :noname ... ; constructs now (all have been replaced with [: and ;]). Is there anything else that would trip me up?

Thanks. By selectively isolating lines in init-bldc, I have tracked it down to the following code. If I remove the call to this word, I can get into the REPL OK:

: make-speed-task ( -- )
  0 [:    
    begin
      \ set the duty cycle from buffered value
      3 buffer [@] s>f 3,5 f* f>s pwm-duty !      \ 3.5 is dependent on operating frequency
      10,0 pwm-duty @ s>f DEMAG_DELAY f/ d- 1,0 dmax f>s com_delay !   \ calc commutation delay based on pwm-duty 
      pwm-duty @ THROTTLE_THRESHOLD < if 
        0 state ! 
        0 0 0 0 0 0 writePhases 
      then  \ reset the state machine and stop the motor, if we throttle right off
      60_000_000,0 120deg-time @ s>f 3,0 f* f/ 4,0 f/ f>s rpm !        \  calc rpm 4-pole motor
      40 ms         \ Update speed every 40 ms
    again ;] 256 128 512 spawn speed-task !
  ;

The task before it was defined in exactly the same way; does it look OK? As I mentioned earlier, this is part of a module, like all my code, and the module compiles to flash OK. The module contains all the VARIABLEs, CONSTANTs, buffers, etc.

The code above compiles OK, but when I include the line speed-task @ run, it fails, though not when compiled to RAM.

OK, thanks, I will rename state. However, isn't it local when defined in a module?

Another thing I notice does not work when compiled to flash: I have the following array-defining words:

\ Array defining words
  : [] ( size tib:"name" -- ) create dup , cells allot ;
  : [!] ( value index array -- ) swap cells + cell+ ! ;
  : [@] ( index array -- value ) cell+ swap cells + @ ;
  : [A] ( index array -- address ) cell+ swap cells + ;

defined a couple of arrays like this:

  5 [] buffer
  6 [] neutral

And they are used like this:

2 buffer [A]  0 neutral [!]

I am storing the address of 2 buffer in the array neutral at index 0. This works fine when compiled to RAM, but when running from flash, 0 neutral [@] returns -1 rather than the address of 2 buffer.

Ah, thanks. It would be helpful to know why my words won't work in flash.

I tried buffer:, as in 10 buffer: buffer, but accessing it from an interrupt routine (e.g. data index buffer !) crashes the CPU, even when running from RAM.
What needs to be done to access it correctly? I can't have much overhead in the interrupt routine, as this is a real-time application.
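One likely cause of the crash, offered as a hedged sketch rather than a confirmed diagnosis: a word defined with `buffer:` takes no index argument and only pushes the buffer's base address. So `data index buffer !` stores the index into the first cell and leaves the data behind on the stack, and an interrupt handler that leaves a stray cell on the stack every time it fires will sooner or later overflow the data stack. The sketch uses only standard Forth (`buffer:` is in Forth 2012), so it can also be tried on a desktop Forth; the names `buf`, `bad-store`, `good-store` and `good-fetch` are hypothetical.

```forth
\ Hypothetical 10-cell buffer; BUF ( -- addr ) pushes only its base address.
10 cells buffer: buf

\ Mirrors "data index buffer !": stores INDEX into cell 0 and leaves DATA
\ on the stack, so the stack grows by one cell on every call.
: bad-store ( data index -- data ) buf ! ;

\ Indexed store: scale the index to bytes with CELLS, then add the base address.
: good-store ( data index -- ) cells buf + ! ;

\ Matching indexed fetch.
: good-fetch ( index -- data ) cells buf + @ ;

1234 3 good-store
3 good-fetch .   \ prints 1234
```

The indexed versions add only a CELLS and a + to each access, so the extra cost inside an interrupt handler should be small.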

The documentation states:

##### `buffer:`
( # "name" -- )

Specify a buffer of a given size

... but gives no clue as to how to access it. Can it be used as an array?

Cheers, Bernie

Wow, this is difficult! I expected to be able to do the same as in FlashForth, where CREATE ... DOES> works in flash.

You still have not told me how to create an array in flash.

If I do 3 cells buffer: buffer, I would expect to be able to do 10 2 buffer ! to store 10 into the 3rd element of the array. But it seems my created word buffer only takes a single argument. How do I use it? An example, please.
(I tested in RAM, but assume it is state-smart.)

Ok, got it going. I had to implement my array this way.

  5 cells buffer: buffer   
  6 cells buffer: neutral
  : [!] ( data index buffer_addr -- ) swap cells + ! ;
  : [@]  ( index buffer_addr -- data ) swap cells + @ ;
  : [A]  ( index buffer_addr -- addr ) swap cells +  ;

Usage:
1234 3 buffer [!]
3 buffer [@]

That worked both in ram and flash.
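To check the buffer:-based words above against the original use case (storing the address of element 2 of buffer in element 0 of neutral, then reading through that pointer), here is a small standalone sketch. It repeats the array words with corrected stack comments and uses arbitrary test values. It sticks to standard Forth so it can be tried on a desktop Forth too; there the name buffer may shadow the standard block word BUFFER, which is harmless here.

```forth
\ Arrays as plain buffers; each name pushes only its base address.
5 cells buffer: buffer
6 cells buffer: neutral

\ Element access: scale the index by CELLS and add it to the base address.
: [!] ( data index addr -- ) swap cells + ! ;
: [@] ( index addr -- data ) swap cells + @ ;
: [A] ( index addr -- elem-addr ) swap cells + ;

\ Store the address of element 2 of BUFFER in element 0 of NEUTRAL.
2 buffer [A]  0 neutral [!]

\ Write element 2 directly, then read it back through the stored pointer.
1234 2 buffer [!]
0 neutral [@] @ .   \ prints 1234
```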
It does add overhead to my interrupt routine, though, which reduces my top speed; or maybe running from flash is simply slower than running from RAM. (I get 3000 rpm running from RAM and 2500 rpm from flash.)

OK, thanks, I will have a look. I don't know if it will help in my case; the ADC on the RP2040 is rubbish, so I am actually thinking of re-porting my application from the RP2040 to the Blackpill board (STM32F411).

It has much better ADCs. I was just looking at what is involved and noticed that you do not have a PWM driver for the Blackpill yet. Do you intend to write one at some point? If not, I will give it a go.

Thanks. I have just built from source; I guess that gives only the base system, and extras are added during setup_xxxx.fs?

At a glance I could not see the FAT32 support; is that in the core?
I will attempt the PWM driver when I get a bit of time.

By the way, what do I need to tweak to get 921,600 baud on the F411 target? My USB-to-UART adapter can handle that, but can the F411? Or am I better off using swdcom? Is there a way to use that with zeptocom.js?

Cheers

Hi Travis,

As mentioned in the SourceForge forum, I might persevere a bit longer with the RP2040 ADCs by moving some code onto the 2nd core. With that in mind, what is the best (fastest) way to share variables between cores? Also, can you point me to some guides for running on the 2nd core? Thanks.

Bernie

Brilliant, thanks. :)

Hi Travis,

Regarding shared variables and critical sections: say I have the following interrupt routine running on core 0:

: pwm_wrap_int ( -- ) 
    \ high test_pin pin! 
    adc-mux @  mux 1 adc@@ adc-mux @ buffer [!]              \ save adc1 values in buffer array 
    1 adc-mux +!
    adc-mux @ 3 > if 2 adc@@ 3 buffer [!]  0 adc-mux !  then             \ save throttle
    \ low test_pin pin! 
    %00000001 clear-pwm-int    \ Clear interrupt
  ;

The only variable I want to access is the buffer array. Is this the correct way to access it on core 1?

: make-motor-task ( -- )
  0 [:    
    \ 0 current-task task-priority!
    begin
      bemf @ prev-bemf !      \ store current bemf 
      BEGIN-CRITICAL
        step @ neutral [@] @ 0 buffer [@] 1 buffer [@] + 2 buffer [@] + 3 / - bemf !
      END-CRITICAL
      \ Run the commutate State Machine,  first openloop, then closed
      commutate
      \ pause         \ Yield, but do this task as fast as we can 
    again ;] 256 128 512 1 spawn-on-core motor-task !
  ;

I tried this and it does not work at all.

Thanks, but unfortunately it failed to even run the motor. After some debugging, I found it failed at the point where the PWM interrupt was enabled.

At that point the CPU was hung; not even the REPL was working, and I had to do a physical reset of the board.
Yes, the interrupt is only run on core 0.

Same fault ...

I don't see where sync gets set to true?

OK, thanks. Regarding your earlier response, I do have %00000001 clear-pwm-int \ Clear interrupt at the end of my interrupt handler, as above. Should I be doing something else?

No, the interrupt handler takes about 6-7 us, and the PWM runs at 20 kHz (a 50 us period); it worked perfectly fine before the mods you suggested. I will try to measure with the mods, if I can get it to run. ;)

EDIT: I removed the critical section from the task side, and the motor runs, sort of, which at least lets me see how long the interrupt routine takes. I measured 11 us worst case and 6 us best case.

When I add the task-side critical section back in, it does not run at all. Strange.

Hi Travis,

I have gone back to a single core (core 0), disabled multitasking with BEGIN-CRITICAL, and everything is working much better. I will use the tried-and-true method of counter-based delays for the less critical things that I had originally left to tasks.

I have managed to get to the rpm limit for this motor, so can't test further until I get a higher KV motor. Will order one soon.
Thanks again for all your valuable help.

Bernie

I was going to say that if it doesn't work for you, you could go back to one core, but I would have suggested doing everything on the second core, so you can have BEGIN-CRITICAL on permanently without interfering with things like the USB CDC console.

I am running nearly everything in the interrupt handler now. How do I run that on the 2nd core?

OK. So there are no Forth words defined for running interrupts on the 2nd core?

I did notice that in the PWM driver you do enable PWM on the 2nd core. I had a look in the test folder but did not see any example of PWM, or of interrupts, running on the 2nd core. Are these still to be done at some stage?

If so, I will do as you say and use direct register access.

I note that the PWM_IRQ_WRAP interrupt is routed to the second core as well, so nothing needs to be done there, but I have not found out how to enable that interrupt on the 2nd core.

I found PROC1_NMI_MASK and have defined it, but cannot see where its bits are defined for enabling PWM_IRQ_WRAP.

No problem, thanks. I will stay with core 0 for now.

Thanks ..

Hi Again Travis,

I have used the following word to set/clear bits of a port, but it is not very efficient because I set or clear each bit separately. Is there a better Forth word I can use that directly masks the GPIO registers?

 : mux ( sel -- )
    case 
      0 of [ mux0_pin bit ] literal GPIO_OUT_CLR ! [ mux1_pin bit ] literal GPIO_OUT_CLR ! endof
      1 of [ mux0_pin bit ] literal GPIO_OUT_SET ! [ mux1_pin bit ] literal GPIO_OUT_CLR ! endof
      2 of [ mux0_pin bit ] literal GPIO_OUT_CLR ! [ mux1_pin bit ] literal GPIO_OUT_SET ! endof
      3 of [ mux0_pin bit ] literal GPIO_OUT_SET ! [ mux1_pin bit ] literal GPIO_OUT_SET ! endof
    endcase
  ;

This word uses a 2-bit selector to select an ADC mux channel; mux0_pin is, say, bit 6 and mux1_pin bit 7.
Is there a faster way to do this?

The only real improvement I can suggest is:

: do-mux-0 ( -- ) [ mux0_pin bit mux1_pin bit or ] literal GPIO_OUT_CLR ! ;
: do-mux-1 ( -- ) [ mux0_pin bit ] literal GPIO_OUT_SET ! [ mux1_pin bit ] literal GPIO_OUT_CLR ! ;
: do-mux-2 ( -- ) [ mux0_pin bit ] literal GPIO_OUT_CLR ! [ mux1_pin bit ] literal GPIO_OUT_SET ! ;
: do-mux-3 ( -- ) [ mux0_pin bit mux1_pin bit or ] literal GPIO_OUT_SET ! ;
create mux-table ' do-mux-0 , ' do-mux-1 , ' do-mux-2 , ' do-mux-3 ,
: mux ( u -- )
  dup 4 u< if cells mux-table + @ execute else drop then
;
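The jump-table version can be exercised off-target with a few stand-ins. In this sketch, which is an assumption for testing rather than the real hardware setup, GPIO_OUT_SET and GPIO_OUT_CLR are ordinary variables that just record the last mask written, mux0_pin and mux1_pin are the bit 6 and bit 7 from the question, and bit is defined as 1 swap lshift, assumed to match zeptoforth's BIT. clear-regs is a hypothetical helper that zeroes the stand-ins between calls.

```forth
\ Stand-ins (assumptions): variables that record the last mask written,
\ in place of the RP2040 GPIO_OUT_SET / GPIO_OUT_CLR registers.
variable GPIO_OUT_SET
variable GPIO_OUT_CLR
6 constant mux0_pin
7 constant mux1_pin
: bit ( u -- mask ) 1 swap lshift ;   \ assumed to match zeptoforth's BIT
: clear-regs ( -- ) 0 GPIO_OUT_SET !  0 GPIO_OUT_CLR ! ;   \ hypothetical helper

\ One word per selector value; the masks are folded into literals at compile time.
: do-mux-0 ( -- ) [ mux0_pin bit mux1_pin bit or ] literal GPIO_OUT_CLR ! ;
: do-mux-1 ( -- ) [ mux0_pin bit ] literal GPIO_OUT_SET ! [ mux1_pin bit ] literal GPIO_OUT_CLR ! ;
: do-mux-2 ( -- ) [ mux0_pin bit ] literal GPIO_OUT_CLR ! [ mux1_pin bit ] literal GPIO_OUT_SET ! ;
: do-mux-3 ( -- ) [ mux0_pin bit mux1_pin bit or ] literal GPIO_OUT_SET ! ;

\ Jump table of execution tokens, indexed by the selector.
create mux-table ' do-mux-0 , ' do-mux-1 , ' do-mux-2 , ' do-mux-3 ,

\ Ignore out-of-range selectors; otherwise execute the matching table entry.
: mux ( u -- ) dup 4 u< if cells mux-table + @ execute else drop then ;

clear-regs 2 mux
GPIO_OUT_SET @ .  GPIO_OUT_CLR @ .   \ prints 128 64
```

If the commas are left out of the table definition, the execution tokens stay on the stack instead of being stored in mux-table, so mux would fetch and execute whatever happens to follow the table in memory, which would likely crash.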

This gets around some of the inefficiencies of the implementation of case ... endcase in zeptoforth.

Thanks, will time that compared to my mux ....
That set/clear ARM model has always bugged me; oh for the days of read-modify-write CPUs!

Hmm, I don't think your example will work, as execute can't be used in an interrupt, correct? It crashes the CPU.

I did the following:

: mux
    mux0_pin lshift
    dup 
    GPIO_OUT_SET !
    $3  mux0_pin lshift
    xor GPIO_OUT_CLR !
  ;

It ended up being twice as fast as the case statement version (800 ns versus 1.6 us).
Of course, the above example assumes mux1_pin is mux0_pin + 1. :)
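The arithmetic behind the shift-and-XOR version can be checked off-target the same way. In this sketch GPIO_OUT_SET and GPIO_OUT_CLR are again stand-in variables (an assumption for testing) and mux0_pin is 6, as in the question. Shifting the selector left by mux0_pin gives the pins that must go high; XOR-ing that with the two-pin mask $3 mux0_pin lshift (192) leaves exactly the pins that must go low. For selector 2, that is 2 shifted left by 6 = 128 written to SET, and 128 xor 192 = 64 written to CLR.

```forth
\ Stand-ins (assumptions): variables recording the last mask written.
variable GPIO_OUT_SET
variable GPIO_OUT_CLR
6 constant mux0_pin   \ mux1_pin is assumed to be mux0_pin + 1

: mux ( sel -- )
  mux0_pin lshift          \ pins that must go high
  dup GPIO_OUT_SET !
  $3 mux0_pin lshift       \ mask covering both mux pins
  xor GPIO_OUT_CLR ! ;     \ whichever mux pins are not high must go low

2 mux  GPIO_OUT_SET @ .  GPIO_OUT_CLR @ .   \ prints 128 64
0 mux  GPIO_OUT_SET @ .  GPIO_OUT_CLR @ .   \ prints 0 192
```

On the real SET/CLR registers, writing 0 leaves every pin unchanged, so selector 0 needs no special case. Note that this version does no range check: a selector above 3 would also touch pins above mux1_pin.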

Oh, there was just a very stupid bug in my example code: I had accidentally forgotten the commas when setting up the jump table. It should work now.

Thanks. I will time it against my latest version.
EDIT: Your version takes 1 us, slightly slower than my pretend read-modify-write.

Yeah, on second thought, your latest version is obviously better, assuming there is no need for validation. (My code can be made faster by removing the validation, but it will still be slower.)

Yep, the time I gave above was for a version of your code with the validation removed.

On another issue: do you have any documentation for the PIO Forth code? I am having trouble understanding it compared to the asm version. If not, the examples probably need more comments on each instruction to be clear. Thanks.

Are there any particular improvements you would suggest upon the PIO documentation?

No, the documentation is fine. Just a few comments in the examples would be good; otherwise you have to continually refer to the documentation to work out what each line of code does. The jmp labels are especially confusing; I still have not worked them out.