TG9541/stm8ef

Erratic clock?

VK6TT opened this issue · 3 comments

I would never have noticed this if I hadn't slowed the CPU clock down to 15.625kHz and tried to do everything one clock cycle at a time. Trying to save those last few ergs of battery life has cost me hours of frustration!

The symptom, shown in the following picture, is that at times something appears to consume CPU cycles behind the scenes, or the clock skips forwards or backwards.

This picture shows the influence of a delay, denoted by the shaded region, in toggling the pin.
[Image: 1of5]

A pin is set high or low with:

: _TXon  [ 1 PC_ODR _TxEn ]B! NOP ; \ 72 1E 50  A 9D
: _TXoff [ 0 PC_ODR _TxEn ]B! NOP ; \ 72 1F 50  A 9D

which compiles correctly as shown in the comments above. The complete byte "pattern" is also correct:

00111 00111 001 01 001 01 001 01 001 01 0

95CC 72 1F 50 A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72
95DC 1E 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72 1F
95EC 50 A 9D 72 1E 50 A 9D 72 1E 50 A 9D 72 1E 50
95FC A 9D 72 1F 50 A 9D 72 1F 50 A 9D 72 1E 50 A
960C 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D
961C 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72
962C 1E 50 A 9D 72 1F 50 A 9D 72 1F 50 A 9D 72 1E
963C 50 A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50
964C A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A
965C 9D 72 1E 50 A 9D 72 1F 50 A 9D 81

So there are no extra opcodes that could explain why the pin toggled one CPU cycle later than it should have.
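
The check above can also be done mechanically. The following is a minimal host-side sketch in Python (not part of STM8 eForth; the helper names `dump_bytes` and `decode` are made up for illustration). It assumes the dump contains only BSET/BRES on PC_ODR bit 7, NOP and a final RET, and rebuilds the pin levels from the bytes:

```python
# Host-side sketch: decode the hex dump of Tx.Off back into pin levels.
# Assumption: only BSET/BRES on PC_ODR bit 7, NOP ($9D) and RET ($81) occur.

DUMP = """
95CC 72 1F 50 A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72
95DC 1E 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72 1F
95EC 50 A 9D 72 1E 50 A 9D 72 1E 50 A 9D 72 1E 50
95FC A 9D 72 1F 50 A 9D 72 1F 50 A 9D 72 1E 50 A
960C 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D
961C 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72
962C 1E 50 A 9D 72 1F 50 A 9D 72 1F 50 A 9D 72 1E
963C 50 A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50
964C A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A
965C 9D 72 1E 50 A 9D 72 1F 50 A 9D 81
"""

PC_ODR = 0x500A  # from the listing
TX_BIT = 7       # _TxEn


def dump_bytes(text):
    """Drop the address column of each row and return the bytes as ints."""
    data = []
    for row in text.split("\n"):
        fields = row.split()
        data.extend(int(f, 16) for f in fields[1:])
    return data


def decode(data, port=PC_ODR, bit=TX_BIT):
    """Return (levels, nops): pin level and trailing NOP count per bit op."""
    bset = 0x10 + 2 * bit            # 72 1E hh ll for bit 7
    bres = bset + 1                  # 72 1F hh ll for bit 7
    addr = [port >> 8, port & 0xFF]
    levels, nops = [], []
    i = 0
    while i < len(data):
        is_bit_op = (data[i] == 0x72 and i + 3 < len(data)
                     and data[i + 1] in (bset, bres)
                     and data[i + 2:i + 4] == addr)
        if is_bit_op:
            levels.append(1 if data[i + 1] == bset else 0)
            nops.append(0)
            i += 4
        elif data[i] == 0x9D and nops:   # NOP belongs to the previous bit op
            nops[-1] += 1
            i += 1
        elif data[i] == 0x81:            # RET ends the word
            break
        else:
            raise ValueError(f"unexpected byte {data[i]:02X} at offset {i}")
    return levels, nops


levels, nops = decode(dump_bytes(DUMP))
pattern = "".join(str(level) for level in levels)
print(pattern)
print("exactly one NOP after every bit op:", all(n == 1 for n in nops))
```

Any stray byte raises an error, so a clean run supports the claim that the compiled code itself contains nothing extra.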

It's not related to any timer because I have disabled all of them and then turned off their clocks. Ditto every other peripheral. My current-minimising code for the STM8003 is:

: MIN_I \ prepare for minimum current draw
\ on reset these are off but included for reference if needed
   [ 0 SPI_CR1 6 ]B! \ SPI off
   [ 0 I2C_CR1 0 ]B! \ I2C off
   [ 0 UART1_CR2 3 ]B! \ UART TX off
   [ 0 UART1_CR2 2 ]B! \ UART RX off
   [ 0 ADC_CR1 0 ]B! \ ADC off
   [ 0 TIM1_CR1 0 ]B! \ TIM1 off
   [ 0 TIM2_CR1 0 ]B! \ TIM2 off
   [ 0 TIM4_CR1 0 ]B! \ TIM4 off

   [ 1 CLK_ICKR 5 ]B! \ MVR regulator off in Active-halt mode
   [ 1 FLASH_CR1 2 ]B! \ Flash powered down in Active-halt mode
   [ 0 CLK_PCKENR1 ]C! \ No clock to peripherals
\   [ 0 CLK_PCKENR2 7 ]B! \ No clock to CAN - unused on STM8003
   [ 0 CLK_PCKENR2 3 ]B! \ No clock to ADC
\   [ 0 CLK_PCKENR2 2 ]B! \ No clock to AWU

\ the following may need adjustment for connected hardware !!!

   [ $FF PA_DDR ]C! \ all ports made outputs
   [ $FF PB_DDR ]C!
   [ $FF PC_DDR ]C!
   [ $FF PD_DDR ]C!
   [ $0 PA_ODR ]C! \ all outputs driven low
   [ $0 PB_ODR ]C!
   [ $0 PC_ODR ]C!
   [ $0 PD_ODR ]C!
;

And just to be sure, after removing all terminal input/output routines from the start of my main word, I effectively bypassed Forth: on reset the CPU jumps directly to the start of my code. That rules out the background task, the idle task and any other Forth influence. The problem still exists even when running the pure assembly code.

The pattern shown above was captured as the first of 5 bytes. I was sending 3 ON and 3 OFF codes on a repeating basis. Fortuitously I captured an ON, OFF, OFF, OFF, ON sequence. The two ON sequences were identical in all respects, including the problem.

The three OFF bytes were also wrong, but identical. The first start pulse was missing the leading high period. And two of the one bits had an extra high period.

If the CPU had been running faster I would probably not have noticed this, because the code would have changed the pin state and then the additional time needed to get the pulse length would have masked the problem.

Does anyone have any suggestions, please, on what might be going on?

Kind regards
Richard

Full code

RESET
NVM
\ Variables
VARIABLE TEST 
 
: NOP ( -- ) $9D C, ; IMMEDIATE \ compile in-line a NOP
: ]B? ( c-addr bit -- f )
     $905F , 2* $7201 + , , $0290 , $5A5A , $5AFF , ]
; IMMEDIATE
: ]B! ( 1|0 addr bit -- )
  ROT 0= 1 AND SWAP 2* $10 + + $72 C, C, , ]
; IMMEDIATE 
: ]C! $35 C, SWAP C, , ] ; IMMEDIATE
: ]M! ( A -- ) \ copy bytes from A to this definition until $81 ( ret ) 
   DEPTH 0= ABORT" Empty Stack" 
   0 TEST !
   BEGIN
      DUP
      C@ DUP 
      $81 = NOT
      IF C, 1+ 0 \ fetch the char and save it to HERE
      ELSE -1
      THEN
      1 TEST +! TEST @ 32 = 
      DUP IF ." byte limit!!" THEN
      OR \ 4test - no runaways
   UNTIL
   2DROP  \ discard A and C
   ]
   ; 
RAM

NVM
\ Port C Pins
7 constant _TxEn

$50C6 CONSTANT CLK_DIVR

NVM
: :NVM ( -- xt ) NVM HERE ] ;
: ;NVM ( xt -- xt ) POSTPONE [ $81 C, ;  IMMEDIATE

RAM
$50F0 CONSTANT AWU_CSR1  
$50F1 CONSTANT AWU_APR
$50F2 CONSTANT AWU_TBR
$800E CONSTANT INT_AWU
NVM

: P500 ( -- )    \ AWU period about 500ms
  [ 62 AWU_APR ]C! [ $B AWU_TBR ]C!
  [ 16 AWU_CSR1 ]C! \ AWU enabled
  [ $8E C, ] \ HALT for AWU period
  [ 0 AWU_TBR ]C! \ reduce power consumption
;
:NVM              \ interrupt handler, "headerless" code
   SAVEC
   AWU_CSR1 C@ \ reading clears the interrupt
   IRET
;NVM ( xt ) INT_AWU !

RAM 
\ **** registers for low power consumption
$50C0 CONSTANT CLK_ICKR
$505A CONSTANT FLASH_CR1
$50C7 CONSTANT CLK_PCKENR1
$50CA CONSTANT CLK_PCKENR2
$5200 CONSTANT SPI_CR1
$5210 CONSTANT I2C_CR1
$5235 CONSTANT UART1_CR2
$5401 CONSTANT ADC_CR1
$5250 CONSTANT TIM1_CR1
$5300 CONSTANT TIM2_CR1
$5340 CONSTANT TIM4_CR1

$5000 constant PA_ODR
$5002 constant PA_DDR
$5005 constant PB_ODR
$5007 constant PB_DDR
$500A constant PC_ODR
$500C constant PC_DDR
$500D constant PC_CR1
$500F constant PD_ODR
$5010 constant PD_IDR
$5011 constant PD_DDR
$5012 constant PD_CR1
$5015 constant PE_IDR
$5016 constant PE_DDR
$501A constant PF_IDR
$501B constant PF_DDR

NVM
: _TXon  [ 1 PC_ODR _TxEn ]B! NOP ;
: _TXoff [ 0 PC_ODR _TxEn ]B! NOP ;
: Start ( -- ) \ transmit start pulse 
   [ ' _TXoff ]M! 
   [ ' _TXoff ]M!
   [ ' _TXon ]M!
   [ ' _TXon ]M!
   [ ' _TXon ]M!
   \ tx turned off by next bit in packet
;
\ Unit time is 128 us due to the clock speed and the NOP slowing down _TXon and _TXoff
: .0  [ ' _TXoff ]M! [ ' _TXon ]M! ;
: .1  [ ' _TXoff ]M! [ ' _TXoff ]M! [ ' _TXon ]M! ;
      
: Tx.On ( -- )
   [ ' START ]M! [ ' START ]M! 
   [ ' .0 ]M! [ ' .1 ]M! [ ' .0 ]M! [ ' .1 ]M!
   [ ' .0 ]M! [ ' .1 ]M! [ ' .0 ]M! [ ' .1 ]M!
   [ ' _TxOff ]M! \  turn tx off
;
: Tx.Off ( -- )
   [ ' START ]M! [ ' START ]M! 
   [ ' .1 ]M! [ ' .0 ]M! [ ' .1 ]M! [ ' .0 ]M! 
   [ ' .1 ]M! [ ' .0 ]M! [ ' .1 ]M! [ ' .0 ]M! 
   [ ' _TxOff ]M! \  turn tx off
;

: setup_pins  ( -- )
   [ 1 PC_DDR _TxEn ]B!  \ Port c outputs
   [ 1 PC_CR1 _TxEn ]B!  \ _TxEn is push pull
;
: MIN_I \ prepare for minimum current draw
\ on reset these are off but included for reference if needed
   [ 0 SPI_CR1 6 ]B! \ SPI off
   [ 0 I2C_CR1 0 ]B! \ I2C off
   [ 0 UART1_CR2 3 ]B! \ UART TX off
   [ 0 UART1_CR2 2 ]B! \ UART RX off
   [ 0 ADC_CR1 0 ]B! \ ADC off
   [ 0 TIM1_CR1 0 ]B! \ TIM1 off
   [ 0 TIM2_CR1 0 ]B! \ TIM2 off
   [ 0 TIM4_CR1 0 ]B! \ TIM4 off

   [ 1 CLK_ICKR 5 ]B! \ MVR regulator off in Active-halt mode
   [ 1 FLASH_CR1 2 ]B! \ Flash powered down in Active-halt mode
   [ 0 CLK_PCKENR1 ]C! \ No clock to peripherals
\   [ 0 CLK_PCKENR2 7 ]B! \ No clock to CAN - unused on STM8003
   [ 0 CLK_PCKENR2 3 ]B! \ No clock to ADC
\   [ 0 CLK_PCKENR2 2 ]B! \ No clock to AWU

\ the following may need adjustment for connected hardware !!!

   [ $FF PA_DDR ]C! \ all ports made outputs
   [ $FF PB_DDR ]C!
   [ $FF PC_DDR ]C!
   [ $FF PD_DDR ]C!
   [ $0 PA_ODR ]C! \ all outputs driven low
   [ $0 PB_ODR ]C!
   [ $0 PC_ODR ]C!
   [ $0 PD_ODR ]C!
;
: slowclk   \ slow CPU clock to 15625Hz
   $1F CLK_DIVR C! \  /8 /128
   ;
  
RAM
NVM

: MAIN  
   MIN_I \ set everything to MIN_I
   setup_pins
   SlowClk
   BEGIN
      Tx.On P500
      Tx.On P500
      Tx.On P500
      Tx.Off P500
      Tx.Off P500
      Tx.Off P500
   AGAIN
;
RAM	
NVM
' MAIN 'Boot ! \ ignored if next few lines interpreted
RAM

\ On reset Run Main, not STM8 eForth
NVM
' MAIN $8002 ! \ STM8eForth now inaccessible
RAM
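
The timing figures in the listing can be cross-checked on the host. This is a rough Python sketch, assuming the clock source is the 16 MHz internal HSI oscillator (the reset default) and that the register the listing calls CLK_DIVR holds the HSI prescaler in bits 4:3 and the CPU prescaler in bits 2:0; the function name `cpu_clock_hz` is made up for illustration:

```python
# Host-side sketch: CPU clock and unit time implied by "$1F CLK_DIVR C!".
# Assumption: HSI (16 MHz nominal) is the clock source.

HSI_HZ = 16_000_000


def cpu_clock_hz(divr, hsi_hz=HSI_HZ):
    """Bits 4:3 select the HSI divider (1, 2, 4, 8); bits 2:0 the CPU divider (2**n)."""
    hsi_div = 1 << ((divr >> 3) & 0x3)
    cpu_div = 1 << (divr & 0x7)
    return hsi_hz / hsi_div / cpu_div


f_cpu = cpu_clock_hz(0x1F)        # /8 then /128
cycle_us = 1e6 / f_cpu            # one CPU cycle in microseconds
unit_us = 2 * cycle_us            # BSET/BRES (nominally 1 cycle) + NOP (1 cycle)

print(f"f_cpu = {f_cpu:.0f} Hz, cycle = {cycle_us:.0f} us, unit = {unit_us:.0f} us")
```

With these assumptions the nominal unit time comes out at the 128 us quoted in the comment above `.0` and `.1`, so a single stray cycle shifts an edge by half a unit.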

It just gets weirder. I stripped out everything except the words that toggle the pin off and on. My test code was:

: b0 
set_pins 
slowclk
[ ' .0 ]M! [ ' .0 ]M! [ ' .0 ]M!
[ ' .0 ]M! [ ' .0 ]M! [ ' .0 ]M!
[ ' .0 ]M! [ ' .0 ]M! [ ' .0 ]M!
[ ' .0 ]M! [ ' .0 ]M! [ ' .0 ]M!
[ ' .0 ]M! [ ' .0 ]M! [ ' .0 ]M!
[ ' .0 ]M! [ ' .0 ]M! [ ' .0 ]M!
[ ' .0 ]M! [ ' .0 ]M! [ ' .0 ]M!
[ ' .0 ]M! [ ' .0 ]M! [ ' .0 ]M!
[ ' _TXoff ]M!
0 CLK_DIVR C! _TXoff  
;

If I define this in RAM I get the following waveform:
[Image: SDB0RAM]

It appears as if there is an extra NOP instruction after every second pulse. But the assembled opcodes show no such thing. I have a nice, repeating and consistent pattern of 1F (off) and 1E (on) bytes. No extra NOPs ($9D) either:
AA CD 95 A7 CD 96 17 72 1F 50 A 9D 72 1E 50 A 9D ____r_P__r_P
BA 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72 r_P__r_P__r_P__r
CA 1E 50 A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F P__r_P__r_P__r
DA 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72 1E 50 P__r_P__r_P__r_P
EA A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A r_P__r_P__r_P
FA 9D 72 1E 50 A 9D 72 1F 50 A 9D 72 1E 50 A 9D r_P__r_P__r_P

10A 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72 r_P__r_P__r_P__r
11A 1E 50 A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F P__r_P__r_P__r
12A 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72 1E 50 P__r_P__r_P__r_P
13A A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A r_P__r_P__r_P
14A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72 1E 50 A 9D r_P__r_P__r_P

15A 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72 r_P__r_P__r_P__r
16A 1E 50 A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F P__r_P__r_P__r
17A 50 A 9D 72 1E 50 A 9D 72 1F 50 A 9D 72 1E 50 P__r_P__r_P__r_P
18A A 9D 72 1F 50 A 9D 72 1E 50 A 9D 72 1F 50 A _r_P__r_P__r_P
19A 9D 72 1E 50 A 9D 72 1F 50 A 9D CD 84 50 83 50 _r_P__r_P____P_P
1AA C6 CD 82 FC CD 94 6 81 54 1 4 64 75 6D 70 0 _______T__dump

Now the datasheet clearly says that BSET and BRES (the opcodes corresponding to 1E and 1F ) both take 1 cycle.
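
For reference, the byte encoding that `]B!` produces can be mirrored on the host. This is a small Python sketch of the same arithmetic (`ROT 0= 1 AND SWAP 2* $10 + +`); the function name `bit_op_bytes` is hypothetical and the sketch only reproduces what the Forth word in the listing computes:

```python
# Host-side sketch of the bytes compiled by ]B! (1|0 addr bit -- ).

NOP = 0x9D


def bit_op_bytes(value, addr, bit):
    """$72, then $10 + 2*bit (+1 when value is 0), then the 16-bit address."""
    if not 0 <= bit <= 7:
        raise ValueError("bit must be 0..7")
    second = 0x10 + 2 * bit + (1 if value == 0 else 0)
    return bytes([0x72, second, addr >> 8, addr & 0xFF])


PC_ODR = 0x500A
tx_on = bit_op_bytes(1, PC_ODR, 7) + bytes([NOP])    # body of _TXon
tx_off = bit_op_bytes(0, PC_ODR, 7) + bytes([NOP])   # body of _TXoff

print(tx_on.hex(" ").upper())
print(tx_off.hex(" ").upper())
```

Each toggle is therefore a 4-byte bit instruction plus a 1-byte NOP, 5 bytes in total, which matters once fetch alignment comes into play.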

Maybe I'm just dreaming. Let's try compiling B0 to NVM and try again:
[Image: SDB0NVM]
Now I've got the leading pulse in each pair staying high for an extra clock cycle.

To recap, executing from RAM results in something using a clock cycle every second low period. But when executed from NVM the same word now uses an extra clock cycle on alternating high periods.

The only thing that was consistent was when I patched the address of B0 into the boot address and bypassed Forth altogether. That gave me the same waveform as when I executed the NVM version of B0. So I'm ruling out any strange overhead issue.

Maybe I should ask ST?

Three things come to mind:

  • when running code from Flash ROM, 4 bytes of assembler code are pre-fetched and fed into the decoder pipeline. Since instructions can be 1 to 5(?) bytes long, the 32-bit ROM fetch produces delays that depend on where the code sits relative to 4-byte boundaries. Writing cycle-accurate code that runs from ROM isn't easy.
  • when executing code from RAM, execution is often faster than the memory interface unit can deliver the instruction stream
  • the STM8S core isn't specified for really low clock rates (but I assume that's not the problem)

I agree, Thomas, that the way the pipeline is filled could be the cause of this. The programming manual makes an offhand reference to the actual number of clock cycles needed for an opcode being higher than stated, pointing out that "In some cases, depending on the instruction sequence, the cycle taken could be more than that number." Add to that the fact that the pipeline fills differently from RAM than from Flash. I also note that "The instruction access from Flash Program memory is 32-bit wide and it is performed from an aligned address i.e. 0xXXX0, 0xXXX4, 0xXXX8, or 0xXXXC."
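
To make the alignment point concrete, here is a minimal Python sketch that lists where each 4-byte BSET/BRES of the NVM dump sits relative to the aligned 32-bit Flash words. It does not model the STM8 pipeline or predict the measured waveform; it only shows that 5-byte toggle units drift across the 4-byte fetch boundaries. The function name `fetch_alignment` is made up for illustration, and the start address 0x95CC is taken from the first dump:

```python
# Host-side sketch: position of each bit instruction within 32-bit Flash words.
# This is plain address arithmetic, not a pipeline model.


def fetch_alignment(start, count, unit_len=5, op_len=4, word=4):
    """Return (address, offset in word, number of words touched) per bit op."""
    rows = []
    for n in range(count):
        addr = start + n * unit_len
        first_word = addr // word
        last_word = (addr + op_len - 1) // word
        rows.append((addr, addr % word, last_word - first_word + 1))
    return rows


rows = fetch_alignment(0x95CC, 8)
for addr, offset, words in rows:
    print(f"{addr:04X}  offset {offset}  spans {words} fetch word(s)")
```

The offset cycles through 0, 1, 2, 3, so only every fourth bit instruction fits inside a single aligned word. That is at least consistent with identical opcodes not all taking the same number of cycles.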

So I tried three things today:

  1. write a byte directly to the PC_ODR register, which intuitively sounds easier to decode than the BSET/BRES instructions,
  2. remove the NOP I was using to slow down the toggling, thereby giving me 4 bytes, which fits the 32-bit fetching, and
  3. align the 4 bytes with 0xXXX0 etc. as per the quote above.

None of these fixed the situation. Then I thought, instead of using a NOP, use a Jump instruction to force the pipeline to reload.

: _TXoff  [ 0 PC_ODR  ]C!  [ $20 C, 0 C, ] ; \ Jump relative 0 bytes forward since program counter has already been incremented by 2 bytes

When I built up the long string of assembly op-codes using ]M! this worked a treat. I ended up reverting to the BSET BRES instructions since there was no advantage to writing a byte directly as shown above.

But then it was obvious. Dispense with the ]M! command in the transmitter code and use the inherent JP RET instructions in Forth to force the pipeline to keep refreshing.

This resulted in a consistent pattern being sent, though the high time was still greater than the low time. I couldn't quite get to the bottom of this, but I brute-forced it with a few NOPs and bumped up the CPU speed slightly so my symbols were all in the desired speed range.

The inherent clock rate in my bit banging is around 16% of the CPU clock speed. Most of the time no one clocks an STM8 so slowly, but I was on a quest to minimise current with the device I had to hand.

One of my rules of thumb now is that casual bit banging only works up to a clock rate approaching 5% of the CPU clock. Any faster and you need to check what is actually happening, since it can be quite different from what you expect.

A big learning curve this week for me! Thanks for your help.