kachurovskiy/nanoels

Encoder atomic operation improvement, Request for comments

Closed this issue · 5 comments

Hi Maxim, any interest in me making a PR for this? I did a small scale test and it seems to be an improvement, but I'm planning on doing a full multi-hour smoke test before submitting a PR, but wanted to ask if you would like me to do this, or do it yourself, or not bother with either?

2d94347

Turning off interrupts to process critical flows without memory corruption from an interrupt is common practice in the arduino space, particularly where interrupts are used in low speed (human scale speed) and non-critical operations like UI encoders. It's a common way to do it in the arduino space particularly since AVR doesn't support atomic operations larger than 1 byte. The issue I'm solving is there is a non-zero chance that a pulse interrupt is missed in that time. It's actually quite measurable on my machine with a much coarser leadscrew than the reference lathe, and gets worse with higher RPMs.

Considering the esp32/xtensia architecture does support C++11 atomic operations properly, I made a small quick change to use them for the critical variables, and removed the sections turning off and then on again the interrupts.

I've done some light testing on my bench rig but I plan on doing a complete test plan showing the before and after results, using an additional esp32 to emulate an encoder. At the end of a several day long run, the nanoels pulse count and the simulator pulse count should match within 1 interrupt edge.

Hey Chris, huge thanks for sharing this.

  1. According to my tests long reads/writes are atomic in ESP32 - https://t.me/MaximKachurovskiy/119 - if you can prove otherwise it would be a big contribution to the world. See also https://esp32.com/viewtopic.php?t=4686 - longs are 32 bit.
  2. Blocking interrupt doesn't cause them to be missed - merely delayed until the block is lifted. Only case when an interrupt is missed is if 2 or more interrupts arrived during the block.

In 2d94347 the code between noInterrupts() and interrupts() is just 2 lines with long arithmetic - I suspect they take 5ns or less. Let's say we run 10000rpm spindle and 3600 pulses encoder - that works out to 597600 pulses per second or 1673ns interval between pulses. Looks like there's a good safety margin?

At the same time, I do also have problems with encoder drifting away over time at high RPM. I believe the issue could be a hardware limitation of H4 at this point though - to be solved by the 3rd encoder signal line Z which would report a full turn.

There's also something I don't quite understand wrt. rotary encoders related to differential signal handling and using 2 signal lines for both A and B and maybe using optical isolation but I didn't investigate that.

Cheers!

The more I think about it, the more I think it's not actually a non-atomic operation problem, though I do feel that turning off interrupts is a bit of a sign of an architectural issue. I did attempt to go after the easy to implement solution first :D I suggest that the esp32 pulse count peripheral might be a better choice for counting, and reorganizing things to not require mutexes/semaphores as much as possible so that inter-core operations are not blocked on the same chunk of code. though that doesn't get around the below capacitance issue.

I've done some playing with optical sensors and interrupts in a different application in the mean time since posting that, and I have a feeling that we may be seeing more effects from a long rise time due to capacitance causing what looks like missed interrupts when it's actually multiple pulses getting combined by floating slightly above or below the signal HIGH threshold due to a slow rise time in the LC filter on the input to the GPIO.

Additionally, quick math suggests the LC filter is going to make a resonant frequency at around 10hz, and it will act to smooth out the pulses above and below that. The rise time is estimated at around 0.0105 seconds (depending on Q factor and a pile of things I didn't measure).

The threshold between a LOW and a HIGH is just 0.5v, low being 0.25, high being 0.75, and according to quick calculations, the resonant frequency of the filter is around 10hz so it'll progressively work to smooth out ripples faster and slower than that. The math could very easily be wrong, I'm going to investigate this with my 500mHz oscilloscope later. It will be easy to see on that, I'll stick one probe on the input to the filter, and the other on the output.

I can't find it now, but I came across a few people suggesting that rise time and slope are more critical when using interrupts than when using a polling system. It seems to align with my experience, being that more square edged pulses are detected more reliably by most microcontrollers.

Since my encoder supports 24v levels as well, I'm toying with the idea of using a simple level shifter, and replacing the inductor with a 0ohm resistor. Then, the signal to noise ratio would be very good (a few 10's of millivolts vs 24v), and the rise time becomes the simple DC time constant of the capacitors, rather than being a resonant circuit due to the inductor.

as per 2, since the esp32 is running FreeRTOS, turning interrupts off using the arduino-esp32 abstraction only turns off interrupts on the arduino core, and leaves them enabled on the other core, which is dangerous https://docs.espressif.com/projects/esp-idf/en/v4.2-beta1/esp32/api-guides/freertos-smp.html#critical-sections
espressif/esp-idf#7367
turning off interrupts isn't a good way to guarantee consistency between tasks. portENTER/EXIT_CRITICAL is the proper FreeRTOS way of handling shared data multi-core.

There is conflicting info if the esp32 queues pending interrupts with port_DISABLE_INTERRUPTS or stops them from being triggered at all, though the freertos documentation suggests that disabling the interrupt makes it ignore the state of the pin. The pending interrupt buffer isn't infinitely big either.

This brings up another point though, the arduino wrapper costs quite a bit of time. I have measured as high as 3 microseconds with the bare esp-idf interrupt api, and with the arduino wrapper on top, it goes up to 30 to 40 us. Which means at best, with no arduino wrapper, we can do 1 million interrupts per minute, and 10,000 rpm at 600 counts (1200 pulses, with the two interrupts) requires 1.2 million. That's also if we're doing nothing else with the system and there is never any contention. That goes down to 100,000 with the arduino wrapper, falling an order of magnitude short. So I think with the reference hardware, we have a theoretical maximum rpm of 1,666 rpm (which possibly coincidentally, is the speed that I start seeing the problems very frequently)

I'm also investigating dedicated SPI attached quadrature counter ICs, they're way faster for the specific task and a simple loop can just poll it for counts. I'm imagining an architecture where a single timer loop can poll for a count, compare to last count, and do the step planning, all in the same scope. It's going beyond the scope of this issue of course, but I'm thinking that since the esp32 is capable of assigning any pin as SPI pins, we could use some of the unused terminals to a simple single board decoder module.

Sorry for jumping topics so many times, just things I've been thinking about!

Chris,

  1. Is the problem you're seeing causing extra pulses or missed pulses? Please double check as this is critical for further reasoning.
  2. It's easy to test whether noInterrupts() is the root cause - just remove noInterrupts/interrupts calls from https://github.com/kachurovskiy/nanoels/blob/main/h4/h4.ino#L2988-L2991 and see if this changes anything
  3. To make the case for that 4-line code block taking more than a handful of nanoseconds it'd be nice to run a simple speed measurement on it - getting micros, running the code block 1000 times, getting micros again and calculating the run time
  4. "we can do 1 million interrupts per minute, and 10,000 rpm at 600 counts (1200 pulses, with the two interrupts) requires 1.2 million" - we only use 1 interrupt for the encoder - can you please also double-check that you're converting RPM to RPS when calculating?

Thanks!

it's always missed pulses. And quite a measurable amount. as much as 50 pulses per rev in some cases, judging by how off the next pass is, and because it's incremental it accumulates. I have checked with an RF (1ghz) pulse counter right at the encoder, and all pulses are accounted for, so it's not a mechanical issue.

40us per interrupt with the arduino abstraction. 1000000 microseconds in 1 second. 1000000/40 = 25,000 interrupts in 1 second maximum.
10000 rpm is 166.66 rps. with a 600CPR encoder, (166*600) that's going to need 100,000 interrupts per second. So I was wrong using 1200 instead of 600, but we're still over budget by a factor of 4 for 10k.

25,000 interrupts per second / 600 CPR = 41 RPS, * 60 = 2460 rpm maximum with an otherwise completely idle core and no delay for locks. And considering for reliable sampling the sample speed should be at least 4x the wave form being sampled (if we're talking oscilloscopes and logic probes, anyway), you can see what I'm getting at,

However, this doesn't explain how I lose a bunch of pulses even at 300 rpm. I haven't had a chance to crack out the good RF measuring gear yet and take a look at what the waveform at the MCU pin looks like yet, but I plan to do that next week. Cable lengths and things like that will contribute as well.

Differential signals would eliminate common mode noise, but wouldn't do anything about rise time or other causes of missed pulses. You'd need a third signal to determine which of the two is the correct one. Best we could do is reject common mode noise, and determine that a particular pulse is ambiguous. Differential signals can be very important for long distances in high noise environments, but I'm using proper full shield single side grounded cabling. I will do some common mode measurements anyway just in case.

Considering Mach3 does threading with a single index pulse and no other spindle information, I wonder if I could get away with a 36CPR made out of simple hall effect sensors with the assumption that I only do threading with the spindle running and not moving too slowly.

I'm closing this for now, I've decided to go with another approach, the pcnt periferal is designed to count pulses, and does it's own interrupt handling. I've also started playing with dedicated hardware quadrature encoder counters, I may make an SPI attachment using an stm32 so nanoels can query counts as needed without it needing to use interrupts at all. I'll put Z signal encoding and rollover management in the stm32 as well. The nice thing about the stm32f1's quadrature decoder is it has built in jitter management, and it can switch between X1 mode and X4 mode dependent on rpm.