SlashDevin/CosaGPS

volatile variables not atomically updated

SlashDevin opened this issue · 4 comments

From jediunix:

I'm seeing issues with my current code (not on GitHub) that is interrupt driven that I suspect your implementation has as well. In your post you said

     // Do something with the fix() data member, quickly, or the interrupt routine
     //    will start replacing data as more characters are received...
    // Copy it to a local_fix if you need more time.

The problem is the lack of atomic updates. Not only between multiple variables but I suspect within a single variable as well. This is the kind of insidious problem that is rarely seen but...

The solution is likely the use of synchronized but I haven't used it so would need to study the appropriate usage.

Hehe, that is the warning of the volatile keyword, after all. :-)

This is one advantage of using events: the interrupt queues up an event when fix has been updated by received data, and the main loop waits for that to appear. That's when it's ok to look at fix. Any other time, and fix may be in flux.

After the event is pushed, the fix will not be modified until the trailing CR, LF, the '$' start of the next sentence, 5 more characters for the next command, and a comma are received. That's 9 character times. At 9600 baud, that's um... 9*11 bits/(9600 bps) = 10mS.

I think I've structured my app so that I do a few quick things in the background, then check for events. 10mS has been plenty of time to either (1) use the data quickly, or (2) copy it. Did you notice there's a new app in the NeoGPS repo?

If you find that you can't get the fix copied in time, you could wrap the copy statement with synchronized:

synchronized {
  gps_fix_t snap_shot = *(const_cast<const gps_fix_t *>(&fix()));
}

That would disable interrupts during the copy, but it really doesn't take that long. I suggest restructuring so that the event is noticed (i.e., dispatched) more quickly.

Remember that a fix comes about once a second. Even though multiple events are received, they should just be accumulated until the seconds change. Trying doing your work right after the fix is ready, then resume waiting for an event. Hopefully, you can finish the extra work before the next sentence comes in, probably in about a second.

If there are too many things to do, you will have to use the polling technique: get one char, give it to gps, then check for events. While you're processing the event, characters will get held by the output buffer you gave to the UART (i.e., make it big enough for your timing requirements). Then the fix will not be modified until you start giving chars to the gps instance again.

Does that help?

Cheers,
Devin

volatile is only part of the solution. On an 8bit processor any time you update a variable with more than 1 byte there is the potential for an interrupt between bytes leaving the variable invalid, volatile or not. That is why synchronized is used all over in Cosa (example: https://github.com/mikaelpatel/Cosa/blob/master/cores/cosa/Cosa/RTC.hh#L116).

In my opinion, when code is intended to be used by others, the user should have minimal awareness of the details of the implementation - like the use of interrupt mechanism. In both of our implementations, all the accessor methods need to use synchronized. In your implementation you are modifying data incrementally as each character is received. I understand you have heavily optimized for space but your data is invalid as the characters are arriving - slowly. synchronized won't fix that.

At a higher level your code updates the single instance of variables incrementally as data is received. Consider after a sentence is received making the data valid, then as the next sentence comes in the data is slowly updated. More than one sentence has similar data and arrive in a burst every second. So your data may be longitude from one sentence and latitude from the next (and potentially garbage latitude until it is completely parsed). Granted they should be the same. I tried to handle that by maintaining an intermediate set of temporary data and only updating the real data after a full sentence. This has a much smaller window but it would not be eliminated without synchronized copy of all the data.

Yes, a fix a second - with some fix data duplicated over multiple sentences in that second.

In short these issues should be addressed before something should be contributed to Cosa IMHO. I think it is a question of space vs correctness. All these issues with correctness would drive me towards not using this code.

In my case I'm leaning towards not using the interrupt method in my code exactly because of all of these issues.

I sense your frustration with trying to use the package! I know it's hard to write software in a reusable way. Similarly, it is hard to reuse software without good documentation. My apologies for the poor documentation. I'm afraid I'm guilty of something like, "It should be obvious to the most casual observer...". So, sorry for that. I appreciate your comments describing your experience.

Mikael has done an outstanding job in writing easy-to-understand software in the simplest way possible. It also turns out to be very efficient and maintainable. Rarely do you find all these attributes in one package.

He is also good at designing software that is reusable in many contexts, although the object orientation can be intimidating or overwhelming for many people that have only experienced the Arduino environment.

Cosa has been through several iterations, getting distilled into a very tidy, concise work of art. Unfortunately for you, NeoGPS is in its first iteration, and it suffers from a lack of use cases. Again, thanks for taking time to offer feedback!

Some of the use cases I have considered include:

  • sync or async operation (putchar called from loop() vs interrupt)
  • event or polling (Event::Handler vs. fix() call)
  • fused or not fused (multiple reports into one fix)
  • basic NMEA, proprietary NMEA, or vendor-specific protocol
  • buffering of fixes... how many? merging policy? event policy?
  • ATTINY environments with very limited RAM and program space

I think I've laid the foundation for all of those cases, but I have not implemented all of them. Indeed, CosaNMEAGPS only tackles the use case of [async, event-driven, fused, basic NMEA, non-buffered]. You suffered through trying use it in a polling context. Congrats on being able to do it with just one Arduino forum message for guidance.

Your comments did lead to a new member called is_coherent. It can be used to determine if the entire fix() structure is in a valid state. When used in an async context (polling or event driven) , it must be wrapped with a synchronized block.

There is also a new comment block describing recommended methods for (and pitfalls in) accessing the volatile fix(). You can see how the synchronized blocks are used with is_coherent to safely access fix(). Because coherency is defined for the entire fix() structure, I currently don't see the need for synchronized blocks around the of fix() member accessors, like latitude. That wouldn't address the not-quite-parsed state of a field, either.

Well, as I work through the use cases, I hope the software begins to look more like Cosa. Something to which all software folks should aspire.

Cheers,
Devin

My frustration wasn't with your package as I have not tried to use it. Only looked at it for comparison to what I have implemented. I'm a "big iron" software engineer and perhaps have different expectations beyond what is possible in a highly constrained environment like AVR. I have been considering which way to use my own implementation - for now I'm going to avoid interrupt method in my own code. My reason for commenting on your code was that these kinds of subtle issues should either be resolved or at least clearly highlighted if this code were to make it into Cosa proper as many people, myself included, would just reuse without fully understanding the risks.

I will look at your latest updates over the next few days. Thanks.