spth/OpenRabbit

program start by RFU?

Closed this issue · 17 comments

spth commented

Currently, for a program using serial port A, I have to:

  • Unplug diagnostics port of cable, plug programming port
  • Load program via RFU
  • Switch off power
  • Unplug programming port of cable, plug diagnostics port
  • Start a terminal program
  • Switch on power

Could this be simplified? Could the RFU start the program at the end (maybe depending on a command-line option), so I don't need to power-cycle? Could the RFU leave the cable in a state that allows serial I/O by a terminal program on the programming port, without requiring the use of the diagnostics port?

Hmmm. I think an easy solution would be to reset the processor via the control lines and send the single triplet needed to start the program (0x80, 0x24, 0x80). We could write a simple program to do that, and then fire up minicom at the desired baud rate.

But note that the minicom configuration will need to keep its hands off of DTR (or start up holding it in the correct state), otherwise you'll reset the processor again.
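
A rough sketch of that "reset and run" step, in Python for brevity. Only the triplet bytes come from this thread; the function names, the choice of DTR as the reset line and its polarity are assumptions that would have to be checked against the cable. The serial port is assumed to be configured already, and the delays are left as parameters.

```python
import fcntl
import struct
import termios
import time

# Triplet quoted above; sending it is what starts the loaded program.
START_TRIPLET = bytes([0x80, 0x24, 0x80])


def make_dtr_setter(fd, reset_when_dtr_high=True):
    """Return set_reset(active) that drives DTR on an open serial fd.

    Assumption: DTR controls the reset line. The polarity is a guess,
    hence the reset_when_dtr_high parameter.
    """
    dtr = struct.pack("I", termios.TIOCM_DTR)

    def set_reset(active):
        raise_dtr = (active == reset_when_dtr_high)
        request = termios.TIOCMBIS if raise_dtr else termios.TIOCMBIC
        fcntl.ioctl(fd, request, dtr)

    return set_reset


def reset_and_run(set_reset, write, reset_pulse_s, post_reset_delay_s,
                  sleep=time.sleep):
    """Pulse reset, wait for the processor to come up, send the triplet."""
    set_reset(True)             # assert reset
    sleep(reset_pulse_s)
    set_reset(False)            # release reset
    sleep(post_reset_delay_s)   # give the processor time to accept triplets
    write(START_TRIPLET)
```

With a real port this might be called as `reset_and_run(make_dtr_setter(fd), lambda b: os.write(fd, b), pulse, delay)`; passing the callables in keeps the sequence testable without hardware.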

spth commented

First attempt in 66bf9c2, doesn't seem to be working yet, though.

I wonder if leaving the serial port open in openrabbit will allow the code to run? Closing the port might change DTR and pull it back into reset. I might try that out, and see if the LED lights up, indicating that the code is running.
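
One thing that might be worth trying for the close side is clearing HUPCL, which asks the tty driver not to drop the modem control lines when the last holder closes the port. This is a sketch with hypothetical helper names; whether it is enough for this cable is untested, and on Linux opening the port normally asserts DTR again, so it would not help with the open side.

```python
import termios


def without_hupcl(attrs):
    """Return a copy of a tcgetattr() list with HUPCL cleared in c_cflag."""
    new_attrs = list(attrs)
    new_attrs[2] = new_attrs[2] & ~termios.HUPCL  # index 2 is c_cflag
    return new_attrs


def keep_lines_on_close(fd):
    """Apply the HUPCL-free settings to an open serial port descriptor."""
    termios.tcsetattr(fd, termios.TCSANOW,
                      without_hupcl(termios.tcgetattr(fd)))
```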

Got it working with 5b50f67. Needed to drop back down to 2400 baud before sending the triplet.

The program continues to run after openrabbitfu exits, but I haven't been able to re-open the serial port in another application to monitor stdout. Both minicom and screen end up resetting the processor via the DTR line. I wasn't able to find configuration options for either to change the startup behavior.

In Google searching, I found a reference to a miniterm.py script with support for specifying the initial DTR signal level on startup.

We might want a little standalone program to just do the "reset and run" step. I'll reference my request #18 to perhaps have multiple programs instead of a single program with two modes (rfu/debug).

We could switch the serial port up to 38400 after sending the triplet, and then just dump any received data to stdout. Maybe that could be a feature of the "reset and run" program, with command-line options for the serial port and baud rate. And send any entered characters.
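
The "dump any received data to stdout" part could be as small as the loop below. It is only a sketch: `dump_serial` is a made-up name, the port is assumed to be open and already switched to the run-time baud rate, and forwarding typed characters to the Rabbit is left out.

```python
import os
import sys


def dump_serial(fd, out, chunk_size=256):
    """Copy bytes from fd to out until read() reports end of file.

    Returns the number of bytes copied. On a real serial port the read
    usually blocks rather than returning EOF, so in practice the loop
    runs until the program is interrupted.
    """
    total = 0
    while True:
        data = os.read(fd, chunk_size)
        if not data:
            break
        out.write(data)
        out.flush()   # show output as it arrives, not when a buffer fills
        total += len(data)
    return total
```

Typical use would be `dump_serial(fd, sys.stdout.buffer)`.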

spth commented

I'm wondering if something rather basic (i.e. just invoke stty manually for settings, then just display the output via cat) would work, but so far I haven't had success.

Worst case, we could still implement a --display that just keeps the port open in the RFU and outputs whatever is received.

spth commented

I have now implemented a different approach instead: A --serialout option that keeps the serial connection and just outputs whatever it receives from the Rabbit.
Unfortunately, it is not reliable yet: once in a while, I simply see no output from the running program. Maybe there is a race condition or other timing issue?

IIRC, you're using hard-coded baud rate dividers instead of the run-time calculation in Dynamic C's Standard BIOS. I recently helped a customer who was seeing their hardware start up with an incorrect divider for some reason, so if you are using run-time calculations, you might want to review that code (or switch to hard-coded values).

spth commented

One, but not the only, problem was that reset was unreliable. It used to pull the reset line low for 250 ms. Increasing that time to 400 ms helped.

spth commented

It looks to me like the remaining issue is indeed a serial line speed issue: sometimes instead of seeing nothing on the host, I see some garbage, as one would expect when the serial speeds of host and Rabbit don't match.

When I then retry (i.e. write another image to the RCM via OpenRabbit) it tends to work again. But over time the error gets more frequent.

So I wonder on which side the choice of the serial line speed could go wrong: On the host, OpenRabbit just uses tcsetattr. On the Rabbit, the example programs (such as the "Hello, world!") use the asynchronous mode on port A, configured for the respective baud rate.
I mostly test using an RCM3319 (which has a Rabbit 3000A). I have a script that alternately writes a "Hello, world" and a Dhrystone to the RCM. For each, it parses the output before it continues. This typically hangs after a dozen or so iterations of the loop.

Which dividers could be wrong on start? The "Hello, world!" explicitly writes WDTTR and GCSR immediately after startup, later it writes PCFR, TAT4R, TACSR, SACR. All other I/O registers are left at startup values. Then the serial output is done via SASR and SADR. No other I/O registers are written. This should (and usually does) give us "Hello, world!" at 38400 baud. The Dhrystone is similar, but also writes GCDR, MB0CR and MB2CR - it runs at twice the speed, so it needs to configure wait states. Later it also uses RTC?C.
The code is at https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc-extra/historygraphs/hello-r3ka/, https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc-extra/historygraphs/dhrystone-r3ka/ and https://sourceforge.net/p/sdcc/code/HEAD/tree/trunk/sdcc-extra/historygraphs/execute_benchmark-r3ka.

I wonder if maybe the oscillator gets unstable?

There could be instability in the oscillator at startup. I know that the BIOS has code to calculate a baud rate divider based on the clock, and recent/later BIOSes would re-run the calculation until it got the same result twice in a row. I think the startup instability may have been related to the 32kHz RTC clock and not the CPU clock.

You might be able to hard-code the divisor setting, since it should be a fixed value based on the CPU clock. You could use the hard-coded value to set the timer registers, and have it output the calculated value over the serial port to see what it's calculating.

If you're changing the doubler setting, I think you need to update the timer registers as well.
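
For reference, a sketch of how the hard-coded value could be derived. The formula assumes the timer feeding the serial port is clocked from the peripheral clock divided by 2 and that the UART uses 16 clocks per bit; the 22.1184 MHz figure is likewise an assumption about the board. None of this comes from the thread, so please check it against the processor manual before relying on it.

```python
def timer_reload_for_baud(cpu_hz, baud, prescale=2, clocks_per_bit=16):
    """Return (reload_value, relative_error) for a baud-rate timer.

    Assumed relation: baud = cpu_hz / (prescale * clocks_per_bit * (reload + 1)).
    """
    ticks = cpu_hz / (prescale * clocks_per_bit * baud)
    count = round(ticks)
    if not 1 <= count <= 256:
        raise ValueError("baud rate not reachable with an 8-bit timer")
    relative_error = abs(ticks - count) / ticks
    return count - 1, relative_error


# Assumed clock of 22.1184 MHz, and the same clock with the doubler on.
print(timer_reload_for_baud(22_118_400, 38400))
print(timer_reload_for_baud(44_236_800, 38400))
```

Under these assumptions the reload value changes when the clock doubler is switched on, which is why the timer registers would need updating together with the doubler setting.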

You could hook a logic probe up to the serial lines to see what they're doing when the module hangs. There might be some other bug in your code that causes it to go into an infinite loop. Unless you're also toggling a "heartbeat" LED in your code loop so you know it's still running...

spth commented

I finally got around to checking with a logic analyzer. Looks like there are two problems:

  1. Failure to start the program: After the program is loaded into the flash of the Rabbit, we reset it, then start the program via 0x80 0x24 0x80. It looks like sometimes, the very last line transition (i.e. low to high at the beginning of the last bit) is a bit too late. With the logic analyzer, I see that as if OpenRabbit had sent 0x80 0x24 0x00, and the Rabbit seeing those values is consistent with it not starting. Just sending the 0x80 0x24 0x80 twice is apparently a workaround (but causes other issues, see below).
    However, that still leaves the question of why the line is going high too late. After all, this is at 2400 baud, not some fancy high speed. And just before the reset, we successfully transmitted the cold loader at 2400 baud, the pilot at 57600 baud, the user program at 460800 baud.

Image
(from top to bottom: /RES, STATUS, PC6_TXA, PC7_RXA, measured at the J61/J62 of the RCM3319)

Looking at the schematics of the (USB) programming cable and the RCM3319, I don't see anything that looks like it could affect the PC7_RXA line except for the FT232R and the Rabbit 3000A.

  2. Garbage being received while the user program is running on the Rabbit. E.g. when running a "Hello, world!"-program on the Rabbit, OpenRabbit might output "��� ��K�V4!A�Y�lo, World!" to the console. I haven't looked into this closely. The workaround for 1) makes 2) happen far more often. I think that is due to OpenRabbit always using the same speed for sending as for receiving, so we miss a part of what we already receive at 38400 baud from the Rabbit, while still sending the second 0x80 0x24 0x80 at 2400 baud. So once we switch to 38400 baud, we're in the middle of receiving stuff already.

spth commented

Having had another look, I think this is another tcdrain() / tcsetattr(., TCSAFLUSH, .) issue, affecting even USB-serial-converters, where tcdrain() otherwise works well enough for OpenRabbit. Apparently tcdrain() returns before all data in the USB-serial-converter's buffer has actually been sent, and tcsetattr(., TCSAFLUSH, .) does not wait for all data to be sent before the new baudrate setting goes into effect. Thus we change the baudrate before having fully transmitted everything, and the baudrate setting change messes up the transmitted data. If we only transmit a single triplet, this apparently can affect the last bit. If we send more triplets, it can even affect full bytes (that are then sent at the new baudrate instead of the one that was in effect when the host transmitted them to the USB-serial-converter).
The obvious workaround is thus to just wait a few ms before changing the baudrate (and the previous workaround of sending the triplet twice actually worked by the delay it caused before the baudrate change). That results in 2) again (one could fine-tune the delay so it just works, but then that "just works" will probably only work for a very specific combination of USB-serial-converter and driver). I'll try to come up with a workaround using asymmetric baud rates.
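
To illustrate why a single disturbed bit is enough to turn the triplet into 0x80 0x24 0x00, here is a toy model of 8N1 reception. It is only a sketch: the names are made up, the receiver is assumed to sample each bit at its midpoint, and only the edge into the last data bit is modelled as late.

```python
def decode_frame(byte, last_edge_delay=0.0):
    """Decode one 8N1 frame as a mid-bit-sampling receiver would see it.

    Data bits travel LSB first, so bit 7 is the last data bit on the wire.
    last_edge_delay is how late (in bit times) the transition into bit 7
    arrives. If it is later than the sample point (0.5), the receiver
    still sees the level of bit 6.
    """
    bits = [(byte >> i) & 1 for i in range(8)]
    value = 0
    for i in range(8):
        level = bits[i]
        if i == 7 and last_edge_delay > 0.5:
            level = bits[6]   # line has not changed yet at the sample point
        value |= level << i
    return value


def wire_time_s(n_bytes, baud, bits_per_frame=10):
    """Time the UART needs to shift out n_bytes (start + 8 data + stop)."""
    return n_bytes * bits_per_frame / baud


triplet = [0x80, 0x24, 0x80]
delays = [0.0, 0.0, 0.6]      # only the very last edge is late
print([hex(decode_frame(b, d)) for b, d in zip(triplet, delays)])
print(wire_time_s(3, 2400))   # time on the wire for one triplet at 2400 baud
```

0x80 is the worst case here: the line stays low for the start bit and seven zero bits, so its only data-bit edge is the one into the last bit.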

I agree with your conclusion that making serial port changes before the data is completely sent is the likely cause.

Maybe try using tcdrain() to ensure that the data has left the UART before changing the baud rate? I don't know how well the USB serial adapter will handle that call.

Do any of the status pins change after sending the last triplet, so you could identify the reset event and then switch baud rate?

spth commented

In my experience tcdrain() is not reliable enough. Some USB-serial-converter drivers don't support it at all, some (like the FT232R in the Rabbit USB programming cable) support it somewhat, but not well enough (OpenRabbit already does a tcdrain() after sending triplets, but we still ran into this problem). I'm currently working on an approach based on the STATUS/DTR line. Basically, OpenRabbit sets it low via GOCR, then the user program sets it high via GOCR to signal to OpenRabbit that it is running (not very elegant to put requirements on the user program, but I don't see a better alternative). Then OpenRabbit switches the baud rate. The user program still needs to delay a bit before starting to send data, otherwise some data might arrive at the host before the switch happens, and we'd be back to 2).
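
On the host side, the handshake amounts to roughly the following. This is a simplified sketch and not the OpenRabbit code: all names are hypothetical, and how the STATUS level is read from the cable is hidden behind the `read_status` callable.

```python
import time


def wait_for_status_high(read_status, timeout_s, poll_s=0.001,
                         clock=time.monotonic, sleep=time.sleep):
    """Poll read_status() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if read_status():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def start_and_switch(send_triplet, read_status, set_baud, run_baud,
                     timeout_s):
    """Start the user program, then change baud rate once it signals."""
    send_triplet()                      # still at the slow bootstrap rate
    if not wait_for_status_high(read_status, timeout_s):
        raise TimeoutError("user program did not raise STATUS")
    set_baud(run_baud)                  # safe now: the triplet has arrived
```

The point of waiting for the signal instead of a fixed delay is that the baud rate change no longer depends on how long a particular converter and driver take to get the last byte onto the wire.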

P.S.: While previously, --run --serialout were quite unreliable for me (rarely worked ten times in a row), I now managed reliable writing, starting, serial output 510 times in a row using a USB Rabbit programming cable and an RCM3319. Next, I want to do some testing with other hardware.
P.P.S.: Testing using two Rabbit USB programming cables (i.e. FT232R USB-serial-converter) on RCM3110, RCM3319 and RCM4110, the results look good. But I get a regression on an RCM2200, where I no longer see serial output (it works with OpenRabbit 0.2.3).
P.P.P.S.: Solved the RCM2200 issue. Looks like the Rabbit 2000C on that board can accept triplets 258 ms after the rising edge of /RESET, but I started sending after 250 ms already. Increasing the post-reset delay to 300 ms made it work.

spth commented

Trying USB-RS232 converters with a serial Rabbit programming cable.

  • Using a black/grey LogiLink converter (recognized by Linux as FT232R), I see:
    1. Both with my current variant and with 0.2.3 it doesn't work without --slow ("Error: Status line should be high after sending initial loader.").
    2. It then works with 0.2.3, but not my new variant.

  • Using a blue CH340-based converter doesn't work (no surprise, #26 still applies, both for 0.2.3 and my new variant).

  • Using a blue PL2302-based converter cable doesn't work either, not even with --slow. Artificially restricting the baudrate for communicating with the pilot loader to 57600 didn't help either.

  • Using a black LogiLink converter (recognized by Linux as PL2302), doesn't work with my new variant. Using 0.2.3, it works with --slow only.

  • Using a black V.Top PL2302-based converter cable that came in a cardboard box doesn't work either (different error, though), not even with --slow.

  • Using a black Digitus USB-RS232 converter, that came in a cardboard box (marked RS232RL, recognized by Linux as FT232R) works with my new version without --slow (but not with it). In 0.2.3 it is the opposite: it works with --slow, but not without it.

  • Using a black Digitus USB-RS232 converter, that came in plastic (marked PL2302G, recognized by Linux as PL2302) doesn't work with my new variant. Using 0.2.3, it works with --slow only.

So 10 combinations work with neither 0.2.3, nor the new variant, 1 works with the new variant, but not 0.2.3, 4 work with 0.2.3, but not the new variant.

I most likely won't be able to look into this further for two weeks from now, but before putting anything into the repo, I should look into the four combinations that work with 0.2.3, and see if I can get them to work.

spth commented

The earlier experiment of writing, starting and getting serial output 510 times in a row using a USB Rabbit programming cable and an RCM3319 had gone fine (I had alternated between a "Hello, world" and Dhrystone as payload), and I was able to repeat it (this time alternating between "Hello, world" and Whetstone as payload). The last run didn't go so well: I alternated between a "Hello, world" and Coremark as payload, and among the about 400 total writes, I did not see serial output twice (both times Coremark). Still, this is far better than before. But it also looks like it will be hard to debug further, since it happens so rarely.

spth commented

As of 3d55f38, I think this works reasonably well.