xmos/lib_i2s

Frame-based I2S master bug

Closed this issue · 17 comments

Problem: the audio output randomly shows noise, a volume increase or a volume decrease, as if some bits were shifted.
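This minimal C sketch shows why a bit shift on the data line would produce exactly these three symptoms. It is a model, not lib_i2s code. It assumes 32-bit words sent MSB first, and the helper name latched_word is hypothetical.

- If the receiver's word boundary slips late by d bits, it latches the current sample shifted left. The result is louder or wraps, and the top bits of the next word are appended.
- If the boundary slips early, the sample is shifted right and comes out quieter. The low bits of the previous word land in the top bits, which gives a large spurious value when that word is negative (noise).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: returns the 32-bit word an I2S receiver latches when its
 * word boundary is displaced by d bits from the transmitter's
 * (d > 0: late, d < 0: early, d == 0: aligned). prev, cur and next are
 * consecutive words on the data line, sent MSB first. */
static uint32_t latched_word(uint32_t prev, uint32_t cur, uint32_t next, int d)
{
    assert(d > -32 && d < 32);
    if (d == 0)
        return cur;
    if (d > 0) /* late: top d bits of cur are lost, top d bits of next are appended */
        return (cur << d) | (next >> (32 - d));
    /* early: bottom e bits of prev lead the word, bottom e bits of cur are lost */
    int e = -d;
    return (prev << (32 - e)) | (cur >> e);
}
```

For example, take cur = 0x01000000 with prev = next = 0:

- A one-bit late slip gives 0x02000000, twice the amplitude.
- A one-bit early slip gives 0x00800000, half the amplitude.
- With prev = 0xFFFFFFFF (a sample of -1), the early slip gives 0x80800000, a near full-scale negative value.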

How to reproduce:

  • Run the I2S frame master loopback on the MC board.
  • Then run xrun --dump-state your.xe. (For reference: normally this causes only a short break, after which playback continues.)

The issue was originally found by a customer:

  • After powering off for more than 10 minutes and then powering on, the issue reproduces with about 90% probability.
  • Waiting 10 minutes per attempt is too slow for debugging, so the xrun --dump-state method was found as a faster way to reproduce the issue. No fix has been found yet, so the normal I2S master is used as a workaround, since the requirement is only a 48 kHz sample rate. However, this project uses three kinds of I2S driver:
    • normal I2S master in -> normal I2S master out, for the 48 kHz sample rate only;
    • I2S slave in -> I2S frame master out, to support sample rates up to 192 kHz;
    • UAC2 in -> UAC2 FW native I2S core out.

Hi Quinn,
I helped develop the frame-based I2S. It has very low overhead, so it can tolerate quite long back-pressure from the callbacks (the total time of all callbacks must be less than the frame period). It has been working at 768 kHz, so 48 kHz should be trivial. However, like all of the I2S implementations, it only holds sync if the callbacks do not delay the I2S loop beyond that limit. If they do, LR clock and/or data will be shifted (BCLK is free running). So to progress this bug, we need to be sure that the application does not introduce a significant delay that breaks I2S. Can you insert a timing assertion to test this?

Something like this in the restart case:

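// Assumes timer t and unsigned time_now, time_old are declared
// outside the I2S loop; xCORE timers tick at 100 MHz by default.
time_old = time_now;
t :> time_now;
// 2084 ticks is roughly one frame period at 48 kHz (100 MHz / 48 kHz)
if (time_now - time_old > 2084) {
    debug_printf("timing assertion fail in i2s handler: %d\n", time_now - time_old);
    __builtin_trap();
}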
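The threshold of 2084 in the assertion above is one frame period at 48 kHz, measured in timer ticks. This assumes the default 100 MHz xCORE reference clock: 100,000,000 / 48,000 is about 2083.3, rounded up.

The minimal C sketch below repeats that arithmetic so the threshold can be recomputed for other rates. The helper name frame_period_ticks is hypothetical. The sketch also shows how the per-frame budget for all callbacks shrinks at the higher rates mentioned in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: xCORE timers tick at the default 100 MHz reference clock. */
#define REF_CLOCK_HZ 100000000u

/* Hypothetical helper: timer ticks in one I2S frame (one LRCLK period) at the
 * given sample rate, rounded up. All callbacks must fit within this budget. */
static uint32_t frame_period_ticks(uint32_t sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (REF_CLOCK_HZ + sample_rate_hz - 1) / sample_rate_hz;
}
```

Under this assumption, frame_period_ticks gives the following budgets:

- 48000 → 2084 ticks
- 192000 → 521 ticks
- 768000 → 131 ticks

This is why a 48 kHz design leaves a comparatively generous callback budget.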

Hi Ed

This is not a callback-delay issue.
I understand your concern. When I first hit the problem, I also suspected the delay (or the efficiency of the interface call), so at that time I tried the following:

  • Replaced the interface call with a direct channel transfer: no improvement.
  • Replaced the real input samples with hard-coded audio samples: this showed that the problem is specifically in the I2S frame master input, but I could not find the root cause.
    Finally, the simplest loopback test showed that the problem is in lib_i2s:
  • With the simplest loopback (I2S frame master in, store in a local sample buffer, then I2S frame master out from that buffer), the issue is easy to reproduce with xrun --dump-state.

It is worth noting:

  • In the loopback test, the application callbacks cannot be the problem. In fact, the issue was found at a low sample rate, 48 kHz.
  • Using "xrun --dump-state" makes the issue much faster to reproduce.
    Without it you may not see the issue at all; most of the time the I2S frame master works well. In the customer project, after a 10-minute power-off, it reproduces with only about 90% probability.
  • Another indication that callback delay is not the cause: with the normal I2S master (not the frame-based one; the normal master cannot support 192 kHz), neither the loopback test nor the customer project (which requires only 48 kHz) shows the issue.

Hi Ed

  • Did you reproduce this on XMOS hardware, or is it only seen on the customer hardware?
    [Quinn] Both.
    For the test on the MC board: it seems GitHub does not support uploading zip and .xc files, so I will send you the test software for the MC separately.
    It is based on AN00162_i2s_loopback_demo and supports both the normal I2S master and the frame I2S master via a build option, so you can test on the MC easily:
  1. Run xrun xxx.xe: you hear normal loopback audio.

  2. Then run xrun --dump-state xxx.xe: you hear noise with AN00162_i2s_loopback_demo_frame_i2s_master.xe, but the audio stays normal with AN00162_i2s_loopback_demo_normal_i2s_master.xe.

  • Do you have a simple test-case program to share?
    [Quinn] Yes, please see the separate email, thanks.

  • Regarding the "xrun --dump-state" case: do you mean you can cause a running system to fail by typing that command from the host?
    [Quinn] Yes, because it is the easiest way to reproduce the customer project's issue.
    I am not sure whether dump-state is related to the real issue. However, whenever dump-state can trigger the issue, the issue also appears on the real project. With the normal I2S, dump-state cannot trigger the issue, and the real project no longer shows it either. By that simple logic, I suppose they are related.

  • If so, this is exactly what we would expect. Just like print over JTAG, the JTAG operation interrupts the entire tile; it does this to interrogate all of the state and report back to the host. So I would absolutely expect real-time behaviour to be broken if you dump the state of a running device. This would almost certainly cause alignment issues on I2S.
    [Quinn] This is exactly my concern. First, the normal I2S does not reproduce the issue: when running the normal I2S with dump-state, playback recovers normally.
    IMO, the xCORE is the I2S master, so it controls the I2S timing. Unlike an I2S slave, there should therefore be no lasting alignment issue. Even if alignment is disturbed in master mode, the master should be able to re-sync. So this refreshed my world view :), just like another ESD issue, but that is another story.
    BTW, the issue seems to be in the master input; the master output seems to work normally.
    In fact, in the real project the issue occasionally occurs with the I2S frame master even without dump-state or JTAG.

BTW, this is no longer a blocking issue for the customer. It is still worth finding the root cause when you have time, because it affects anyone using this library.
Thanks for your attention.

Cheers,
Quinn

Hi Ed

Setting dump-state aside: on the real customer project, after switching to the normal I2S, the customer no longer saw the random startup noise.
That shows:

  • the hardware is the same.
  • only the software change "frame I2S -> normal I2S" solved the problem.

So doesn't this simple logic show that:

  • it is a software issue?
  • the hardware has no problem?

Best Regards
Quinn

Hi Quinn, is this still an issue (if so we need a repeatable failure case) or can we close it?

@ed-xmos, can this be closed now?

@QuinnWang @ed-xmos I am currently looking through all of the I2S issues and am closing this one, as it appears to have been resolved. If it needs reopening because of further observations not yet recorded here, let me know.