RobTillaart/FRAM_I2C

memory addressing for MB85RC1MT

mbmorrissey opened this issue · 47 comments

It would be handy to be able to easily access the whole memory of the MB85RC1MT!

@mbmorrissey

The development branch to fix this issue has been created and first tests have started.

@mbmorrissey

  • All files compile
  • first test with hardware MB85RC256V 32 KB works
  • added your test sketch to the PR.

Q: can you verify the develop branch with your MB85RC1MT 128 KB?

Won't this introduce unnecessary overhead for all <1Mbit devices?
Would it be possible to simply use an overloaded version?

Yes that would be possible too. (Derived class)
But the overhead is mainly parameters passing to functions so I expect the overhead is small.

It should be tested I agree.

Hi Rob - I sent this by email the other day but sorry i didn't put it in this thread - below is result of the test you requested. Thanks, Michael

Hi,

No luck. It seems memory address 1 still gets overwritten in my test script when writing to address 65537.

Not to presume to fully understand the problem, but based on page 8 here

https://www.fujitsu.com/jp/group/fsm/en/documents/products/fram/lineup/MB85RC1MT-DS501-00027-4v0-E.pdf

I would have expected that lines 300 and 314 here

https://github.com/RobTillaart/FRAM_I2C/blob/develop/FRAM.cpp

might have needed a modified MSB bit in _address when memaddr >= 65536.

Michael

@GitMoDu

did some tests

Sketch: testFRAMPerformance.ino
Platform: UNO
IDE: 1.8.19

Version   UNO size   notes
0.3.4     n.a.
0.3.5     7540
0.3.6     7540
0.4.0     7636

I2C: 100.000

Version   write 1200 bytes   us / byte
0.3.4     136704             113.92
0.3.5     136704             113.92
0.3.6     136704             113.92
0.4.0     136716             113.93

Version   read 1200 bytes    us / byte
0.3.4     141708             118.09
0.3.5     141760             118.13
0.3.6     141760             118.13
0.4.0     142016             118.35

  • Size difference +96 bytes; for the UNO this is substantial, so a separate class makes sense.
  • Performance difference ~0.2 % @ 100 kHz. A faster bus speed would increase this.
    Will upload performance files asap.

performance files added to example sketch folder.

@mbmorrissey
Thanks for testing, apparently I need to dive into the datasheet again, will take some days.
Given the impact on the sketch size (see above) for the UNO and smaller boards, I am considering a derived class for the MB85RC1MT,

but first we need to get a working version, then decide how to proceed.

@mbmorrissey
Did a quick read and it looks like internally there are just two 64 KB FRAMs used to create the MB85RC1MT.

Seen a similar pattern with 128 KB EEPROM in the past.

  1. To solve this 100%, every _writeBlock() call needs to check whether the block crosses the boundary.
  2. A simpler, not 100%, solution is to define that a block may never cross that boundary => the user is responsible.
    The latter would be a first step and it would make the MB85RC1MT usable.

As this extra administration adds overhead to every _writeBlock() (and read), it will affect the performance (not too much, I expect) and the footprint of the library.

Will try to make solution (2) asap

  3. A workaround might be to define one MB85RC1MT as two FRAM objects with consecutive addresses e.g. 0x50 and 0x51.
    I expect that should work with the current library.
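
The boundary check behind options (1) and (2) could look roughly like the sketch below. It is not the library's code: `crossesBank` and `BANK_SIZE` are hypothetical names, and the sketch assumes the two internal banks are 64 KB each, as described above.

```cpp
#include <cstdint>

// Assumption: the MB85RC1MT behaves as two 64 KB banks.
constexpr uint32_t BANK_SIZE = 0x10000UL;

// Returns true when the block [memaddr, memaddr + size) starts in one
// bank and ends in the other. A zero-length block never crosses.
bool crossesBank(uint32_t memaddr, uint32_t size)
{
  if (size == 0) return false;
  uint32_t last = memaddr + size - 1;
  return (memaddr / BANK_SIZE) != (last / BANK_SIZE);
}
```

With option (2) a user could call such a check before every block write and split the block manually when it returns true.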

@mbmorrissey
Pushed a fix for option (2) above to FRAM.cpp in the develop branch.
Please verify.

Size impact for the performance sketch: an additional +20 ==> 7656 bytes (+116 compared to the 0.3.6 release).
Performance impact: minimal difference with the previous 0.4.0 test above.

Hi Rob,

The current version no longer overwrites the first block, and writing to the second block seems to work, because clear() works correctly. However, reads from the second block actually read from the first. I think the issue is that this line

 _wire->requestFrom(_address, size);

in _readBlock() needs to be made sensitive to whether the second block is being accessed. When I replaced that line with

if (memaddr & 0x00010000)
{
  _wire->requestFrom(_address + 0x01, size);
}
else
{
  _wire->requestFrom(_address, size);
}

i.e., similar to what you have earlier, it seems to work great. Not sure if it would be better to check if (memaddr & 0x00010000) only once, and generate a temporary value for _address to use both times that it is needed, but I'm sure you'll have a better idea than I would.

Michael
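
The "check only once" idea can be sketched as a small helper that resolves the I2C device address and the 16-bit offset in one place. This is an illustration only: `FramTarget` and `resolve` are hypothetical names, and the mapping (bit 16 selects the next I2C address) is taken from the snippet above, not from the library source.

```cpp
#include <cstdint>

struct FramTarget
{
  uint8_t  deviceAddress;  // I2C address to use for this access
  uint16_t offset;         // 16-bit address inside the selected bank
};

// Assumption (from the thread): bit 16 of the memory address selects the
// second bank, which answers on the next I2C address (base + 1).
FramTarget resolve(uint8_t baseAddress, uint32_t memaddr)
{
  FramTarget t;
  t.deviceAddress = baseAddress;
  if (memaddr & 0x00010000UL) t.deviceAddress += 1;
  t.offset = static_cast<uint16_t>(memaddr & 0xFFFFUL);
  return t;
}
```

The same resolved address can then be used for both the address-setting transmission and the following requestFrom(), so the two can never disagree.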

Yes of course, stupid me,
Your proposal to check only once makes sense.
In terms of performance / size I expect no big gain as the compiler could have optimized it already.

I will update code later today,
Thanks for testing

Hi Rob,

Happy to test, and pretty free for the rest of the day, so just tell me when you're ready for another test.

For testing the chip with future updates (perhaps not necessarily even updates like this one directly motivated by this chip's features), I'd be happy to continue to help. But maybe it would be easier for you if I posted you a MB85RC1MT IC? If you want to send me the address (direct by email if you want, rather than posting private details here), I'll drop one in the post.

Michael

@mbmorrissey

Pushed an update for the requestFrom(),

  • used a local address to do the address math (+1)
  • optimized the loop to write / read => reduced footprint a few bytes (-6) - no performance gain

If the MB85RC1MT is breadboard friendly (breakout) I am interested.
That would allow me to test quickly with different processors.

Is it OK with you if I send it without the headers soldered in place (can send headers), though, to keep the package slim and reduce the risk of breakage?

Perfect! Thanks
(email is in the json file)

Update of the numbers (very similar, minor increase in read() )

Sketch: testFRAMPerformance.ino
Platform: UNO
IDE: 1.8.19

Version   UNO size   Notes
0.3.4     n.a.
0.3.5     7540
0.3.6     7540
0.4.0     7650       develop branch

I2C: 100.000

Version   write 1200 bytes   us / byte
0.3.4     136704             113.92
0.3.5     136704             113.92
0.3.6     136704             113.92
0.4.0     136592             113.83

Version   read 1200 bytes    us / byte
0.3.4     141708             118.09
0.3.5     141760             118.13
0.3.6     141760             118.13
0.4.0     141984             118.32

Hi,

A version of the simple test sketch I sent you before now confirms that clear() works for the whole chip, and also that I can write and read both blocks without any overwriting!

Sorry if I missed it, but I don't think I have your address to send the chip for future testing.

Michael

(email is in the json file)

I'm going to make a derived class called FRAM32 that uses 32 bit memory addresses.

That way the FRAM class stays ~100 bytes smaller for all devices that can be addressed with 16 bits only.
Note that FRAM32 can also read / write the 16 bit FRAM devices, with some overhead.

@mbmorrissey
Derived class FRAM32 pushed to develop branch, please verify.

@GitMoDu
The FRAM class is still using 16 bit addresses so no increased footprint,

the FRAM32 class seems to work great!

thanks,
Do you think the name is OK, or ideas for a better one?
Or does the readme need more info?

Still open is to be able to read/write over the 64 KB boundary e.g.

_writeBlock(65530, buffer, 10);

maybe for the 0.4.1 version ?
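
One possible way to support such a call is to split the block at the boundary and issue two writes. The sketch below only shows the splitting arithmetic; `Chunk`, `splitAtBoundary` and `BOUNDARY` are hypothetical names and not part of the library.

```cpp
#include <cstdint>

// Assumption: the only internal boundary is at 64 KB.
constexpr uint32_t BOUNDARY = 0x10000UL;

struct Chunk
{
  uint32_t memaddr;
  uint32_t size;
};

// Splits [memaddr, memaddr + size) at the 64 KB boundary.
// Returns the number of chunks written to out (0, 1 or 2).
int splitAtBoundary(uint32_t memaddr, uint32_t size, Chunk out[2])
{
  if (size == 0) return 0;
  if (memaddr < BOUNDARY && memaddr + size > BOUNDARY)
  {
    uint32_t first = BOUNDARY - memaddr;   // bytes that fit in the first bank
    out[0] = { memaddr, first };
    out[1] = { BOUNDARY, size - first };
    return 2;
  }
  out[0] = { memaddr, size };
  return 1;
}
```

For the example above, a 10 byte block at 65530 would become 6 bytes at 65530 followed by 4 bytes at 65536.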

Hi,

First - slight oversight in my previous test. I just noticed that clear() doesn't fully work in the current FRAM32. I know little about inheritance, so I don't know if it is the best solution and there might be a better way, but I gave FRAM32 its own clear() function and it works fine.

I can't think of a better name than FRAM32. I think your readme contains the vital information, with the note about using FRAM32 in the first table where the MB85RC1MT is referenced. Maybe with a slightly more illustrative example file, it could be useful. Code below for a slightly more commented version of the testing file I've been using, in case that is any use.

re. writing across the blocks: Might it make sense to tackle this at the same time as your first "medium priority" point about checking for over-runs more generally?

When checking for writing across the blocks, or more generally over-runs, would it be possible to have a check=FALSE option set up in such a way that the compiler would skip anything irrelevant, avoiding the overhead when someone really wants to optimise for speed/efficiency?
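
One way such an opt-out could work is a compile-time flag, so the range test disappears from the generated code when it is switched off. This is a hedged sketch only: `canWrite`, `CHECK` and `MEMORY_SIZE` are hypothetical and do not exist in the library.

```cpp
#include <cstdint>

// Assumption: 128 KB device (MB85RC1MT).
constexpr uint32_t MEMORY_SIZE = 0x20000UL;

// CHECK is a compile-time constant, so for CHECK == false the compiler
// can drop the range test entirely and the function always returns true.
template <bool CHECK>
bool canWrite(uint32_t memaddr, uint32_t size)
{
  if (CHECK)
  {
    if (memaddr >= MEMORY_SIZE) return false;         // start is out of range
    if (size > MEMORY_SIZE - memaddr) return false;   // block would over-run
  }
  return true;
}
```

A caller would pick `canWrite<true>(addr, len)` during development and `canWrite<false>(addr, len)` when optimising for speed.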

Michael

#include "FRAM.h"
#include <Wire.h>

// using FRAM32 derived class to access both blocks of
// MB85RC1MT IC
FRAM32 fram;


void setup() {

  Serial.begin(9600);
  Wire.begin();
  delay(400); 
  
  Serial.println("");
  Serial.println("");
  Serial.println("***************");

  int rv = fram.begin(0x50);
  if (rv != 0)
  {
    Serial.print("INIT ERROR: ");
    Serial.println(rv);
  }

  Serial.print("ManufacturerID: ");
  Serial.println(fram.getManufacturerID());
  Serial.print("     ProductID: ");
  Serial.println(fram.getProductID());
  Serial.print("    memory size: ");
  Serial.println(fram.getSize());

  Serial.println("***************");

  Serial.print("Clearing IC (setting all to 0)...");
  fram.clear();
  Serial.println("done.");

  // show that both blocks have been cleared
  byte test;
  Serial.println();
  Serial.println("Reading from first block:");
  for(uint32_t i = 0; i < 4; i++){
    Serial.print(i);
    Serial.print("\t");
    test =  fram.read8(i);
    Serial.println(test); 
  }
  
  Serial.println();
  Serial.println("Reading from second block:");
  for(uint32_t i = 65536; i < 65540; i++){
    Serial.print(i);
    Serial.print("\t");
    test =  fram.read8(i);
    Serial.println(test); 
  }  

  Serial.println();
  Serial.print("Filling IC (setting all to 255)...");
  fram.clear(255);
  Serial.println("done.");

  // show that both blocks have been set to 255
  Serial.println();
  Serial.println("Reading from first block:");
  for(uint32_t i = 0; i < 4; i++){
    Serial.print(i);
    Serial.print("\t");
    test =  fram.read8(i);
    Serial.println(test); 
  }
  
  Serial.println();
  Serial.println("Reading from second block:");
  for(uint32_t i = 65536; i < 65540; i++){
    Serial.print(i);
    Serial.print("\t");
    test =  fram.read8(i);
    Serial.println(test); 
  } 
  
  Serial.println();
  Serial.println("***************");
  Serial.println();

  // corresponding addresses in both blocks of the
  // MB85RC1MT IC.  Values are first address in each
  // block, which allows subsequent code to confirm
  // that neither is being over-written
  uint32_t addr1 = 0;
  uint32_t addr2 = 65536;

  // contrasting data to write
  byte dat1 = 2;
  byte dat2 = 4;
  
  delay(50);
  Serial.print("writing "); Serial.print(dat1); 
  Serial.print(" to memory address ");  Serial.println(addr1);
  fram.write8(addr1, dat1);

  delay(50);
  Serial.print("reading from memory address ");  Serial.println(addr1);
  test =  fram.read8(addr1);
  Serial.println(test); 

  delay(50);
  Serial.print("writing "); Serial.print(dat2); 
  Serial.print(" to memory address ");  Serial.println(addr2);
  fram.write8(addr2, dat2);
  
  delay(50);
  Serial.print("reading from memory address ");  Serial.println(addr2);
  test = 255;
  test =  fram.read8(addr2);
  Serial.println(test); 

  delay(50);
  Serial.print("reading from memory address ");  Serial.println(addr1);
  test =  fram.read8(addr1);
  Serial.println(test); 

  Serial.println();
  Serial.println("Reading from first block:");
  for(uint32_t i = 0; i < 4; i++){
    Serial.print(i);
    Serial.print("\t");
    test =  fram.read8(i);
    Serial.println(test); 
  }
  
  Serial.println();
  Serial.println("Reading from second block:");
  for(uint32_t i = 65536; i < 65540; i++){
    Serial.print(i);
    Serial.print("\t");
    test =  fram.read8(i);
    Serial.println(test); 
  }
  
}

void loop() {
}

I just noticed that clear() doesn't fully work in the current FRAM32.

Can you tell what is wrong? does it skip mem addresses or ??

think i see it,

clear() is inherited from FRAM and calls FRAM::_writeBlock instead of FRAM32::_writeBlock
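
The effect can be reproduced with two stand-in classes. This is a simplified illustration, not the library's code: `Base16`, `Derived32`, `Derived32Fixed` and `lastWriter` are hypothetical, and it assumes the block-write helper is non-virtual.

```cpp
// Stand-ins for FRAM and FRAM32.
struct Base16
{
  int lastWriter = 0;
  void writeBlock() { lastWriter = 16; }   // non-virtual helper
  void clear()      { writeBlock(); }      // always binds to Base16::writeBlock
};

struct Derived32 : Base16
{
  void writeBlock() { lastWriter = 32; }   // hides the base version, does not override it
};

struct Derived32Fixed : Base16
{
  void writeBlock() { lastWriter = 32; }
  void clear()      { writeBlock(); }      // own clear() binds to the 32-bit version
};
```

Calling clear() on a `Derived32` object still runs the 16-bit helper, while `Derived32Fixed` uses its own. Making the helper virtual would be another option, at the cost of a vtable.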

pushed new code with FRAM32::clear(),
it is roughly 2x slower, think about adding a "clear32" example


performance is roughly equal after some more testing

Hi Rob,

Here are test results from this morning with FRAM32.

First the timing test outputs from FRAM32_MB85RC1MT_test and FRAM32_Performance. I’m not sure I ran these recently, so I’m sorry I can’t comment on whether anything changed in the most recent version (is the fact that there is no improvement above 200000 clock speed, and that it gets really bad at 800000, of note?):

08:18:48.693 -> FRAM_LIB_VERSION: 0.4.0
08:18:48.693 ->
08:18:48.693 ->
08:18:48.693 -> CLOCK: 100000
08:18:48.877 -> WRITE 1200 bytes TIME: 154865 us ==> 129.05 us/byte.
08:18:49.128 -> READ 1200 bytes TIME: 165386 us ==> 137.82 us/byte.
08:18:49.228 ->
08:18:49.228 -> CLOCK: 200000
08:18:49.343 -> WRITE 1200 bytes TIME: 74471 us ==> 62.06 us/byte.
08:18:49.529 -> READ 1200 bytes TIME: 81365 us ==> 67.80 us/byte.
08:18:49.605 ->
08:18:49.605 -> CLOCK: 300000
08:18:49.681 -> WRITE 1200 bytes TIME: 74476 us ==> 62.06 us/byte.
08:18:49.900 -> READ 1200 bytes TIME: 81366 us ==> 67.80 us/byte.
08:18:50.005 ->
08:18:50.005 -> CLOCK: 400000
08:18:50.081 -> WRITE 1200 bytes TIME: 74476 us ==> 62.06 us/byte.
08:18:50.266 -> READ 1200 bytes TIME: 81282 us ==> 67.73 us/byte.
08:18:50.369 ->
08:18:50.369 -> CLOCK: 500000
08:18:50.436 -> WRITE 1200 bytes TIME: 74464 us ==> 62.05 us/byte.
08:18:50.621 -> READ 1200 bytes TIME: 81379 us ==> 67.82 us/byte.
08:18:50.733 ->
08:18:50.733 -> CLOCK: 600000
08:18:50.832 -> WRITE 1200 bytes TIME: 74492 us ==> 62.08 us/byte.
08:18:51.018 -> READ 1200 bytes TIME: 81319 us ==> 67.77 us/byte.
08:18:51.127 ->
08:18:51.127 -> CLOCK: 700000
08:18:51.198 -> WRITE 1200 bytes TIME: 74476 us ==> 62.06 us/byte.
08:18:51.377 -> READ 1200 bytes TIME: 81356 us ==> 67.80 us/byte.
08:18:51.487 ->
08:18:51.487 -> CLOCK: 800000
08:18:53.161 -> WRITE 1200 bytes TIME: 1633564 us ==> 1361.30 us/byte.
08:18:54.986 -> READ 1200 bytes TIME: 1746197 us ==> 1455.16 us/byte.
08:18:55.091 ->
08:18:55.091 -> done...

08:21:17.005 -> FRAM_LIB_VERSION: 0.4.0
08:21:17.043 -> BYTES : 1024
08:21:17.114 ->
08:21:17.114 -> SPEED : 100000
08:21:17.663 -> BYTES 1: 529733 ==> 16.17 us/byte
08:21:18.088 -> BYTES 2: 320026 ==> 9.77 us/byte
08:21:18.403 -> BYTES 4: 217621 ==> 6.64 us/byte
08:21:18.661 -> BYTES 8: 166424 ==> 5.08 us/byte
08:21:18.904 -> BYTES 16: 140834 ==> 4.30 us/byte
08:21:19.152 -> CLEAR(): 140869 ==> 4.30 us/byte
08:21:19.402 -> CLEAR(0xFF): 140874 ==> 4.30 us/byte
08:21:19.514 ->
08:21:19.514 -> SPEED : 200000
08:21:19.801 -> BYTES 1: 267322 ==> 8.16 us/byte
08:21:20.049 -> BYTES 2: 161709 ==> 4.93 us/byte
08:21:20.271 -> BYTES 4: 108332 ==> 3.31 us/byte
08:21:20.449 -> BYTES 8: 81731 ==> 2.49 us/byte
08:21:20.632 -> BYTES 16: 68284 ==> 2.08 us/byte
08:21:20.809 -> CLEAR(): 68237 ==> 2.08 us/byte
08:21:20.961 -> CLEAR(0xFF): 68269 ==> 2.08 us/byte
08:21:21.074 ->
08:21:21.074 -> SPEED : 400000
08:21:21.333 -> BYTES 1: 267277 ==> 8.16 us/byte
08:21:21.618 -> BYTES 2: 161724 ==> 4.94 us/byte
08:21:21.834 -> BYTES 4: 108328 ==> 3.31 us/byte
08:21:22.013 -> BYTES 8: 81728 ==> 2.49 us/byte
08:21:22.187 -> BYTES 16: 68303 ==> 2.08 us/byte
08:21:22.371 -> CLEAR(): 68234 ==> 2.08 us/byte
08:21:22.522 -> CLEAR(0xFF): 68266 ==> 2.08 us/byte
08:21:22.633 ->
08:21:22.633 -> SPEED : 800000
08:21:28.482 -> BYTES 1: 5849142 ==> 178.50 us/byte
08:21:32.147 -> BYTES 2: 3524650 ==> 107.56 us/byte
08:21:34.607 -> BYTES 4: 2362580 ==> 72.10 us/byte
08:21:36.511 -> BYTES 8: 1781695 ==> 54.37 us/byte
08:21:38.100 -> BYTES 16: 1491146 ==> 45.51 us/byte
08:21:39.702 -> CLEAR(): 1491232 ==> 45.51 us/byte
08:21:41.328 -> CLEAR(0xFF): 1491158 ==> 45.51 us/byte
08:21:41.435 -> done...

I also ran the sleep example, but modified to use FRAM32. There is an approx 750 uA peak (approx 50 us) on wakeup, then the average current during clear() is about 24 uA, idle is about 11.7 uA, and sleep current is about 3.6 uA.

Michael

Which processor/board did you use for the test?

800K is the highest speed that an UNO can do, but I have had UNOs that failed on it, so it is on the edge of what is possible.
(just 20 clock cycles to handle a single bit at 16 MHz)

I was using my own board with an ATMega 4808 and MCUDude's MegaCoreX.

I could try on a UNO, except I gather from the absolute maximums in the datasheet that the MB85RC1MT isn't 5V tolerant and I don't have a level shifter.

Michael

oh, and I am running my 4808 at just 4 MHz, not sure if that would be a factor...

4 MHz cannot handle 800 Kb; it is already great that it handles 400 Kb correctly (an assumption, as there is no verification).
I would not advise any speed above 200 Kb on a 4 MHz system.
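
A rough way to compare these cases is to count the CPU clock cycles available per I2C bit. The helper below is only a back-of-the-envelope sketch (`cyclesPerBit` is a hypothetical name), and it ignores everything except the raw clock ratio.

```cpp
#include <cstdint>

// CPU clock cycles available per I2C bit period (integer division).
constexpr uint32_t cyclesPerBit(uint32_t cpuHz, uint32_t i2cHz)
{
  return cpuHz / i2cHz;
}
```

A 16 MHz UNO at 800 kHz has 20 cycles per bit. A 4 MHz system has only 5 cycles per bit at 800 kHz, 10 at 400 kHz, and reaches the same 20 cycles at 200 kHz.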

I could try on a UNO, except I gather from the absolute maximums in the datasheet that the MB85RC1MT isn't 5V tolerant and I don't have a level shifter.

Don't risk hardware that way,

OK - so all looks promising for now. I just put the IC in the post (had to wait for today to get the customs declaration from our secretary...if not for brexit you'd probably have it by now!) Do you have a 3.3v arduino that you'll be able to use for testing? If not, I may be able to ask around here. Michael

Yeah, Brexit fun, no not arrived yet,

For the 3V3 I have some ESP32's around, so that will work.

With my 4808 board running at 16 MHz (possibly flirting with the upper limit for 3.3V) I get much nicer results at 800 Kb:

09:39:49.458 -> CLOCK: 100000
09:39:49.561 -> WRITE 1200 bytes TIME: 124172 us ==> 103.48 us/byte.
09:39:49.815 -> READ 1200 bytes TIME: 132341 us ==> 110.28 us/byte.
09:39:49.923 ->
09:39:49.923 -> CLOCK: 200000
09:39:49.956 -> WRITE 1200 bytes TIME: 40972 us ==> 34.14 us/byte.
09:39:50.102 -> READ 1200 bytes TIME: 43836 us ==> 36.53 us/byte.
09:39:50.209 ->
09:39:50.209 -> CLOCK: 300000
09:39:50.245 -> WRITE 1200 bytes TIME: 40972 us ==> 34.14 us/byte.
09:39:50.394 -> READ 1200 bytes TIME: 43839 us ==> 36.53 us/byte.
09:39:50.504 ->
09:39:50.504 -> CLOCK: 400000
09:39:50.541 -> WRITE 1200 bytes TIME: 40977 us ==> 34.15 us/byte.
09:39:50.684 -> READ 1200 bytes TIME: 43841 us ==> 36.53 us/byte.
09:39:50.790 ->
09:39:50.790 -> CLOCK: 500000
09:39:50.825 -> WRITE 1200 bytes TIME: 40972 us ==> 34.14 us/byte.
09:39:50.972 -> READ 1200 bytes TIME: 43840 us ==> 36.53 us/byte.
09:39:51.078 ->
09:39:51.078 -> CLOCK: 600000
09:39:51.114 -> WRITE 1200 bytes TIME: 40978 us ==> 34.15 us/byte.
09:39:51.256 -> READ 1200 bytes TIME: 43835 us ==> 36.53 us/byte.
09:39:51.364 ->
09:39:51.364 -> CLOCK: 700000
09:39:51.402 -> WRITE 1200 bytes TIME: 40972 us ==> 34.14 us/byte.
09:39:51.553 -> READ 1200 bytes TIME: 43837 us ==> 36.53 us/byte.
09:39:51.663 ->
09:39:51.663 -> CLOCK: 800000
09:39:51.663 -> WRITE 1200 bytes TIME: 24153 us ==> 20.13 us/byte.
09:39:51.800 -> READ 1200 bytes TIME: 26115 us ==> 21.76 us/byte.
09:39:51.903 ->
09:39:51.903 -> done...

Looks suspicious,

  • all speeds from 200-700 Kb have roughly the same timing.
  • at 200 Kb you are 3x as fast as at 100 Kb
  • 800 Kb speeds look OK now.

something is not right @ 16 MHz

I've gone digging around in MegaCoreX, and found this:

void TWI_MasterSetBaud(uint32_t frequency)
{
  // Formula is: BAUD = ((F_CLKPER/frequency) - F_CLKPER*T_RISE - 10)/2;
  // Where T_RISE varies depending on operating frequency...
  // From 1617 DS: 1000ns @ 100kHz / 300ns @ 400kHz / 120ns @ 1MHz

  uint16_t t_rise;
  uint16_t freq_khz = frequency / 1000;

  if (freq_khz < 200)
  {
    freq_khz = 100;
    t_rise = 1000;
  }
  else if (freq_khz < 800)
  {
    freq_khz = 400;
    t_rise = 300;
  }
  else if (freq_khz < 1200)
  {
    freq_khz = 1000;
    t_rise = 120;
  }
  else
  {
    freq_khz = 100;
    t_rise = 1000;
  }

  uint32_t baud = ((F_CPU / 1000 / freq_khz) - (((F_CPU * t_rise) / 1000) / 1000) / 1000 - 10) / 2;
  TWI0.MBAUD = (uint8_t)baud;
}

at

https://github.com/MCUdude/MegaCoreX/blob/master/megaavr/libraries/Wire/src/utility/twi.c

Which I think might explain why there are jumps in performance at 200 and again at 800. If I'm interpreting this right, it means the step function in the performance I'm reporting at least doesn't indicate anything wrong with FRAM_I2C.
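
The bucketing in the function above can be isolated in a few lines. The sketch mirrors only the if/else chain of the quoted TWI_MasterSetBaud() code; `effectiveKHz` is a hypothetical name and the baud register calculation is left out.

```cpp
#include <cstdint>

// Mirrors the bucketing in the quoted snippet: the requested frequency
// is snapped to 100, 400 or 1000 kHz before the baud value is computed.
uint16_t effectiveKHz(uint32_t frequency)
{
  uint32_t khz = frequency / 1000;
  if (khz < 200)  return 100;
  if (khz < 800)  return 400;
  if (khz < 1200) return 1000;
  return 100;   // out of range falls back to 100 kHz
}
```

Every request from 200000 up to 799999 lands in the same 400 kHz bucket, which matches the identical timings seen for 200 to 700 kHz.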

Explains it all. The function could be expanded to support more speeds, although the value of t_rise should then be calculated.
The formula seems to be something like t_rise (ns) = 110 K / frequency (kHz).
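
As a quick sanity check of that rough fit against the three values quoted in the code comment (1000 ns @ 100 kHz, 300 ns @ 400 kHz, 120 ns @ 1 MHz), here is a minimal sketch. `estimatedRiseTimeNs` is a hypothetical helper, and the constant 110000 is only the approximation suggested above, not a datasheet figure.

```cpp
#include <cstdint>

// Rough fit suggested in the thread: t_rise [ns] ~= 110000 / f [kHz].
// Assumption: frequencyKHz is non-zero.
uint32_t estimatedRiseTimeNs(uint32_t frequencyKHz)
{
  return 110000UL / frequencyKHz;
}
```

The estimate gives 1100 ns, 275 ns and 110 ns for 100, 400 and 1000 kHz, so it stays within about 10 % of the quoted values.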

Hi Rob,

Just to check where we are at in terms of what I can contribute at this point to testing. Were you suggesting modifying that function within MegaCoreX so as to be able to test the full continuous range, or do you think we can be pretty satisfied now that we know why we got that step function?

Michael

No need to test with that function, for me it is clear that it causes the step function. It might be useful for you in the future, as it allows you to tweak the speed for any (other) I2C device. From the FRAM lib point of view it would not add anything.

The future of the FRAM library lies in finding applications that can use the strengths of FRAM, e.g. circular logging, or a datastore for strings like the F-macro provides for PROGMEM. Placing font data for displays perhaps, a lookup table for complex functions, etc. If FRAM can take over PROGMEM functions there will be more space for pure code.
Maybe even porting a minimal filesystem to the FRAM, or a way to merge multi-FRAM as one ...

Think it will be time soon to squash and merge the 0.4.0 branch into master. Agree?

I agree. It seems ready to me. For my purposes, your recent additions are really great.

In case it is of interest to you, my interest in using FRAM is rather more basic. I'm interested in very low power data logging. Mostly, using a SD card is fine, but not always best. Also, I sometimes need logging at temperatures that are outside (well below) SD card specs (and then transferring to a SD card when things warm up). A minimal file system could be very useful, but for my purposes organising the data log myself is entirely feasible (I think).

Michael

Low power logging can be "high tech". I recall reading a story about measuring the energy needed to compress and store data versus storing it raw (mechanical disc era).
Relatively simple compression such as run length or delta compression took little energy but gained relatively a lot: fewer bytes to store meant less movement of the disc. It also depends on the data, of course.
Less energy spent would still mean longer battery life etc.

@mbmorrissey
Merged the develop branch into master, CI build is running

@mbmorrissey
Got a package this morning, thanks