ReservedField/evic-sdk

Custom firmware data flash handling

Closed this issue · 13 comments

In the last few days I looked into getting data flash support for CFWs. I'd like to get feedback from other devs, as the two strategies I thought of either waste ROM space or are a total nightmare to implement. I'll begin by explaining how the data flash works and how it is managed by the OFW (official firmware). This way I'll also have something to copy-paste into the dataflash docs when I write them... Warning: long post.

The MCU used in the eVic has a 128kB flash for storing APROM and data. The APROM is always stored at the beginning of the flash, mapped at address zero, and can be updated from LDROM/bootloader. The data flash portion extends from the data flash base address (DFBA) to the end of flash, and can be updated with simple FMC commands. Joyetech sets the DFBA to 0x1E000, so APROM code space is 120kB and data flash is 8kB.
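
For reference, here's the memory map described above as C constants (the names are mine; the values come straight from this post):

```c
#define FLASH_SIZE      0x20000u            /* 128kB total flash */
#define DFBA            0x1E000u            /* data flash base address set by Joyetech */
#define APROM_SIZE      DFBA                /* 120kB of APROM code space */
#define DATAFLASH_SIZE  (FLASH_SIZE - DFBA) /* 8kB of data flash */
```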

Flash is divided into 2kB pages. This means that the smallest possible erase size is 2kB. Erased flash bits are 1, programmed bits are 0. So 1 bits can always be flipped to 0, but flipping a 0 back to 1 requires erasing the whole page. The smallest read/write size is 4 bytes.
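
A minimal host-side model of those semantics (not MCU code), just to make the 1 -> 0 rule concrete:

```c
#include <stdint.h>

#define PAGE_SIZE 0x800u /* 2kB: the smallest erasable unit */

/* Erasing sets every bit in the page to 1. */
static void sim_erase(uint32_t *page) {
    for (uint32_t i = 0; i < PAGE_SIZE / 4; i++)
        page[i] = 0xFFFFFFFF;
}

/* Programming can only flip 1 -> 0, i.e. it ANDs the new data into the
 * old contents. Getting a 0 back to 1 means erasing the whole page. */
static void sim_program(uint32_t *word, uint32_t data) {
    *word &= data;
}
```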

The 8kB of data flash are thus divided into 4 pages (let's number them from 0 to 3). They are used by the OFW in a wear-leveling scheme. The actual data flash info is 256 bytes long (the one we get from USB HID is bigger, because it's not a 1-to-1 copy of the real data flash). Pages 0 and 1 are split into 256-byte chunks. Starting with erased pages, each time the info is updated it is written to the next chunk, without erasing the pages. Once the firmware runs out of space, it erases the pages and starts back again. This saves erase cycles, erasing each page only once in 4096 / 256 = 16 updates. Because erased dataflash is just a bunch of FF hex bytes, and the first two words of info can't be both full of FF bytes (we'll see later why), finding the most recent info structure is as simple as reading 8 bytes from the start of each 256-byte chunk, and comparing them with 8 FF bytes. If they match, then that chunk is erased memory, i.e. it's after the most recent one.
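
In code, the lookup might look like this (a sketch; FMC_Read is the 4-byte flash read primitive, which the Nuvoton BSP provides under a similar name):

```c
#include <stdint.h>

#define DFBA       0x1E000u
#define CHUNK_SIZE 256u

extern uint32_t FMC_Read(uint32_t addr); /* 4-byte flash read */

/* Returns the address of the most recent 256-byte info chunk in pages
 * 0-1, or 0 if the data flash is fully erased. A chunk whose first two
 * words both read 0xFFFFFFFF is erased space, so the chunk before it is
 * the most recent one. */
static uint32_t Dataflash_FindLatestInfo(void) {
    uint32_t latest = 0;
    for (uint32_t off = 0; off < 2 * 2048u; off += CHUNK_SIZE) {
        uint32_t addr = DFBA + off;
        if (FMC_Read(addr) == 0xFFFFFFFF && FMC_Read(addr + 4) == 0xFFFFFFFF)
            break; /* erased chunk: everything past here is unused */
        latest = addr;
    }
    return latest;
}
```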

Pages 2 and 3 are used for puff/time counts. Puff count and puff time are both 4-byte values. The same wear-leveling method is used, with page 2 storing 512 puff count values and page 3 storing 512 puff time values. Since puffs/time will never be 0xFFFFFFFF, looking for this value finds the end of the most recent count/time. Because the two values grow at the same rate (they are always saved together), the firmware finds the number of values in page 2 and assumes page 3 holds the same number. This will cause some problems for us.
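
And the count lookup for pages 2 and 3, in the same sketchy vein (reusing DFBA and FMC_Read from the previous sketch):

```c
/* Number of 4-byte puff count values stored in page 2. The OFW assumes
 * page 3 (puff times) holds exactly the same number of entries. */
static uint32_t Dataflash_CountPuffEntries(void) {
    uint32_t page2 = DFBA + 2 * 2048u;
    uint32_t n = 0;
    while (n < 512 && FMC_Read(page2 + 4 * n) != 0xFFFFFFFF)
        n++;
    return n; /* most recent count is at page2 + 4 * (n - 1), if n > 0 */
}
```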

About the dataflash structure, we are mostly interested in the first 12 bytes:

| Offset |   Size  |       Description       |
|--------|---------|-------------------------|
|   +0   | 4 bytes | Data flash CRC checksum |
|   +4   | 4 bytes |     Hardware version    |
|   +8   | 4 bytes |    Data flash version   |

(while CRC can be anything, hardware version can't be a bunch of FFs - that's why looking for 8 FF bytes works)
Once the firmware has booted, it reads the most recent data flash info and computes its CRC checksum. If it doesn't match the stored checksum, or if the data flash version doesn't match a firmware-dependent value, the data flash info is reset to defaults and written to a new block. The only information that is retained is the hardware version, since it can't be determined by the chip.

With CFWs, I want the programmer to be able to define data flash structures, their size and how many pages to use for each of them as wear leveling. This allows optimal use (i.e. big, not so frequently updated data in one page, another page just for very small but frequently updated data, etc.). The SDK will handle erasing, writing and wear leveling.

If we use the same data flash for CFWs, we want the OFW to reset it if it's flashed, since the CFW will most likely have a different data flash format. This means providing an intentionally invalid checksum, a correct hardware version and, optionally, an invalid data flash version at the beginning of every 256-byte chunk. This can work if the CFW data flash structure is at most 244 bytes (or 248, if neglecting the version field). If it's bigger, it becomes harder to implement and possibly inefficient. Also, it's easy for a size to evenly divide 2048 to get maximum density for wear leveling, while it's more difficult for it to divide 244. Another problem is that page 2 must grow at the same rate as page 3, or faster, otherwise the OFW won't be able to store puff time correctly since it'll try to write over non-erased data.

The other solution would be remapping the DFBA, leaving the Joyetech data flash intact and reserving our own space for our own data. The OFW hardcodes the 0x1E000 base address, and always remaps the data flash to that, so this would work fine. The problem is that it wastes 8kB of space. With the latest size optimizations the atomizer sample is just shy of 24kB. Many CFWs could fit into much less than 120kB. By leaving Joye's flash untouched we also guard against them changing the format (even though they're probably done with the eVic).

Another issue common to both strategies is what happens if a CFW is flashed and sets the data flash up a certain way, and then another one with a different data flash layout is flashed. If no versioning or way to distinguish them is adopted, this will lead to crashes. I propose two ways to version data flash:

  • Add a 4-byte header, consisting of an 11-bit structure size and a 21-bit magic/version unique number, before each and every structure block in the flash. The size field allows easy traversal even when blocks from different CFWs are present, allowing a CFW to take up space left over from other CFWs and really minimize erase cycles (see the packing sketch after this list). The drawback is that this works well for big structures, but has significant overhead for small data sizes (e.g., a 32-bit puff counter would double in size).
  • Add a 4-byte header, consisting of a 32-bit magic/version unique number, to the beginning of data or some other suitable location. If the magic value doesn't match the CFW one, the flash is erased and re-initialized by the CFW. This erases the flash each time a different CFW is flashed, but has much less overhead.
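
For the first option, the header packing could look something like this (the 11/21 split is from the proposal; which field goes in the low bits is my arbitrary choice):

```c
#include <stdint.h>

/* 4-byte block header: 11-bit structure size + 21-bit magic/version. */
static uint32_t Header_Pack(uint32_t size, uint32_t magic) {
    return (size & 0x7FFu) | ((magic & 0x1FFFFFu) << 11);
}

static uint32_t Header_Size(uint32_t hdr)  { return hdr & 0x7FFu; }
static uint32_t Header_Magic(uint32_t hdr) { return hdr >> 11; }
```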

What do others think? What would be the most elegant and efficient way to implement this?

ploop commented

As far as I understand with my bad English, you want to keep compatibility with the OFW? Does this make sense? We only need the hardware version from the OFW; settings will be different, and puff/time is not important information.
I think we can take the old address without touching the header, and apply custom wear-leveling logic.

No, I don't want to be field-by-field compatible - it would limit the programmer's freedom too much. I don't want to crash the OFW, though. I will probably just remap it; losing 8kB is not much. Using the OFW data flash is a nightmare when it comes to implementation (how are you going to handle big structures? Or achieve maximum packing? You'd have to break structs up between blocks... And then pages 2 & 3 are a headache).
What I still have to figure out is how I'm going to version the CFW dataflash so that flashing different CFWs works seamlessly.

Another issue I found is that LDROM doesn't hardcode the data flash base address. So if I remap the DFBA, LDROM is going to load the custom data. For the most part this is not an issue, but if the byte corresponding to the boot flag is 1, the device will get stuck in LDROM. So I'd have to shadow the boot flag, which means breaking up the user's custom structure, which is a pain.
Fortunately, it looks like I can enable APROM updating from APROM and write to the flash without changing the base address. Here's the approach I propose.

The custom data flash (CDF) starts at DFBA and grows towards lower memory addresses (towards the APROM code). The linker will export a symbol equal to the address of APROM end, for collision checking purposes. There will be a maximum size for the CDF. It should be big enough to allow for effective wear leveling, while being small enough that it won't collide with most APROMs. Something in the 32-64kB range should do.

The CDF begins with an 8-byte header:

| Size    | Description                    |
|---------|--------------------------------|
| 4 bytes | Number of pages currently used |
| 4 bytes | Head page number               |

The values are encoded as the number of 0 bits in the field, so incrementing them never requires erasing the page. A 32-bit field can count up to 32 pages, i.e. at most 64kB, which is consistent with the maximum size range. The number of pages used is how many pages of CDF have been used in the past. The head is the number of the page that is currently being used. After this small header, data structures are stored sequentially.
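
A sketch of this zero-bit counter (helper names are mine):

```c
#include <stdint.h>

/* Value of the counter: the number of programmed (0) bits in the word.
 * A fully erased word (0xFFFFFFFF) reads as 0. */
static uint32_t ZeroBits_Value(uint32_t word) {
    uint32_t zeros = 0;
    for (int i = 0; i < 32; i++)
        zeros += !((word >> i) & 1u);
    return zeros;
}

/* Word to write back to increment the counter: clear one more bit.
 * Only a 1 -> 0 flip, so no page erase is needed. */
static uint32_t ZeroBits_Increment(uint32_t word) {
    return word & (word - 1); /* clears the lowest set bit */
}
```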

Every data structure stored in the CDF is prefixed by an 8-byte header:

| Size    | Description                    |
|---------|--------------------------------|
| 4 bytes | Magic (24 bits), size (8 bits) |
| 4 bytes | Number of occurrences          |
| ...     | User data                      |

The magic must be unique for every kind of data structure and CFW. Size is expressed in words (8 bits = max 1kB). The 24/8 bit split can be revised (maybe giving more magic bits to avoid collisions, or more size bits to pack non-word-aligned structs). Those 4 bytes must not be 0xFFFFFFFF, so that uninitialized space can be found. The number of 0 bits in the occurrences field is equal to the number of times the same kind of structure is written contiguously. Since there will always be at least one, the number of 0 bits is actually the number of occurrences - 1. In the worst case (size = 4 bytes), not having this field gives a fixed 100% overhead. With the occurrences field, we get a worst-case overhead of 200% (one occurrence) and a best-case overhead of 6% (33 occurrences). We care about overhead for structs that are updated often, and if a struct is updated often it's likely that there won't often be other structs between consecutive updates, especially if we set a guideline that high-update-frequency data should be consolidated into a single struct.

The user provides a list of magic values used by the CFW before using the CDF routines.

When the user writes/updates a structure, it is simply written at the current position in the CDF (if the previous struct had the same magic and the number of occurrences is less than 33, the header is omitted and the occurrence count is increased). Once the end of the CDF is reached (its size is the minimum between the 32-64kB value and 120kB - APROM size), the head wraps around and the first page is erased. Once the first page runs out of space, the second one is erased, and so on.

The erase procedure is as follows. First, if the page is the first one, special care must be taken to restore the CDF header. Then, data in the page and in the previous/next one is scanned to check which data structures are in the page. The magics are checked against the list provided by the user. If a struct has a good magic, it is checked whether it's the latest version of the struct. If the page has just old structs, it can be simply erased. Otherwise, the most recent structs must be copied and restored after the erase.

On boot, the driver will first check the number of used pages and compare it with the APROM end from the linker. If the APROM goes beyond the last used page, then CDF integrity can't be ensured and a full reset (to zero used pages) must be performed (this is the reason why maximum size shouldn't be too big). This can be improved if the head page is before the APROM end: check if the latest version of all structs are between CDF start and head. If so, then integrity can be guaranteed and no reset is necessary.
After this, the CDF is searched for the most recent version of every struct. Thanks to the struct header, and to the magic+size not being 0xFFFFFFFF, it is easy to traverse the whole CDF and stop when appropriate.

This should be efficient and without too much overhead. It doesn't cause any problems with the OFW or LDROM. It is robust when different CFWs are flashed, and doesn't waste erase cycles in those cases. It also uses much more space for wear leveling, putting unused APROM space to good use.

I haven't mentioned checksums because they add significant size overhead. The SDK will provide utility functions to calculate and handle checksums, but deciding where to put them is going to be the user's choice (or we could make a "main" struct that is transparently checksummed). Checksums don't belong at this abstraction level.

Thoughts on this? Improvements?

ploop commented

Very difficult :)
Can you make the write happen during a power-off? To do this, we need to check the circuit and find a capacitor on the power line. Then wear leveling is greatly simplified.

We can take one page (or more, if needed) and make a structure with a terminating zero byte, for example 64 bytes; during boot we read these structures until we find 0xFF or run out of page. If a structure has the magic number, it is the last session - take it. And so on in a loop.
If we only write at power-off, writes will be very rare. And the page only needs an erase once every 2K / structure-size writes.

I suppose you mean battery removal/discharge when you talk about poweroff - the 5-click poweroff is just a sleep mode (writing when going to sleep would be trivial).
I'd have to check with a multimeter, but judging from my board photos there's not enough capacitance on the power line to keep the MCU powered while writing/erasing flash (erasing draws a fair amount of current, since a large voltage has to be generated). Since it's battery powered, they didn't filter the supply much; most of it is just small bypass caps for the clock, fast signals, etc.

The main point about my scheme is not wear leveling - that is mostly trivial, even if done often. I'll try to explain better.
Imagine you flash a CFW. It creates its own structures. Now you flash a different CFW, which uses different structures. How do you keep it from reading the data from the older firmware, thinking it's its own, and crashing?
The easiest way would be a "master" magic that identifies the specific CFW (and its version, if structs vary across versions). If it doesn't match, you erase the whole data flash and reinitialize it. It works, it's simple, but it wastes erase cycles. A smarter way would be to recognize the old structs and skip over them. This can be done by adding a size header to the data (and my occurrence count is just an optimization to reduce overhead).

What you suggest is basically the same as my method, apart from the fact that mine has more optimizations (and as such is more complex) and supports structs of any size, not only fixed (64 bytes in your example). If structure size is fixed it becomes easier, I agree, but then it's quite user-unfriendly (what if I need a big "main" struct, and small 4-byte structs for puff counts/time, and some other struct for custom temperature control curves?). The complex things about implementing this don't change whether you are writing every time a change is made or "bundle" it during poweroff.

Expanding more on size, imagine a 256-byte struct. You can fit 8 of them in a 2kB page. Let's pretend we give it only one page for wear leveling. This means we erase the page every 8 writes. Now, imagine I want to store a 4-byte puff counter (or some other value that is frequently updated). If I put it in the big struct, I have to erase every 8 puffs. If I give it an independent page, I have to erase only once every 256 puffs. This is why multiple, variable-size structs are good :)

Now the question I have is: is it worth it to try saving cycles when switching between different CFWs? If not, things may be simpler.

If we don't care about erasing the flash between different CFWs, I propose this.

Let's number the pages starting from 0. At the beginning of page 0 there is a small header with a magic value that identifies the specific CFW and version. At boot, if the magic doesn't match, the flash is erased and re-initialized.
The CFW declares its structures to the SDK. Let's say it has N structures. Then the first struct gets pages 0, N, 2N, ..., the second struct gets pages 1, N+1, 2N+1, ..., and so on. The last session is easy to find and we don't have to worry about unpredictable older sizes.
An improvement could be having one magic per structure, so that if, for example, 1 out of 3 structs change in a CFW update only that one needs to be erased.
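
The page assignment in this striping scheme is a one-liner (a sketch; names are mine):

```c
#include <stdint.h>

/* With N declared structures, struct s (0-based) gets pages
 * s, s + N, s + 2N, ... of the custom data flash. With P total pages,
 * each struct gets P / N pages, so round runs from 0 to P / N - 1. */
static uint32_t Stripe_Page(uint32_t s, uint32_t round, uint32_t N) {
    return round * N + s;
}
```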

This is actually pretty easy, I'm more and more convinced that saving cycles between different CFWs is not that useful.

ploop commented

> I suppose you mean battery removal/discharge

Yes, of course

> Now the question I have is: is it worth it to try saving cycles when switching between different CFWs? If not, things may be simpler.

It looks good, if you can implement it.

mpaw commented

@ReservedField Just found this project. Had no idea so much progress was made in reverse engineering the eVic VTC Mini. Very nice work.

Regarding this thread... I think it's a better idea (and a great reduction in complexity) to just erase the old firmware's data if you detect a mismatch in the structures between the firmwares. The erase cycles that you save are not worth the effort or logic of implementing a more robust solution. Sometimes simpler is better.

Custom dataflash is in commit 812c4e6. I'll keep this issue open until we're sure it's stable.

cbRoy commented

I've been thinking about this since I found this repo! I had no idea it would be this involved. I kept reading from the dataflash and displaying it, hoping to see something I recognized, but had no idea really where I was headed. I'm new to embedded programming and it's been a while since I've tinkered with C, so I was almost completely lost.

I do agree with your last statement that we shouldn't care about persisting data between CFWs. The only thing I think that would help with is updating the exact same firmware, in which case the magics plus a version would help.

Unless you think people really want to keep the puff/time counter, I wouldn't bother with persistence.

The only thing I was trying to persist between flashes and battery pulls was the wattage!

Either way I'm excited to play with your working dataflash implementation!

Keep up the good work!

> I had no idea it would be this involved.

Well, wear leveling is as simple as a circular buffer. Doing that for multiple, general-purpose, upgradable structs is where it gets tricky :)

Like I did before, I'm writing up the dataflash format so my code is a bit more understandable.

CDF (custom dataflash) is made, at the moment, of 8 pages totalling 16kB. The ROM is split between 120kB for APROM and 8kB for OFW dataflash. We don't touch OFW dataflash and their DFBA, so the CDF is located at the end of the 120kB (i.e. end of CDF = start of OFW dataflash).

Each page starts with a 4-byte magic word, laid out like this:

| Bits  | Field    |
|-------|----------|
| 31    | MRP      |
| 30    | MRU      |
| 29-24 | Reserved |
| 23-0  | Magic    |
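
In C, the magic word could be handled like this (constant and helper names are mine):

```c
#include <stdint.h>

#define CDF_MRP        (1u << 31)  /* Most Recent Page */
#define CDF_MRU        (1u << 30)  /* Most Recently Used (last erased) */
#define CDF_MAGIC_MASK 0x00FFFFFFu /* 24-bit structure type magic */

/* A freshly erased page reads 0xFFFFFFFF: both flag bits start out set,
 * which is exactly the initial state described below. They can later be
 * cleared (1 -> 0) without an erase. */
static uint32_t PageWord_Magic(uint32_t w) { return w & CDF_MAGIC_MASK; }
static int      PageWord_IsMRP(uint32_t w) { return (w & CDF_MRP) != 0; }
static int      PageWord_IsMRU(uint32_t w) { return (w & CDF_MRU) != 0; }
```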

Each page stores updates for a single structure type. The structure type is identified by the 24-bit magic, ranges of which I assign to devs. There are 6 reserved bits because I had the space and they may turn out useful. The MRU bit stands for Most Recently Used. When set, it indicates that the page was the last one to be erased in previous runs. The MRP bit is for Most Recent Page. When set, it indicates that the page is the one holding the most recent updates for that struct type. Note that both bits are initially set when a page is erased (which is exactly what we want) and can be unset whenever we like (since 1 -> 0 is always possible, while 0 -> 1 requires erasing the page).

The rest of the page is divided into "blocks". Each block starts with a 4-byte counter word. Its numerical value is equal to the number of trailing zero bits + 1. This means it can count 1-33, going from 1 when erased to 33 when completely programmed to zero. This kind of unary counter can be updated without erasing (again because we can do 1 -> 0). This counter represents the number of updates in the block. Once a block runs out of the 33 updates, a new one can be created after it (if there's space).
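
A sketch of the block counter (my helper names; a real implementation might use a count-trailing-zeros instruction instead of a loop):

```c
#include <stdint.h>

/* Counter value: trailing zero bits + 1. Reads 1 when the word is fully
 * erased (0xFFFFFFFF) and 33 when fully programmed (0x00000000). */
static uint32_t Block_Count(uint32_t word) {
    uint32_t n = 0;
    while (n < 32 && ((word >> n) & 1u) == 0)
        n++;
    return n + 1;
}

/* Word to write back to count one more update: clearing the lowest set
 * bit extends the run of trailing zeros by exactly one, and it's a pure
 * 1 -> 0 change, so no erase is required. */
static uint32_t Block_Bump(uint32_t word) {
    return word & (word - 1);
}
```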

During initialization the CDF is scanned. If a page is already completely erased, it is internally marked as clean. If a page has the MRP bit not set or the magic is not known to the CFW it is marked as stale, i.e. it can be erased if needed. If a page has MRP set and a valid magic, it's marked as good. This is actually broken up into two stages (between Dataflash_Init and Dataflash_SelectStructSet) to allow CFWs to easily upgrade old structure versions. Also, the page with the MRU bit set is noted down. In a well-formed CDF there is only one page with MRU set (if there are more, we just use the last one, any guess is as good as any other).

Reading a struct is just a matter of looking up the page for that magic with the MRP bit set. Then we follow the blocks until we get to the last one, and read the last struct off it. Note that this requires knowledge of the struct size, which is why the size is requested from the CFW. Sizes are internally aligned to 4 bytes to simplify flash operations.

Writing a struct goes like this. First, we look up the MRP page for that struct (if it exists), then we follow the blocks in there. If the last block has counter < 33 and there's space in the page for another update, we just write the update and increase the counter. If the last block has counter == 33 but there's space for another block (4 bytes counter) + struct update, we create a new block with counter = 1 (i.e. just erased space) and write the update. If there's no space or if there's no page for that struct a new page needs to be created. If a clean page is available, it is selected. Otherwise, we start from the MRU page, find the first stale page after it and erase that. Now that we have an erased page we create the magic word (with MRP and MRU set), the first block and we write the data. The MRP is cleared from the old page (if there was one) and the MRU is cleared from the previous MRU page. There's an exception to this: pages created from already clean pages do not have the MRU set and do not unset the old MRU.
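
A condensed sketch of that write path; every Cdf_* helper here is hypothetical, named only to make the control flow readable (also note the counter limit shown is the original 33; the bugfix further down caps blocks at 32 updates):

```c
#include <stdint.h>

/* Hypothetical helpers wrapping the page/block bookkeeping. */
extern int      Cdf_FindMrpPage(uint32_t magic);      /* -1 if none */
extern uint32_t Cdf_LastBlockCount(int page);         /* last block's counter */
extern int      Cdf_PageHasRoom(int page, uint32_t bytes);
extern int      Cdf_TakeCleanPage(void);              /* -1 if none */
extern int      Cdf_EraseStalePageAfterMru(void);
extern void     Cdf_InitPage(int page, uint32_t magic, int setMru);
extern void     Cdf_Append(int page, const void *data, uint32_t size, int newBlock);
extern void     Cdf_ClearMrp(int page);
extern void     Cdf_ClearOldMru(void);

void Cdf_UpdateStruct(uint32_t magic, const void *data, uint32_t size) {
    int page = Cdf_FindMrpPage(magic);
    if (page >= 0) {
        if (Cdf_LastBlockCount(page) < 33 && Cdf_PageHasRoom(page, size)) {
            Cdf_Append(page, data, size, 0);  /* same block, bump counter */
            return;
        }
        if (Cdf_PageHasRoom(page, 4 + size)) {
            Cdf_Append(page, data, size, 1);  /* new block in same page */
            return;
        }
    }
    /* No room (or no page yet): prefer a clean page, otherwise erase the
     * first stale page after the MRU one. */
    int fresh = Cdf_TakeCleanPage();
    int wasClean = fresh >= 0;
    if (!wasClean)
        fresh = Cdf_EraseStalePageAfterMru();
    /* Pages created from already clean pages don't take over the MRU mark. */
    Cdf_InitPage(fresh, magic, /* setMru = */ !wasClean);
    Cdf_Append(fresh, data, size, 1);
    if (page >= 0)
        Cdf_ClearMrp(page);  /* old page is no longer most recent */
    if (!wasClean)
        Cdf_ClearOldMru();
}
```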

This format is very good because:

  • Strikes a good balance between complexity, overhead and performance. The implementation isn't that complicated;
  • Pages are self-contained and there's no global header. This means all pages are treated equally (unlike with a global header, where one page has it and the others don't), which simplifies implementation, and no data has to be moved around on erase (unlike my first proposal, which was a nightmare);
  • Structures take up pages proportionally to their needs. This maximizes efficiency, compared to the sub-optimal results of allocating a fixed number of pages per struct (be it automatic or programmer-defined, unless the programmer can predict exactly the update frequency for each struct);
  • It doesn't rely on 0xFFs in structs to determine the update count (a common strategy, also used by Joyetech: it avoids the need for counters, but it's a nightmare when designing for general-purpose use);
  • Each struct has its own magic, so upgrading one of them doesn't mean throwing all of the others away;
  • If not too many updates are done by a CFW, it persists data from older CFWs (I didn't really work towards this, but it's a positive side-effect of this design).

I wrote an FMC emulation layer that allowed me to compile CDF code as a native executable for my machine and actually debug it with decent tools. I did some benchmarking and (as expected) the erase count for each page is perfectly leveled.
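
The emulation layer itself can be tiny. A minimal sketch of the idea (my own reconstruction, not the actual code): back the flash with a RAM buffer and enforce the erase-to-0xFF / AND-on-program semantics, plus an erase counter per page for the wear benchmarks.

```c
#include <stdint.h>
#include <string.h>

#define EMU_PAGE_SIZE 0x800u
#define EMU_NUM_PAGES 8u

static uint8_t  emuFlash[EMU_NUM_PAGES * EMU_PAGE_SIZE];
static uint32_t emuEraseCount[EMU_NUM_PAGES]; /* wear statistics */

void EmuFMC_Init(void) {
    memset(emuFlash, 0xFF, sizeof(emuFlash));
}

void EmuFMC_Erase(uint32_t pageAddr) {
    uint32_t page = pageAddr / EMU_PAGE_SIZE;
    memset(&emuFlash[page * EMU_PAGE_SIZE], 0xFF, EMU_PAGE_SIZE);
    emuEraseCount[page]++;
}

uint32_t EmuFMC_Read(uint32_t addr) {
    uint32_t word;
    memcpy(&word, &emuFlash[addr], 4);
    return word;
}

void EmuFMC_Write(uint32_t addr, uint32_t data) {
    /* Real flash can only clear bits: AND the new word into the old. */
    uint32_t old;
    memcpy(&old, &emuFlash[addr], 4);
    old &= data;
    memcpy(&emuFlash[addr], &old, 4);
}
```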

I found a (pretty obvious, in hindsight) bug. Since the counter starts at 1 and a maximum of 33 structs are stored per block, when the counter is 33 it's impossible to know whether that block is the most recent one or not (the all-1s word of a following erased block is a valid counter with value 1). The read code always advanced to the next block on 33, while the update code stored up to 33 structs. This caused garbage to be read occasionally (every 33 updates, more or less).

I fixed it by storing at most 32 updates per block, and using 33 as a marker of an exhausted block with another block following, so:

  • count = 32: final block with 32 updates;
  • count = 33: block with 32 updates, followed by a more recent block.
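
In code terms, the read loop's stopping rule becomes unambiguous (a sketch, reusing the Block_Count helper from earlier):

```c
#include <stdint.h>

extern uint32_t FMC_Read(uint32_t addr);    /* 4-byte flash read */
extern uint32_t Block_Count(uint32_t word); /* trailing zeros + 1 */

/* Returns the address of the most recent block's counter word: a count
 * of 32 or less means this block is the final one; exactly 33 means the
 * block holds 32 updates and a newer block follows right after it. */
static uint32_t Cdf_FindLastBlock(uint32_t blockAddr, uint32_t structSize) {
    while (Block_Count(FMC_Read(blockAddr)) == 33)
        blockAddr += 4 + 32 * structSize; /* counter word + 32 updates */
    return blockAddr;
}
```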

This is fixed as of 29f9966 (with the bulk of the fix being in ba56f5b, I forgot to add the reduced update count to the first commit). Erase your dataflash.

CDF seems stable, only one bug was found and fixed. I'm closing this.