bztsrc/raspi3-tutorial

Using sd_readblock() for reading directories

manncr opened this issue · 9 comments

So I'm trying to read the contents of a directory and am clearly using sd_readblock wrong. I have the cluster that the information is at, just not sure what to put into sd_readblock() to get it to return the correct information. I know this is outside the scope of this whole guide/tutorial, but I thought I'd ask here anyway. I apologize if it's the wrong place to ask.

Thanks!

Don't you worry.

The sd_readblock() function takes a Logical Block Address (LBA) as its parameter. On a FAT file system you get that by converting the cluster number. For example, this calculation is done here.

The steps are a little bit different for FAT16 (which has a fixed root directory) and FAT32 (which has a variable cluster for the root directory). But for both, I calculate data_sec, which is going to be the LBA of the first data sector. That corresponds to cluster 2. Now, to get the LBA for an arbitrary cluster, you need two variables: the start of the first data sector, and the number of sectors per cluster.

So, we have data_sec, which is the partition's starting LBA plus the size of the file system's metadata (the number of FAT tables times the FAT table size, plus the number of reserved sectors). Add (cluster - 2) * sectors per cluster to that, and you get the LBA of the cluster's first sector. Note that SD cards always use 512 bytes per sector (other media may not; some SSDs, for example, use 4096 bytes, which of course influences the cluster size).
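To make the arithmetic concrete, here is a minimal sketch of the conversion. The struct and function names are hypothetical and are not the tutorial's; the root_dir_sectors term is the fixed FAT16 root directory, which also sits in front of the data area and is zero on FAT32.

```c
#include <stdint.h>

/* Hypothetical, simplified view of the values needed for the conversion.
   The field names are illustrative, not the tutorial's bpb struct. */
typedef struct {
    uint32_t partition_lba;       /* first sector of the partition */
    uint32_t reserved_sectors;    /* reserved sectors before the first FAT */
    uint32_t num_fats;            /* number of FAT copies */
    uint32_t sectors_per_fat;     /* size of one FAT, in sectors */
    uint32_t root_dir_sectors;    /* fixed root directory on FAT16, 0 on FAT32 */
    uint32_t sectors_per_cluster;
} fat_layout_t;

/* LBA of the first data sector, which is where cluster 2 starts. */
static uint32_t fat_data_start(const fat_layout_t *l)
{
    return l->partition_lba + l->reserved_sectors
         + l->num_fats * l->sectors_per_fat + l->root_dir_sectors;
}

/* LBA of the first sector of an arbitrary data cluster (cluster >= 2). */
static uint32_t fat_cluster_to_lba(const fat_layout_t *l, uint32_t cluster)
{
    return fat_data_start(l) + (cluster - 2) * l->sectors_per_cluster;
}
```

The value returned by fat_cluster_to_lba() is what would be passed to sd_readblock(); reading a whole cluster means reading sectors_per_cluster consecutive sectors from that LBA.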

More info can be found in the official documentation. I have a copy of that here. You can find the description of this calculation on page 29.

Hope this helps,
bzt

That explains it a lot better, thank you! I see you did all of that in fat_readfile(), so I'm trying to use that to read the directory for my next attempts.

My next issue then would be that the cluster I'm trying to read must be greater than 0xFFF8 since it doesn't even enter that loop at the bottom of the function. I tried just adding another F to increase the range it would be looking, but that seems like it could take a significant amount of time to finish. Do you have a suggestion to get around this?

Thanks,
manncr

> That explains it a lot better, thank you!

You're welcome!

> I see you did all of that in fat_readfile()

Yes, because the directory tutorial lists the root directory. For FAT16 the root directory is not stored in the FAT table, and it does not have a cluster number either. For FAT32 it could be fragmented, but that's very unlikely. All the formatting tools I have ever seen allocate the entire root directory at once, meaning it's always going to be contiguous, so there's no need to consult the FAT table.

> My next issue then would be that the cluster I'm trying to read must be greater than 0xFFF8

My example code prints out the cluster number; do that in your code too, as it will show whether you've read the directory entry correctly. You can't use 0xFFF8 and above: those are reserved entries according to the spec (see page 19, section 4.2 Reserved FAT entries).

> I tried just adding another F to increase the range it would be looking, but that seems like it could take a significant amount of time to finish.

Yep, it probably read out of bounds, FAT can't store that many entries.

If you really want to use such large cluster numbers, then you must use FAT32, where the reserved entries are 0x0FFFFFF8 and above. (Note: FAT32 is actually FAT28, because you can't use the upper 4 bits for cluster numbers.)

Cheers,
bzt

So, to properly support both FAT16 and FAT32, do

- while(cluster>1 && cluster<0xFFF8) {
+ while(cluster>1 && cluster<(bpb->spf16?0xFFF8:0x0FFFFFF8)) {
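For illustration, the same end-of-chain test can be combined with the FAT32 masking mentioned above. This is only a sketch: it assumes the FAT has already been loaded into memory, and the function names and the is_fat16 flag are hypothetical (the tutorial decides by checking bpb->spf16).

```c
#include <stdint.h>

#define FAT16_EOC_MIN      0xFFF8u      /* first reserved FAT16 value */
#define FAT32_EOC_MIN      0x0FFFFFF8u  /* first reserved FAT32 value */
#define FAT32_CLUSTER_MASK 0x0FFFFFFFu  /* only the low 28 bits are used */

/* Same condition as the patched while loop, as a helper. */
static int fat_cluster_is_data(uint32_t cluster, int is_fat16)
{
    return cluster > 1 && cluster < (is_fat16 ? FAT16_EOC_MIN : FAT32_EOC_MIN);
}

/* Read the FAT entry for a cluster from an in-memory copy of the FAT. */
static uint32_t fat_next_cluster(const void *fat, uint32_t cluster, int is_fat16)
{
    if (is_fat16)
        return ((const uint16_t *)fat)[cluster];
    return ((const uint32_t *)fat)[cluster] & FAT32_CLUSTER_MASK;
}

/* Count the clusters in a chain. The max_clusters guard stops the walk
   if a corrupted FAT contains a loop. */
static uint32_t fat_chain_length(const void *fat, uint32_t start,
                                 int is_fat16, uint32_t max_clusters)
{
    uint32_t n = 0;
    while (fat_cluster_is_data(start, is_fat16) && n < max_clusters) {
        n++;
        start = fat_next_cluster(fat, start, is_fat16);
    }
    return n;
}
```

A guard like max_clusters is worth having while debugging, because a FAT read from the wrong LBA can easily produce a chain that never reaches a reserved value.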

Cheers,
bzt

Yeah I'm using FAT32 so that will help a lot, thanks! So something like this should get me the first entry in the sub-directory yes?

```
char name[11];
fat_shortname(filename, name);

// Check entire directory
while(dir->name[0] != 0) {
	// If name match found
	if(!memcmp(dir->name, name, 11)) {
		// Return the cluster
		printf("==Dir found!: %s \n", dir->name);
		dir = (fat_readfile(((unsigned int)dir->ch)<<16|dir->cl));
		return dir;
	}
	dir++;
}
return 0;
```

I ask because I don't seem to be getting the right information when I try to print out the name of the data entries in the subdirectory I'm trying to access. I know I'm providing the cluster from the root directory data entry for the sub-directory, already tested for that. But obviously I'm doing something wrong.

EDIT: For more specific information, the sub-directory is at cluster 9, but I usually end up getting stuck in that loop at the bottom of fat_readfile()

> So something like this should get me the first entry in the sub-directory yes?

Nope, your code would return an arbitrary entry (whichever has a matching name), not the first one. A few notes: sub-directories always start with two special entries, '. ' (current directory) and '.. ' (parent directory), so in the best case you should get the 3rd entry. These two are missing from the root directory, by the way.

You must exclude other special entries too. If the name starts with 0xE5, then that's a deleted file, and you must not return it. If the attribute is 0x0F, then that's a special LFN entry, which won't work (it has UCS-2 name bytes at the same position where normal entries store the cluster number).
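A minimal sketch of a lookup that applies these exclusions could look like this. The 32-byte entry layout follows the FAT specification, but the struct, its field names and dir_find() are hypothetical, and __attribute__((packed)) assumes GCC or Clang.

```c
#include <stdint.h>
#include <string.h>

/* 32-byte short directory entry. Field names are illustrative only. */
typedef struct {
    uint8_t  name[11];      /* 8+3 name, space padded */
    uint8_t  attr;          /* attribute bits, 0x0F marks an LFN entry */
    uint8_t  reserved[8];   /* NT flags and creation/access timestamps */
    uint16_t cluster_hi;    /* upper 16 bits of the first cluster */
    uint8_t  time_date[4];  /* last write time and date */
    uint16_t cluster_lo;    /* lower 16 bits of the first cluster */
    uint32_t size;          /* file size in bytes */
} __attribute__((packed)) dirent_t;

/* Return the first usable entry whose 8+3 name matches, or NULL.
   On success the entry's first cluster is stored in *cluster. */
static const dirent_t *dir_find(const dirent_t *dir, size_t count,
                                const char name83[11], uint32_t *cluster)
{
    for (size_t i = 0; i < count && dir[i].name[0] != 0x00; i++) {
        if (dir[i].name[0] == 0xE5)   /* deleted entry */
            continue;
        if (dir[i].attr == 0x0F)      /* LFN entry, no cluster number here */
            continue;
        if (memcmp(dir[i].name, name83, 11) == 0) {
            *cluster = ((uint32_t)dir[i].cluster_hi << 16) | dir[i].cluster_lo;
            return &dir[i];
        }
    }
    return NULL;
}
```

The count parameter bounds the scan to the sectors actually loaded, so a directory buffer without a terminating 0x00 entry cannot run the loop off the end of memory.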

Also, you must convert the name to 8+3 format to make memcmp work. That is, uppercase and padded with spaces to exactly 11 bytes. For example, readme.txt becomes "README  TXT" (README, two spaces, TXT), and 12345678.c becomes "12345678C  " (the eight digits, C, two spaces).
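As a rough sketch of that conversion, assuming plain ASCII names with at most one dot: to_short_name() is a hypothetical helper, and on bare metal toupper() from <ctype.h> may have to be replaced with a manual range check.

```c
#include <ctype.h>
#include <string.h>

/* Convert "readme.txt" style names to the 11-byte 8+3 form.
   The result is NOT NUL terminated. Names longer than 8+3 are truncated,
   and special names such as "." and ".." are not handled. */
static void to_short_name(const char *filename, char out[11])
{
    int i = 0;

    memset(out, ' ', 11);

    /* Base name: everything before the first dot, at most 8 characters. */
    while (*filename && *filename != '.') {
        if (i < 8)
            out[i++] = (char)toupper((unsigned char)*filename);
        filename++;
    }

    /* Extension: at most 3 characters after the dot. */
    if (*filename == '.') {
        filename++;
        for (i = 8; i < 11 && *filename; i++, filename++)
            out[i] = (char)toupper((unsigned char)*filename);
    }
}
```

Because the output has no terminating NUL, it should be compared with memcmp(..., 11) and never printed with a plain %s.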

However, it is possible that the FAT you've created does not have an 8+3 record, just an LFN record. The specification demands a normal 8+3 record after the (possibly multiple) LFN entries, but creating these was eventually patented by MS, so many FOSS formatting tools (as well as the Linux kernel) decided not to save both to avoid patent issues. They either save just the 8+3 name, or just an LFN with a gibberish 8+3 name.

FYI, if you want to handle LFN too, check out this code. It is not as nicely written as the tutorials, but hopefully you can learn from it. To handle both LFN records and normal directory entries, it does not use a struct, just a plain simple unsigned char buffer as input. It returns the cluster number in clu, and file size in size. It requires parent cluster so that it can follow the FAT, just a convenient feature, not really needed by directory lookups. This function checks for LFN names (line 88+) as well as for normal 8+3 entries (line 79-87) and converts both to UTF-8 transparently.

> But obviously I'm doing something wrong.

You should print out the memory contents before you call your directory entry lookup code to see if you have indeed loaded the correct data from the storage. If you have some garbage for whatever reason, then your code will return an invalid cluster number for sure. If the memory is correct (contains the valid entries for your sub-directory), then your code should work.

Also, it's worth printing out the LBA used to load the contents of that sub-directory, and checking whether your fs image really contains valid data at byte offset LBA * 512. (I would use dd with skip to extract the data, and then hexdump -C to see what's in it.) If this isn't right, then the problem could be in your cluster to LBA calculation too. If this is okay, then you must have valid directory contents in memory (the previous check).

Hope this helps,
bzt

> Nope, your code would return an arbitrary entry (which has a matching name)

If I'm handling the short filenames correctly, wouldn't the first one with a matching name be the correct entry?

> FYI, if you want to handle LFN too, check out this code

I thought about handling LFN but wanted to get the rest of this working first, so I'll be using the short filenames in the meantime. Although that code will definitely help out when I get to that point, thanks!

> You should print out the memory content

image

Is this what you were thinking of, like to check the information in the directory entry, or where exactly should I be looking in memory?

> Also, it's worth printing out the LBA used to load the contents for that sub-directory,

It seems to be that bpb->spc is at zero, so that could be why I'm infinitely stuck in the loop. Why might this be the case?
image

Thanks again,
manncr

> If I'm handling the short filenames correctly, wouldn't the first one with a matching name be the correct entry?

My bad, you meant the first one that matches. Yes, correct.

> Is this what you were thinking of

Sort of. I usually prefer simple dumps generated directly from bare metal memory, because less complexity means fewer places for the output to go wrong. Something like the output of the x command in my debugger tutorial:

```
> x
0007FFF0: 13 60 09 00  00 00 00 00  24 10 20 3F  00 00 00 00  .`......$. ?....
```

(I know, it's not as fancy as yours, but it shows all bytes in memory at offset 0x7FFF0 properly.)

You can find the minimalist code that prints this to the serial line here; it dumps from address os (offset start) to oe (offset end). I've used printf there, but it could easily work with the uart_hex() function too (or just use the printf implementation I've provided).
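For readers without the linked code at hand, a dump in that format can be sketched roughly as follows. This is not the tutorial's implementation: dump_line() and dump() are hypothetical names, and the sketch uses the hosted C library's snprintf() and puts(), which on bare metal would be replaced by the tutorial's printf or the uart functions.

```c
#include <stdio.h>
#include <stdint.h>

#define DUMP_LINE_MAX 80  /* 78 characters per line plus the terminating NUL */

/* Format 16 bytes at p as one line: address, hex bytes in groups of four,
   then the printable ASCII characters ('.' for everything else). */
static int dump_line(char out[DUMP_LINE_MAX], uint32_t addr,
                     const unsigned char *p)
{
    int len = snprintf(out, DUMP_LINE_MAX, "%08X: ", (unsigned)addr);

    for (int i = 0; i < 16; i++)
        len += snprintf(out + len, DUMP_LINE_MAX - len, "%02X %s",
                        (unsigned)p[i], (i % 4 == 3) ? " " : "");
    for (int i = 0; i < 16; i++)
        len += snprintf(out + len, DUMP_LINE_MAX - len, "%c",
                        (p[i] >= 32 && p[i] < 127) ? p[i] : '.');
    return len;
}

/* Dump offsets os (inclusive) to oe (exclusive) relative to base,
   16 bytes per line. The range is assumed to be a multiple of 16. */
static void dump(const unsigned char *base, uint32_t os, uint32_t oe)
{
    char line[DUMP_LINE_MAX];

    for (uint32_t a = os; a < oe; a += 16) {
        dump_line(line, a, base + a);
        puts(line);
    }
}
```

Calling dump() on the buffer that holds the bpb, and again on the buffer handed to the directory parser, shows the raw bytes without any interpretation by a debugger front end.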

I don't know what kind of software you've used for these dumps, but I bet it wasn't running on the bare metal, which means there's always a chance of error. And as we can see here, there is.

For example, it is suspicious that it has way too many "0x4157xx" values all over the place. I suspect that's a bug in this dumper. Most notably, both jmp (3 chars) and oem (8 chars) are represented by a single dword value, both being 0x4157xx, which simply can't be right: you can't see the actual bytes in those fields (one should be exactly 3 bytes, the other 8). The same goes for the fst and fst2 fields (which should contain a magic that identifies the bpb as belonging to a FAT file system).

> like to check the information in the directory entry, or where exactly should I be looking in memory?

Yes, that's what I meant, dump the memory address that you pass to the directory parser. But dumping bpb is a very good start too! Especially because it's not looking right.

> It seems to be that bpb->spc is at zero, so that could be why I'm infinitely stuck in the loop.

Yes, and not just that, there are several other issues. For example, you've said it is FAT32, but spf16 isn't zero, meaning my code will incorrectly parse it as FAT16. The number of FAT copies should be 1 or 2; 132 is definitely wrong and definitely results in an incorrect cluster to LBA calculation.
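These checks can be automated right after the bpb is loaded. The following is only a sketch: the struct is a hypothetical summary rather than the on-disk layout, the names bps and spf32 are assumptions (spc and spf16 are the names used in this thread), and the 512-byte sector check assumes an SD card.

```c
#include <stdint.h>

/* Hypothetical summary of the BPB values worth checking. */
typedef struct {
    uint16_t bps;    /* bytes per sector */
    uint8_t  spc;    /* sectors per cluster */
    uint8_t  nf;     /* number of FAT copies */
    uint16_t spf16;  /* sectors per FAT for FAT16, 0 on FAT32 */
    uint32_t spf32;  /* sectors per FAT for FAT32 */
} bpb_summary_t;

/* Return 1 if the values are plausible, 0 if the bpb is probably garbage. */
static int bpb_looks_sane(const bpb_summary_t *b, int expect_fat32)
{
    if (b->bps != 512)                              /* SD cards: 512 bytes */
        return 0;
    if (b->spc == 0 || (b->spc & (b->spc - 1)) != 0) /* nonzero power of two */
        return 0;
    if (b->nf != 1 && b->nf != 2)                   /* 1 or 2 FAT copies */
        return 0;
    if (expect_fat32)
        return b->spf16 == 0 && b->spf32 != 0;
    return b->spf16 != 0;
}
```

Running such a check both immediately after reading the boot sector and again just before the directory lookup helps tell a bad read apart from memory that was overwritten later.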

Now the question is: is this an issue with the dumper? Have you loaded the bpb correctly from the storage? Have you accidentally overwritten the bpb in memory (that could also explain your issues)? If you have the correct values in memory, then have you calculated the LBA correctly (which doesn't seem likely with these bpb values)? If so, then have you loaded the cluster with the directory entries correctly (and to the correct memory address)? That last one was my first question, but I think it would be better for you to check everything step by step from the start, because your bpb in memory doesn't look okay.

Cheers,
bzt

So my setup uses an FT232H JTAG board and openocd to host a gdb server, and Eclipse then connects to that server. That's where I'm getting my info dumps from. I know it's a complicated setup, it's just the one that was given to me. So in theory, it's all running off the RPi, since I have a serial cable getting the output off of it via uart.

> For example, it is suspicious that it has way too many "0x4157xx" values all around the place

I believe these refer to the address the info is being kept at. If I open the drop down for them, you can see that they are the correct sizes and such:
image

...which as you can see exposes more issues. Such as that it's likely that I'm somehow overwriting bpb like you suggested. I'm not sure how since I haven't touched it in any of my own code. That's for me to look into though, gives me a good place to start. (I also didn't know I could dump variables like this in Eclipse, oops)

Thanks again for your help!