Embedded as in "32-bit ARM running Linux", or "8-bit microcontroller"? Either way, as someone who has written a FAT driver for embedded systems in the latter category, that looks like far more code than it needs to be. Full read/write functionality can be done in approximately 800 (Z80) machine instructions. FAT is a linked list, and if you look at it that way, it doesn't take much code to manipulate one.
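For example, reading a file is just chasing next-cluster pointers through the FAT. A rough FAT16 sketch in C, with read_fat_entry as a made-up helper (bad-cluster and bounds checks omitted):

    #include <stdint.h>

    #define FAT16_EOC 0xFFF8u  /* entries >= this mark end-of-chain */

    uint16_t read_fat_entry(uint16_t cluster);  /* assumed helper */

    /* Visit every cluster of a file, starting from its first cluster. */
    void walk_chain(uint16_t first_cluster, void (*visit)(uint16_t))
    {
        uint16_t c = first_cluster;
        while (c < FAT16_EOC) {
            visit(c);               /* e.g. read the cluster's data */
            c = read_fat_entry(c);  /* follow the "next" pointer */
        }
    }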
Append = find an empty cluster and link it into the chain for the file, then write into it. Create is similar, except you start with an empty chain and also add an entry to the directory (which is like writing/appending to a file, because directories are files.)
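Roughly, in C (FAT16 for brevity; read_fat_entry, write_fat_entry and total_clusters are made-up helpers, not any particular driver's API):

    #include <stdint.h>

    #define FAT16_FREE 0x0000u
    #define FAT16_EOC  0xFFFFu

    uint16_t read_fat_entry(uint16_t cluster);
    void     write_fat_entry(uint16_t cluster, uint16_t value);
    uint16_t total_clusters(void);

    /* Append one cluster to a chain; pass 0 as last_cluster for "create".
       Returns the new cluster, or 0 if the volume is full. */
    uint16_t append_cluster(uint16_t last_cluster)
    {
        for (uint16_t c = 2; c < total_clusters() + 2; c++) {
            if (read_fat_entry(c) == FAT16_FREE) {
                write_fat_entry(c, FAT16_EOC);        /* new end of chain */
                if (last_cluster != 0)
                    write_fat_entry(last_cluster, c); /* link it in */
                return c;  /* caller can now write data into cluster c */
            }
        }
        return 0;  /* no free cluster */
    }

For a create, the returned cluster goes into the new directory entry instead of into a FAT link.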
When allocating the next cluster you can reduce fragmentation significantly over the dumb "first fit" strategy if you scan the FAT for contiguous runs of free clusters and apply a next/best/worst fit. Even better if you add an API to allow tuning the amount of "gap" left after a file when creating it, based on knowledge of how much it may expand in the future.
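A rough best-fit scan might look like this (same made-up helpers; "gap" is the number of extra clusters to leave free after the file):

    #include <stdint.h>

    #define FAT16_FREE 0x0000u

    uint16_t read_fat_entry(uint16_t cluster);
    uint16_t total_clusters(void);

    /* Find the start of the smallest free run of at least want+gap clusters.
       Returns 0 if no run is large enough. */
    uint16_t best_fit_run(uint16_t want, uint16_t gap)
    {
        uint16_t need = want + gap;
        uint16_t best_start = 0, best_len = 0xFFFFu;
        uint16_t c = 2, end = total_clusters() + 2;

        while (c < end) {
            if (read_fat_entry(c) != FAT16_FREE) { c++; continue; }
            uint16_t start = c, len = 0;
            while (c < end && read_fat_entry(c) == FAT16_FREE) { len++; c++; }
            if (len >= need && len < best_len) {
                best_start = start;
                best_len = len;
            }
        }
        return best_start;
    }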
I believe that with good tuning and allocation heuristics, FAT32 can outperform the far more complex filesystems (ext*, NTFS, etc.) widely believed to be superior. One idea I've never gotten around to testing out is to modify the Linux FAT driver and do some benchmarking.
> I believe that with good tuning and allocation heuristics, FAT32 can outperform the far more complex filesystems (ext*, NTFS, etc.) widely believed to be superior. One idea I've never gotten around to testing out is to modify the Linux FAT driver and do some benchmarking.
I don't understand. Has anybody ever claimed that the more complex alternatives were actually faster?
NTFS and other filesystems that followed FAT32 are "superior" because they support things like journaling and more robust permissions... things that unavoidably incur (at least) a small performance hit.
It is targeted at ARM uCs and also works on AVRs with enough RAM and program space (yes, I've tested it). Write support requires a significant amount of RAM; I think about six 512-byte blocks. When compiled in read-only mode, it should need no more than two.
One reason for the RAM needs is that I use a block cache, and I take care to perform certain related operations "atomically" (at the block cache level). These are things like marking a cluster as allocated, decrementing the number of free clusters in FsInfo, and linking the allocated cluster to the end of a chain. These operations require multiple blocks to all be present in the block cache at the same time.
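The pattern is roughly this (cache_get/cache_mark_dirty/cache_put are illustrative names, not my actual API): pin every block involved first, and modify nothing unless all of them are resident.

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct { uint32_t lba; uint8_t data[512]; } CacheBlock;

    /* Assumed block-cache primitives (illustrative only). */
    CacheBlock *cache_get(uint32_t lba);         /* pin a block, NULL if cache full */
    void        cache_mark_dirty(CacheBlock *b); /* queue for write-back */
    void        cache_put(CacheBlock *b);        /* unpin */

    bool link_new_cluster(uint32_t fat_sector_new, uint32_t fat_sector_tail,
                          uint32_t fsinfo_sector)
    {
        CacheBlock *b_new  = cache_get(fat_sector_new);
        CacheBlock *b_tail = cache_get(fat_sector_tail);
        CacheBlock *b_fsi  = cache_get(fsinfo_sector);
        if (!b_new || !b_tail || !b_fsi) {
            /* Not enough cache blocks: back out, on-disk state untouched. */
            if (b_new)  cache_put(b_new);
            if (b_tail) cache_put(b_tail);
            if (b_fsi)  cache_put(b_fsi);
            return false;
        }

        /* ... edit b_new->data  (mark new cluster as end-of-chain),
               b_tail->data (point the old tail at the new cluster),
               b_fsi->data  (decrement the free-cluster count) ... */

        cache_mark_dirty(b_new);
        cache_mark_dirty(b_tail);
        cache_mark_dirty(b_fsi);
        cache_put(b_new);
        cache_put(b_tail);
        cache_put(b_fsi);
        return true;
    }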
I suppose the asynchronous design also contributes to the code and RAM size (though I did not actually measure the code size or compare it to other drivers). It may look like more than it really compiles to. Anyway, I would be very surprised if you could show me another asynchronous driver - I couldn't find any!
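By asynchronous I mean a callback-based interface along these lines (illustrative signature, not my actual one):

    #include <stddef.h>

    typedef struct FatFile FatFile;
    typedef void (*ReadDoneCb)(FatFile *f, int status, size_t bytes_read);

    /* Starts a read and returns immediately; the driver invokes cb from its
       event loop once the underlying SD/MMC transfer completes, so nothing
       blocks while the card is busy. */
    int fat_file_read_async(FatFile *f, void *buf, size_t len, ReadDoneCb cb);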
Also consider that my code supports VFAT long file names (decoding only; file creation is not supported anyway).
The design is intentional: I have sacrificed some RAM/program size for asynchronous operation and better reliability in the case of random write failures (because of the block cache, all writes can reliably be retried later, avoiding corruption as long as they eventually complete).
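Roughly, a failed flush just leaves the block dirty so it gets retried on the next pass (illustrative sketch, not the actual code):

    #include <stdbool.h>
    #include <stdint.h>

    typedef struct { uint32_t lba; uint8_t data[512]; bool dirty; } CacheBlock;

    bool block_device_write(uint32_t lba, const uint8_t *data);  /* assumed */

    /* Try to flush one dirty block; on failure it simply stays dirty and is
       attempted again later, so a lost write never leaves the on-disk
       structures half-updated as long as the flush eventually succeeds. */
    void flush_block(CacheBlock *b)
    {
        if (b->dirty && block_device_write(b->lba, b->data))
            b->dirty = false;
    }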
Update: I estimate the code compiles to about 10kB with gcc for Cortex-M3 using -Os. This is certainly within the limits of most ARM uCs and many AVRs.
I've written a bit more about that before:
https://news.ycombinator.com/item?id=7492318