We haven't run any benchmarks, but any speed difference from loading a couple hundred kB once is likely very small. Any noticeable difference would come from the code targeting small size rather than the fastest algorithms, such as grep vs GNU grep.
There are some speed gains from the consolidation, and from some applets being able to run without a fork/exec (like shell builtins, in principle).
You'd also have to consider that, due to TC's design, it's already completely in RAM.
There's no additional copy involved, since tmpfs is itself backed by the page cache.
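
To put a rough number on the fork/exec point, here is a minimal sketch that compares spawning a trivial process against calling a function in-process. It is not a benchmark of the actual applets: `noop`, `time_spawn` and `time_inprocess` are made-up names for illustration, and it assumes a `true` binary on the PATH (falling back to the Python interpreter, which greatly overstates the spawn cost).

```python
import shutil
import subprocess
import sys
import time


def time_spawn(cmd, runs=20):
    """Average wall-clock seconds to spawn `cmd` and wait for it to exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs


def time_inprocess(func, runs=20):
    """Average wall-clock seconds to call `func` inside this process."""
    start = time.perf_counter()
    for _ in range(runs):
        func()
    return (time.perf_counter() - start) / runs


def noop():
    """Hypothetical stand-in for an applet that runs without a new process."""
    return 0


# Assumption: a `true` binary is on PATH; otherwise use the Python interpreter.
true_path = shutil.which("true")
cmd = [true_path] if true_path else [sys.executable, "-c", "pass"]

spawn_avg = time_spawn(cmd)
builtin_avg = time_inprocess(noop)
print(f"fork/exec:  {spawn_avg * 1e6:.1f} us per call")
print(f"in-process: {builtin_avg * 1e6:.1f} us per call")
```

The absolute numbers depend heavily on the machine, so treat the output only as an indication of the gap between the two paths, which matters mostly for scripts that call small tools in a tight loop.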