@roberto: I think the big delay in the "tce-load" script comes from an internal function it calls:

    app_exists() {
        [ -f "$2/$1" ] && [ -f "$2/$1".md5.txt ] && (cd "$2" && md5sum -cs "$1".md5.txt)
    }

So the md5 check on EACH tcz could be the main delay.
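One way to test this guess is to time only the md5 part of the work. The sketch below is a rough harness, not part of tce-load. `md5_ok` and `check_all` are hypothetical names. It assumes each foo.tcz sits next to a foo.tcz.md5.txt (the layout app_exists() expects), and it uses plain `md5sum -c` with the output discarded so it does not depend on the `-s` flag being available.

```shell
#!/bin/sh
# Rough harness: time only the md5 verification that app_exists() performs.
# Assumption: every foo.tcz has a matching foo.tcz.md5.txt beside it.

# md5_ok NAME DIR: same test as app_exists(), but using "md5sum -c" with
# its output discarded instead of relying on the -s flag.
md5_ok() {
    [ -f "$2/$1" ] && [ -f "$2/$1".md5.txt ] &&
        (cd "$2" && md5sum -c "$1".md5.txt >/dev/null 2>&1)
}

# check_all DIR: verify every *.tcz in DIR and print "checked=N failed=M".
check_all() {
    checked=0
    failed=0
    for f in "$1"/*.tcz; do
        [ -e "$f" ] || continue      # no .tcz files: the glob stayed literal
        checked=$((checked + 1))
        md5_ok "$(basename "$f")" "$1" || failed=$((failed + 1))
    done
    echo "checked=$checked failed=$failed"
}

# Usage (the directory is an example; point it at your own tce/optional):
#   time check_all /etc/sysconfig/tcedir/optional
```

If `time check_all` over the optional directory accounts for most of a slow `tce-load` run, the md5 theory holds; if not, the mounts themselves are the next suspect.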
What else could it logically be? The main process builds a full list of tczs (by recursively following the *.dep files) and then mounts them all one after another.
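The list-building step can be sketched as a depth-first walk over the .dep files. This is an illustration under assumptions, not the actual tce-load code: `resolve` and `SEEN` are hypothetical names, and each foo.tcz.dep is assumed to hold one dependency name (such as bar.tcz) per line. Dependencies are printed before the extensions that need them, and each name appears only once.

```shell
#!/bin/sh
# Sketch: turn the recursive *.dep tree into a flat load list.
# Assumption: DIR/foo.tcz.dep lists one dependency name (e.g. bar.tcz) per
# line; blank lines disappear during word splitting.

SEEN=" "    # space-separated names already visited

# resolve NAME DIR: print NAME's dependencies (recursively) and then NAME,
# skipping anything already in SEEN, so each name is printed once.
resolve() {
    case "$SEEN" in *" $1 "*) return 0 ;; esac
    SEEN="$SEEN$1 "
    if [ -f "$2/$1.dep" ]; then
        # the word list is expanded once up front, so recursive calls that
        # reuse the global $dep do not disturb this loop
        for dep in $(cat "$2/$1.dep"); do
            resolve "$dep" "$2"
        done
    fi
    echo "$1"
}

# Usage: resolve foo.tcz /etc/sysconfig/tcedir/optional
```

The SEEN list is what keeps a shared dependency from being mounted twice; it also stops the walk from looping forever on a circular dependency.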
Could they be mounted in parallel instead (in the background?), using a "half of half of half" split: divide the list in two, then divide each half again, and so on (only when there are more than 4 tczs)?
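A minimal sketch of the "half of half" idea, assuming extension names contain no spaces: the first half of the list goes to a background job, the second half is processed in the current shell, and both keep splitting until a piece has 4 or fewer entries, which are then loaded one by one. `load_split` is a hypothetical name and `load_one` is a placeholder that only prints; in a real loader it would do the md5 check and the mount.

```shell
#!/bin/sh
# Sketch of the "half of half of half" split (hypothetical helper names).
# Assumption: extension names contain no whitespace.

# Placeholder for the real per-extension work (md5 check + mount).
load_one() {
    echo "loaded $1"
}

# load_split NAME...: load 4 or fewer names in order; otherwise send the
# first half to a background job, recurse on the second half here, and wait.
load_split() {
    if [ "$#" -le 4 ]; then
        for t in "$@"; do
            load_one "$t"
        done
        return 0
    fi
    half=$(($# / 2))
    first=""
    i=0
    while [ "$i" -lt "$half" ]; do   # move the first $half args into $first
        first="$first $1"
        shift
        i=$((i + 1))
    done
    load_split $first &              # unquoted on purpose: split on spaces
    load_split "$@"                  # remaining (second) half, foreground
    wait                             # block until the background halves end
}

# Usage: load_split a.tcz b.tcz c.tcz d.tcz e.tcz f.tcz
```

For more than 4 extensions, every final chunk holds 2 to 4 entries and runs in its own process, and all of them are started before any `wait` blocks. Whether that is faster in practice depends on how much the parallel md5 checks and mounts compete for the same disk.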
The load order of the tczs should not matter, as long as the (possible) /usr/local/tce.installed/$app/xx.sh scripts are run at the end, once all the asynchronously loaded tczs are in place.
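The final pass could look like the sketch below. It assumes each extension's setup script, if any, is the executable file `<dir>/<name without .tcz>` (adjust the path for the `$app/xx.sh` form), and `run_install_scripts` is a hypothetical name. It is meant to run only after `wait` has returned for all background mounts, and it runs the scripts serially in dependency order, so the mounts can be asynchronous while the setup steps stay deterministic.

```shell
#!/bin/sh
# Sketch of the serial final pass after asynchronous mounts.
# Assumption: an extension's setup script, if any, is the executable file
# DIR/<name without .tcz>; adjust the path for other layouts.

# run_install_scripts DIR: read *.tcz names (dependency order) from stdin
# and run each existing setup script, one at a time, in that order.
run_install_scripts() {
    while read -r tcz; do
        [ -n "$tcz" ] || continue                  # skip blank lines
        app=${tcz%.tcz}
        if [ -f "$1/$app" ] && [ -x "$1/$app" ]; then
            # </dev/null keeps the script from eating the rest of the list
            "$1/$app" </dev/null ||
                echo "warning: setup script for $app failed" >&2
        fi
    done
}

# Usage, once all background mounts are done ($ORDERED is a hypothetical
# space-separated, dependency-ordered list of extension names):
#   wait
#   printf '%s\n' $ORDERED | run_install_scripts /usr/local/tce.installed
```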