Hi centralware
So, I decided to start from scratch and rethink my approach, and it paid off. These are the results:
tc@E310:~/count/armv7TC04TC11$ sudo cache-clear
tc@E310:~/count/armv7TC04TC11$ sync
tc@E310:~/count/armv7TC04TC11$ ./BuildProvidesAllDatabase.sh
Stopwatch reset. Fetching provides.db files.
Elapsed Time=00:00:15 Fetching provides.db files. Done.
Stopwatch reset. Convert records from multi line to single line.
Elapsed Time=00:00:01 Convert records from multi line to single line. Done.
Elapsed Time=00:00:00 Combine provides files and strip out kermel module records.
Elapsed Time=00:00:02 Combine provides files and strip out kermel module records. Done.
Elapsed Time=00:00:00 Do a case insensitive sort (-f).
Elapsed Time=00:00:05 Case insensitive sort (-f). Done.
Elapsed Time=00:00:00 Create combined info.lst from combined provides.db.
Elapsed Time=00:00:00 Create combined info.lst from combined provides.db. Done.
Elapsed Time=00:00:00 Remove duplicate names keeping the most recent TCversion, saving in infoAll.lst.
Elapsed Time=00:00:01 Remove duplicate names keeping the most recent TCversion, saving in infoAll.lst. Done.
Elapsed Time=00:00:00 Find providesSort.db entries matching infoAll.lst, saving results.
Elapsed Time=00:00:01 Find providesSort.db entries matching infoAll.lst, saving results. Done.
Elapsed Time=00:00:00 Convert back to multi line, save in providesAll.db file.
Elapsed Time=00:00:00 Convert back to multi line, save in providesAll.db file. Done.
Elapsed Time=00:00:10 Total time spent manipulating data.
Results were saved in the armv7TC04TC11 directory.
tc@E310:~/count/armv7TC04TC11$
Time to download the provides.db files is listed separately. It varied from 15 to 28 seconds. Nothing I can do about that.
After that, the steps are:
1. Append :TCversion to each Name.tcz entry and change the field separators in the provides.db files from a carriage
return to a semicolon. This way each record in the file occupies only one line. This will simplify sorting later on.
2. Combine the provides.db files while stripping out the kernel module entries from the older TC versions.
3. Do a case insensitive sort on the first field of the combined provides.db file.
4. Create an info.lst that matches the combined provides.db file by copying the first field from each record.
5. Reduce the info.lst by keeping the most recent version of each extension name.
6. Reduce the combined provides.db file using the reduced info.lst file as a template.
7. Change the field separators in the reduced provides.db file from a semicolon back to a carriage return.
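Steps 1 and 7 are the two ends of the same trick, so here is a rough sketch of how they could be done with awk. This is not the code from BuildProvidesAllDatabase.sh. The function names and the version argument are made up for illustration, and it assumes each provides.db record is a Name.tcz line followed by one file path per line, with a blank line between records.

```shell
# Assumed provides.db layout: a Name.tcz line, then one file path per line,
# with a blank line between records.

# Step 1: tag each name with its TC version and join the record with ';'.
# $1 is a version tag such as TC11 (hypothetical argument).
to_single_line() {
    awk -v ver="$1" 'BEGIN { RS = ""; FS = "\n"; OFS = ";" }
        { $1 = $1 ":" ver; print }'
}

# Step 7: split each record back into lines, with a blank line after each one.
to_multi_line() {
    awk 'BEGIN { FS = ";"; OFS = "\n"; ORS = "\n\n" } { $1 = $1; print }'
}

printf 'a.tcz\nusr/local/bin/a\n\nb.tcz\nusr/local/bin/b\n' | to_single_line TC11
# prints:
# a.tcz:TC11;usr/local/bin/a
# b.tcz:TC11;usr/local/bin/b
```

Setting RS to an empty string puts awk in paragraph mode, so each blank-line-separated record is read as one unit. Once every record sits on one line, sort and the other line-based tools can treat it as a single item.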
The process started with 6901 entries and reduced that to 2141 entries.
Total time for steps 1 through 7 is 10 seconds. All commands in the script are aliased to busybox. At 5 seconds, step 3
(the sort) takes the most time. Using GNU sort, its execution time drops from 5 seconds to 0.5 seconds.
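For anyone curious how the reduction in steps 5 and 6 might look, here is a minimal sketch. It is one possible way to do it, not necessarily what the script does. The function names are invented, and it assumes the version tags are zero padded (TC04, TC11) so that a plain string comparison orders them correctly.

```shell
# Step 5: from lines of the form Name.tcz:TCnn, keep the highest version
# per name. Relies on zero-padded tags (TC04 < TC11 as plain strings).
keep_latest() {
    awk -F: '!($1 in best) || $2 > best[$1] { best[$1] = $2 }
        END { for (n in best) print n ":" best[n] }' | LC_ALL=C sort -f
}

# Step 6: print only the records of file $2 whose first ';' separated
# field is listed in file $1.
filter_by_list() {
    awk -F';' 'NR == FNR { keep[$0] = 1; next } $1 in keep' "$1" "$2"
}

printf 'a.tcz:TC04\nB.tcz:TC09\na.tcz:TC11\n' | keep_latest
# prints:
# a.tcz:TC11
# B.tcz:TC09
```

Usage would be something like keep_latest < info.lst > infoAll.lst followed by filter_by_list infoAll.lst providesSort.db, using the file names from the log above.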
Results are saved in a work directory named from $ARCH, $MinVersion, and $MaxVersion, for example armv7TC04TC11.
I decided to run this on a bigger data set, so I went with x86 TC4 through TC11.
Time to download the provides.db files was 47 seconds.
The process started with 20113 entries and reduced that to 5594 entries.
Total time for steps 1 through 7 was 41 seconds. Step 3 (the sort) took 23 seconds. Using GNU sort would reduce that to 2 seconds.
The BuildProvidesAllDatabase.sh script saves its results in 2 files, infoAll.lst and providesAll.db.
The ProvidesAll.sh script can be used to search providesAll.db, which it looks for in the current directory. Example:
tc@E310:~/count/armv7TC04TC11$ ../ProvidesAll.sh "bin\/find"
findutils.tcz:TC11
usr/local/bin/find
libftdi.tcz:TC09
usr/local/bin/find_all
util-linux.tcz:TC11
usr/local/sbin/findfs
usr/local/bin/findmnt
tc@E310:~/count/armv7TC04TC11$
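The source of ProvidesAll.sh is not shown here, so the following is only a hypothetical stand-in that produces the same kind of output as the example above. It assumes providesAll.db uses blank-line-separated records with the Name.tcz:TCversion line first. The function name and the optional second argument are made up.

```shell
# Hypothetical stand-in for ProvidesAll.sh; the real script is not shown.
# $1 = regular expression, $2 = database file (defaults to ./providesAll.db).
search_provides() {
    awk -v pat="$1" 'BEGIN { RS = ""; FS = "\n" }
        {
            # Collect the file paths in this record that match the pattern.
            hits = ""
            for (i = 2; i <= NF; i++)
                if ($i ~ pat) hits = hits $i "\n"
            # Print the extension name only when at least one path matched.
            if (hits != "") printf "%s\n%s", $1, hits
        }' "${2:-providesAll.db}"
}
```

With this sketch the call would be search_provides "bin/find". The pattern is passed to awk as a string, so the slash does not need a backslash in front of it.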