How to find which extension provides a file
gadget42:
this forum is awesome... and at the same time humbling (much to learn, grasshopper... sigh).
Rich:
Hi Leee
--- Quote from: Leee on November 14, 2024, 01:58:48 PM --- ... I tried coding the exact search, using other tools, to search for lines ending in /${TARGET} and it worked perfectly... but took four minutes to run. ...
--- End quote ---
I once tried that too, but nothing came close to touching the speed of awk.
It's because awk is searching the entire record for a match, not individual
fields. A record looks like this:
--- Code: ---Field1
Field2
Field3

--- End code ---
Field1 is the .tcz name. Field2 and Field3 are filenames. The blank line at
the end marks the end of the record. In awk land, {print $1} means print
the contents of Field1. $2 would be Field2, and so on.
Not shown is $0 which represents the entire record and looks like this:
--- Code: ---Field1\nField2\nField3
--- End code ---
The record is treated as one long string with embedded \n field
separators, which is what gets searched. Field3 has no trailing \n
because it's the end of the record. If I change {print $1} to {printf $0},
the whole record gets printed and printf doesn't add any \n characters:
--- Code: ---tc@E310:~/Scripting/Provides$ ./prov.sh "bin\/egrep\n"
grep.tcz
/usr/local/bin/egrep
/usr/local/bin/fgrep
/usr/local/bin/greptc@E310:~/Scripting/Provides$
--- End code ---
You can see it print Field1\nField2\nField3\n but Field4 is immediately
followed by the command prompt, which explains why search terms
in the last field were not found in reply #5.
It's because awk is scanning an entire record in one go that it's so fast.
If you tell awk to search records one field at a time, it too will run slower.
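For anyone who wants to experiment, a minimal sketch of that kind of whole-record search might look like the following. It is not the actual prov.sh; the provides.db name and layout (blank-line separated records, extension name on the first line, one file path per line) are assumed from the examples above.
--- Code: ---#!/bin/sh
# Sketch only: read the database in paragraph mode (RS="") so each
# blank-line separated record becomes one $0, search that whole record,
# and print its first field (the extension name) on a match.
TARGET="$1"
awk -v p="$TARGET" 'BEGIN { RS=""; FS="\n" }
$0 ~ p { print $1 }' provides.db
--- End code ---
Called as, say, ./prov.sh 'bin/egrep', it would print grep.tcz for a record laid out like the one shown earlier.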
patrikg:
Maybe "some one"=me right now :) have little time to make a script that unsquashfs -l tcefile.tce list the tce files and make a db of that. It goes very fast with sqlite3 to search.
Here you go, make.sh makes a db and fill the file names and tcz package names in.
And search.sh make you search in the db, you can edit the file to change you needs.
Happy hacking:
--- Code: ---make.sh
cat << 'EOF' > make.sh
#!/bin/bash
# Create the table once, then add one row per file found in each extension.
sqlite3 files.db "CREATE TABLE tcz_files (name TEXT NOT NULL, file TEXT NOT NULL);"
for tczfile in *.tcz
do
    for file in $(unsquashfs -lc "$tczfile")
    do
        sqlite3 files.db "INSERT INTO tcz_files ( name , file ) VALUES ( '$tczfile' , '$(basename "$file")' );"
    done
done
EOF
chmod u+x make.sh
search.sh
cat << 'EOF' > search.sh
#!/bin/sh
# Case-insensitive substring search on the stored file names.
sqlite3 -box files.db "SELECT * FROM tcz_files WHERE file LIKE '%$1%';"
EOF
chmod u+x search.sh
ls
files.db gimp2.tcz make.sh search.sh sqlite3-bin.tcz sqlite3.tcz
./search.sh pdf
┌───────────┬───────────────┐
│ name │ file │
├───────────┼───────────────┤
│ gimp2.tcz │ file-pdf-load │
│ gimp2.tcz │ file-pdf-save │
└───────────┴───────────────┘
./search.sh sql
┌─────────────────┬─────────────────────┐
│ name │ file │
├─────────────────┼─────────────────────┤
│ sqlite3-bin.tcz │ sqlite3 │
│ sqlite3.tcz │ libsqlite3.so │
│ sqlite3.tcz │ libsqlite3.so.0 │
│ sqlite3.tcz │ libsqlite3.so.0.8.6 │
└─────────────────┴─────────────────────┘
sqlite3 -box files.db "SELECT COUNT(*) AS 'Number of filenames' FROM tcz_files;"
┌─────────────────────┐
│ Number of filenames │
├─────────────────────┤
│ 4191 │
└─────────────────────┘
ls -alh files.db
-rw-r--r-- 1 patrik users 164K 14 nov 22.34 files.db
--- End code ---
yvs:
I think the number of files is not that big; even a grep/sed search through a plain text file takes negligible time.
For example, using a sed search in the shell's command_not_found_handler() I got:
--- Code: ---% as
as found in: binutils
% time as
as found in: binutils
0.00s user 0.01s system 97% cpu 0.007 total
% time cpp
cpp found in: gcc
0.00s user 0.00s system 97% cpu 0.008 total
% time users
users found in: coreutils
0.01s user 0.00s system 95% cpu 0.008 total
--- End code ---
i.e. it's about 10 ms (a time comparable to a context switch).
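For comparison, a rough sketch of such a handler might look like the one below (zsh spelling; bash names it command_not_found_handle). It uses the awk paragraph-mode search from earlier in the thread rather than yvs's actual sed, and the provides.db path and record layout are assumptions.
--- Code: ---# Rough sketch, not yvs's handler. Assumes blank-line separated records in
# provides.db: extension name on the first line, one file path per line after.
command_not_found_handler() {
    found=$(awk -v cmd="$1" '
        BEGIN { RS=""; FS="\n" }
        $0 ~ ("/" cmd "\n") || $NF ~ ("/" cmd "$") {
            sub(/\.tcz$/, "", $1)   # report "binutils", not "binutils.tcz"
            print $1
        }' /etc/sysconfig/tcedir/provides.db)
    if [ -n "$found" ]; then
        echo "$1 found in: $found"
    else
        echo "command not found: $1"
    fi
    return 127
}
--- End code ---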
nick65go:
so, maybe it's premature, but is there a conclusion yet?
Is the winner sqlite (compiled), awk (compiled), or plain sh shell scripts?
Or, if speed is not the main goal (though it's nice on a multi-core CPU), then maybe user comfort with the search parameters matters more (GUI options would be nice too). But of course, the professionals already know how to use "" and back-quotes, etc.