grep.tcz vs busybox grep speed
jazzbiker:
--- Quote from: Rich on October 21, 2023, 02:25:27 PM ---Since this appears to be a fairly simple search, I decided to run
some benchmarks.
--- End quote ---
Hi Rich, for a more detailed picture, may I ask you to repeat the tests you've already run with a pattern closer to Leee's? In the same environment, using wildcards and anchors, something like:
--- Code: ---tc@E310:~/onboot$ export X="usr\/local\/bin\/zvbi-atsc-c."
tc@E310:~/onboot$ export X="^usr\/local\/bin\/zvbi-atsc-c."
tc@E310:~/onboot$ export X="^usr\/local\/bin\/zvbi-atsc-c.$"
--- End code ---
Please
Rich:
Hi jazzbiker
I take it he's searching for a literal period at the end of the line.
I added a period to that entry in the provides file and repeated
the search:
--- Code: ---tc@E310:~/onboot$ export X="usr\/local\/bin\/zvbi-atsc-cc"
tc@E310:~/onboot$ time busybox grep "^$X.$" ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real 0m 0.49s
user 0m 0.38s
sys 0m 0.03s
tc@E310:~/onboot$ time busybox awk 'BEGIN {RS="\n"} /'"^$X\.$"'/{ print $0 }' ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real 0m 0.62s
user 0m 0.49s
sys 0m 0.09s
tc@E310:~/onboot$ time grep "^$X.$" ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real 0m 0.21s
user 0m 0.03s
sys 0m 0.03s
tc@E310:~/onboot$ time awk 'BEGIN {RS="\n"} /'"^$X\.$"'/{ print $0 }' ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real 0m 0.52s
user 0m 0.35s
sys 0m 0.08s
tc@E310:~/onboot$
--- End code ---
jazzbiker:
Hi Rich,
Thanks for this test. As we can see, everything is the same. Sorry, I was not attentive enough and missed that you had already added the anchors when using the pattern.
--- Quote from: Rich on October 21, 2023, 03:25:06 PM ---I take it he's searching for a literal period at the end of the line.
--- End quote ---
The period was probably meant as "any single character", but the benchmark results are the same either way. Thank you very much!
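For anyone following along, here is a minimal sketch of the difference between an unescaped and an escaped period in a grep pattern. The two sample lines are made up for illustration and are not taken from the provides file; the variable names are only placeholders.

```shell
# Hypothetical sample data: one entry ends in a literal period, one does not.
sample='usr/local/bin/zvbi-atsc-cc.
usr/local/bin/zvbi-atsc-ccX'

# Unescaped "." matches any single character, so both lines are counted.
any=$(printf '%s\n' "$sample" | grep -c '^usr/local/bin/zvbi-atsc-cc.$')

# Escaped "\." matches only a literal period, so one line is counted.
literal=$(printf '%s\n' "$sample" | grep -c '^usr/local/bin/zvbi-atsc-cc\.$')

echo "any=$any literal=$literal"
```

With this sample the first count should be 2 and the second 1, which is why both readings of the pattern find the same line in the benchmark.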
Rich:
Hi jazzbiker
I see. I'm used to using a question mark to match a single character:
--- Code: ---tc@E310:~$ ls -l J?.jpg
-rw-r--r-- 1 tc staff 34836 Feb 2 2022 J1.jpg
-rw-r--r-- 1 tc staff 52988 Feb 2 2022 J2.jpg
-rw-r--r-- 1 tc staff 66236 Feb 2 2022 J3.jpg
tc@E310:~$
--- End code ---
But grep seems to need a period:
--- Code: ---tc@E310:~$ ls -l | grep "J.\.jpg"
-rw-r--r-- 1 tc staff 34836 Feb 2 2022 J1.jpg
-rw-r--r-- 1 tc staff 52988 Feb 2 2022 J2.jpg
-rw-r--r-- 1 tc staff 66236 Feb 2 2022 J3.jpg
tc@E310:~$
--- End code ---
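As a rough side-by-side sketch (the file name is just the one from the listing above, and the variable names are placeholders), the same name can be tested against a shell pattern with "case" and against a regex with grep:

```shell
name='J1.jpg'

# Shell pattern: "?" matches exactly one character and "." is literal.
case "$name" in
    J?.jpg) glob_match=yes ;;
    *)      glob_match=no ;;
esac

# Regex: "." matches one character, so the literal dot has to be escaped.
if printf '%s\n' "$name" | grep -q '^J.\.jpg$'; then
    regex_match=yes
else
    regex_match=no
fi

echo "glob=$glob_match regex=$regex_match"
```

Both tests should report a match; only the spelling of "one character" and "literal dot" differs between the two pattern languages.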
jazzbiker:
Hi Rich,
The "?" and "*" you mean are not regex but globbing symbols. As far as I understand, command-line strings containing such symbols are expanded by the shell according to which files exist (omg). If you are lucky, that is :-)
Here is a program that prints its arguments, last to first:
--- Code: ---tc@box:/tmp/glob$ cat ggg.c
#include <stdio.h>

/* Print the command-line arguments, last to first, one per line. */
int main(int argc, char *argv[]) {
    while (--argc > 0)
        printf("%s\n", argv[argc]);
    return 0;
}
--- End code ---
Let's compile it:
--- Code: ---$ tcc -o ggg ggg.c
--- End code ---
and add some spam:
--- Code: ---$ touch ggh ggk
--- End code ---
Then:
--- Code: ---tc@box:/tmp/glob$ ./g*
./ggk
./ggh
./ggg.c
--- End code ---
And:
--- Code: ---tc@box:/tmp/glob$ ./ggg ./g*
./ggk
./ggh
./ggg.c
./ggg
--- End code ---
But:
--- Code: ---tc@box:/tmp/glob$ ./ggg "./g*"
./g*
--- End code ---
:-)
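The "according to which files exist" part can also be sketched in plain shell, without the C helper. This assumes a POSIX-style shell with default settings (no nullglob or similar option); the directory and variable names are placeholders.

```shell
# Work in a throwaway directory so the result does not depend on existing files.
start=$PWD
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
touch ggh ggk

# Unquoted and matching: the shell replaces the pattern with the sorted file names.
set -- g*
matched="$*"

# Unquoted but matching nothing: the pattern is passed through unchanged.
set -- z*
unmatched="$*"

# Quoted: never expanded, whatever files exist.
set -- "g*"
quoted="$*"

echo "matched=$matched unmatched=$unmatched quoted=$quoted"

# Clean up.
cd "$start" && rm -rf "$dir"
```

So the same unquoted string can reach a program either as a list of file names or as the raw pattern, depending only on what happens to be in the directory.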
As far as I know, regexes are used by ed, grep, sed, find, vi (and maybe some others?). They come in several flavors: basic, extended, Perl-style, Lua-style, and maybe some others.
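A small sketch of how the flavor changes the meaning of a pattern, assuming a grep where "+" is an ordinary character in basic regexes and a repetition operator with -E. The sample lines and variable names are made up.

```shell
sample='ab
aab
a+b'

# Basic regex (grep default): "+" is an ordinary character.
basic=$(printf '%s\n' "$sample" | grep 'a+b')

# Extended regex (grep -E): "+" means "one or more of the preceding item".
# xargs joins the matching lines onto one line for display.
extended=$(printf '%s\n' "$sample" | grep -E 'a+b' | xargs)

echo "basic: $basic"
echo "extended: $extended"
```

The same pattern text selects "a+b" in the basic flavor and "ab aab" in the extended one.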
Have a nice regex!