grep.tcz vs busybox grep speed

jazzbiker:

--- Quote from: Rich on October 21, 2023, 02:25:27 PM ---Since this appears to be a fairly simple search, I decided to run
some benchmarks.

--- End quote ---

Hi Rich, just for a more detailed picture, may I ask you to repeat the tests you've already run with a pattern closer to Leee's? In the same environment, using wildcards and anchors, something like:


--- Code: ---tc@E310:~/onboot$ export X="usr\/local\/bin\/zvbi-atsc-c."
tc@E310:~/onboot$ export X="^usr\/local\/bin\/zvbi-atsc-c."
tc@E310:~/onboot$ export X="^usr\/local\/bin\/zvbi-atsc-c.$"

--- End code ---

Please

Rich:
Hi jazzbiker
I take it he's searching for a literal period at the end of the line.
I added a period to that entry in the provides file and repeated
the search:

--- Code: ---tc@E310:~/onboot$ export X="usr\/local\/bin\/zvbi-atsc-cc"
tc@E310:~/onboot$ time busybox grep "^$X.$" ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real    0m 0.49s
user    0m 0.38s
sys     0m 0.03s
tc@E310:~/onboot$ time busybox awk 'BEGIN {RS="\n"} /'"^$X\.$"'/{ print $0 }' ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real    0m 0.62s
user    0m 0.49s
sys     0m 0.09s
tc@E310:~/onboot$ time grep "^$X.$" ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real    0m 0.21s
user    0m 0.03s
sys     0m 0.03s
tc@E310:~/onboot$ time awk 'BEGIN {RS="\n"} /'"^$X\.$"'/{ print $0 }' ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real    0m 0.52s
user    0m 0.35s
sys     0m 0.08s
tc@E310:~/onboot$
--- End code ---

jazzbiker:
Hi Rich,

Thanks for this test. As we can see, everything is the same. Sorry, I was not attentive enough and missed that you had already added anchors when using the pattern.
 
--- Quote from: Rich on October 21, 2023, 03:25:06 PM ---I take it he's searching for a literal period at the end of the line.

--- End quote ---
The period probably meant "any character", but the benchmark results are the same anyway. Thank you very much!
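
For anyone following along, here is a small sketch of the two readings of the period. The two sample lines are made up for illustration and are not taken from the provides file; the variable names are likewise just for this example.

```shell
# Two made-up sample lines: one ends in a literal period, one in "X".
sample='usr/local/bin/zvbi-atsc-cc.
usr/local/bin/zvbi-atsc-ccX'

# Unescaped "." matches any single character, so both lines match.
any_count=$(printf '%s\n' "$sample" | grep -c '^usr/local/bin/zvbi-atsc-cc.$')

# Escaped "\." matches only a literal period, so one line matches.
literal_count=$(printf '%s\n' "$sample" | grep -c '^usr/local/bin/zvbi-atsc-cc\.$')

echo "any character: $any_count, literal period: $literal_count"
```

With these two lines the unescaped pattern counts 2 matches and the escaped one counts 1, so the two forms only give the same result when no other line differs solely in that last character.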

Rich:
Hi jazzbiker
I see. I'm used to using a question mark to match a single character:

--- Code: ---tc@E310:~$ ls -l J?.jpg
-rw-r--r-- 1 tc staff 34836 Feb  2  2022 J1.jpg
-rw-r--r-- 1 tc staff 52988 Feb  2  2022 J2.jpg
-rw-r--r-- 1 tc staff 66236 Feb  2  2022 J3.jpg
tc@E310:~$
--- End code ---

But grep seems to need a period:

--- Code: ---tc@E310:~$ ls -l | grep "J.\.jpg"
-rw-r--r--  1 tc staff   34836 Feb  2  2022 J1.jpg
-rw-r--r--  1 tc staff   52988 Feb  2  2022 J2.jpg
-rw-r--r--  1 tc staff   66236 Feb  2  2022 J3.jpg
tc@E310:~$
--- End code ---
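
A minimal side-by-side sketch of the two notations, run in a scratch directory so that only the test files are visible. The file names (including J10.jpg) and variable names are assumptions made up for this example.

```shell
# Work in a scratch directory so the glob only sees our test files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch J1.jpg J2.jpg J10.jpg

# Shell glob: "?" stands for exactly one character, expanded by the shell.
glob_matches=$(printf '%s\n' J?.jpg)

# Regex: "." stands for exactly one character, "\." for a literal period.
# The anchors keep grep from matching in the middle of a name.
regex_matches=$(ls | grep '^J.\.jpg$')

echo "$glob_matches"
echo "$regex_matches"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

Both variables end up holding J1.jpg and J2.jpg. J10.jpg is excluded in both cases because it has two characters between "J" and ".jpg".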

jazzbiker:
Hi Rich,

The "?" and "*" you mean are not regex symbols but globbing symbols. As far as I understand, command-line strings containing such symbols are expanded by the shell according to which files exist (omg). That is, if you are lucky enough to have matching files :-)

Here is a program that prints its arguments, last one first:

--- Code: ---tc@box:/tmp/glob$ cat ggg.c
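#include <stdio.h>
/* Print each command-line argument on its own line, last one first.
   argv[0] (the program name) is never printed. */
int main(int argc, char *argv[]) {
    while (--argc) printf("%s\n", argv[argc]);
    return 0;
}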

--- End code ---

Let's compile it:

--- Code: ---$ tcc -o ggg ggg.c
--- End code ---
and add some spam:

--- Code: ---$ touch ggh ggk
--- End code ---

Then:

--- Code: ---tc@box:/tmp/glob$ ./g*
./ggk
./ggh
./ggg.c

--- End code ---

And:

--- Code: ---tc@box:/tmp/glob$ ./ggg ./g*
./ggk
./ggh
./ggg.c
./ggg

--- End code ---

But:

--- Code: ---tc@box:/tmp/glob$ ./ggg "./g*"
./g*

--- End code ---

:-)
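
The same behaviour can be sketched without a C compiler by letting the shell set its own positional parameters. This assumes a POSIX shell with default settings (no nullglob-style options); the scratch directory and variable names are made up for the example.

```shell
# Work in a scratch directory containing three known files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch ggg ggh ggk

# Unquoted: the shell replaces g* with the matching names (sorted).
set -- g*
expanded_count=$#

# Quoted: a program would receive the two characters g* unchanged.
set -- "g*"
quoted_arg=$1

# Unquoted but nothing matches: the pattern is passed through as-is.
set -- z*
unmatched_arg=$1

echo "expanded: $expanded_count, quoted: $quoted_arg, unmatched: $unmatched_arg"

# Clean up the scratch directory.
cd / && rm -rf "$dir"
```

The third case is the unlucky one: with no matching file, the program sees the raw pattern, which is why an unquoted regex on a grep command line can behave differently from one directory to another.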

As far as I know, regexes are used by ed, grep, sed, find, vi (and maybe some others?). Regexes may be basic, extended, Perl-style, Lua-style, and maybe some others.
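
As a small illustration of the basic versus extended difference, here is a sketch using grep's default (basic) syntax and its -E (extended) syntax on the same made-up input. Only the interval notation is shown; other differences exist.

```shell
# Three made-up lines with one, two and three leading "a" characters.
sample='ab
aab
aaab'

# Basic regex (grep default): interval braces must be backslash-escaped.
bre=$(printf '%s\n' "$sample" | grep '^a\{2\}b$')

# Extended regex (grep -E): the same interval is written without backslashes.
ere=$(printf '%s\n' "$sample" | grep -E '^a{2}b$')

echo "basic: $bre, extended: $ere"
```

Both forms select only the line aab; the pattern means the same thing and only the escaping differs.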

Have a nice regex!
