Topic: grep.tcz vs busybox grep speed

Offline jazzbiker

  • Hero Member
  • *****
  • Posts: 934
Re: grep.tcz vs busybox grep speed
« Reply #15 on: October 21, 2023, 03:04:09 PM »
Since this appears to be a fairly simple search, I decided to run
some benchmarks.

Hi Rich, just for a more detailed picture, may I ask you to repeat the tests you've already run with a pattern closer to Leee's? In the same environment, using wildcards and anchors, something like:

Code: [Select]
tc@E310:~/onboot$ export X="usr\/local\/bin\/zvbi-atsc-c."
tc@E310:~/onboot$ export X="^usr\/local\/bin\/zvbi-atsc-c."
tc@E310:~/onboot$ export X="^usr\/local\/bin\/zvbi-atsc-c.$"

Please

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 11595
Re: grep.tcz vs busybox grep speed
« Reply #16 on: October 21, 2023, 03:25:06 PM »
Hi jazzbiker
I take it he's searching for a literal period at the end of the line.
I added a period to that entry in the provides file and repeated
the search:
Code: [Select]
tc@E310:~/onboot$ export X="usr\/local\/bin\/zvbi-atsc-cc"
tc@E310:~/onboot$ time busybox grep "^$X.$" ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real    0m 0.49s
user    0m 0.38s
sys     0m 0.03s
tc@E310:~/onboot$ time busybox awk 'BEGIN {RS="\n"} /'"^$X\.$"'/{ print $0 }' ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real    0m 0.62s
user    0m 0.49s
sys     0m 0.09s
tc@E310:~/onboot$ time grep "^$X.$" ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real    0m 0.21s
user    0m 0.03s
sys     0m 0.03s
tc@E310:~/onboot$ time awk 'BEGIN {RS="\n"} /'"^$X\.$"'/{ print $0 }' ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc.
real    0m 0.52s
user    0m 0.35s
sys     0m 0.08s
tc@E310:~/onboot$

Offline jazzbiker

Re: grep.tcz vs busybox grep speed
« Reply #17 on: October 21, 2023, 03:39:20 PM »
Hi Rich,

Thanks for this test; as we can see, everything is the same. Sorry, I was not attentive enough and missed that you had already added anchors when using the pattern.
 
I take it he's searching for a literal period at the end of the line.
Probably the period meant "any symbol", but anyway the benchmark results are the same. Thank you very much!
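
To make the "literal period" versus "any symbol" difference concrete, here is a minimal sketch. The two sample lines are made up for illustration (they are not taken from the provides file), and the slashes are left unescaped because, as far as I know, grep does not need them escaped; only awk's /.../ syntax does.

```shell
# Hypothetical sample data: one line ends in a period, the other in "x".
sample=$(printf '%s\n' 'usr/local/bin/zvbi-atsc-cc.' 'usr/local/bin/zvbi-atsc-ccx')

# An unescaped "." matches any single character, so both lines match.
any_char=$(printf '%s\n' "$sample" | grep -c '^usr/local/bin/zvbi-atsc-cc.$')

# "\." matches only a literal period, so just the first line matches.
literal=$(printf '%s\n' "$sample" | grep -c '^usr/local/bin/zvbi-atsc-cc\.$')

echo "any character: $any_char, literal period: $literal"
```

On a file where the only candidate line really ends in a period, both patterns find the same single line, which would explain why the timings do not change.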

Offline Rich

Re: grep.tcz vs busybox grep speed
« Reply #18 on: October 21, 2023, 05:31:51 PM »
Hi jazzbiker
I see. I'm used to using a question mark to match a single character:
Code: [Select]
tc@E310:~$ ls -l J?.jpg
-rw-r--r-- 1 tc staff 34836 Feb  2  2022 J1.jpg
-rw-r--r-- 1 tc staff 52988 Feb  2  2022 J2.jpg
-rw-r--r-- 1 tc staff 66236 Feb  2  2022 J3.jpg
tc@E310:~$

But grep seems to need a period:
Code: [Select]
tc@E310:~$ ls -l | grep "J.\.jpg"
-rw-r--r--  1 tc staff   34836 Feb  2  2022 J1.jpg
-rw-r--r--  1 tc staff   52988 Feb  2  2022 J2.jpg
-rw-r--r--  1 tc staff   66236 Feb  2  2022 J3.jpg
tc@E310:~$
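
A small sketch of the two notations side by side, assuming a POSIX shell. The file names are hypothetical and are matched as plain strings, so no files need to exist: "case" applies glob rules, grep applies regex rules.

```shell
# Hypothetical names, one per line.
names=$(printf '%s\n' J1.jpg J22.jpg Jx.jpg J1xjpg)

# Glob rules: "?" is exactly one character and "." is an ordinary character.
glob_hits=""
for n in $names; do
    case $n in
        J?.jpg) glob_hits="$glob_hits $n" ;;
    esac
done

# Regex rules: "." is exactly one character and "\." is a literal period.
# The anchors are needed because grep matches anywhere in the line,
# while a glob always has to match the whole name.
regex_hits=$(printf '%s\n' "$names" | grep '^J.\.jpg$')

echo "glob: " $glob_hits
echo "regex:" $regex_hits
```

Both lists should contain J1.jpg and Jx.jpg only.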

Offline jazzbiker

Re: grep.tcz vs busybox grep speed
« Reply #19 on: October 21, 2023, 06:45:46 PM »
Hi Rich,

The "?" and "*" you mean are not regex but globbing symbols. As far as I understand, command-line strings containing such symbols are expanded by the shell according to which files exist (omg), if you are lucky enough to have matching files :-)

Here is the program to show its arguments:
Code: [Select]
tc@box:/tmp/glob$ cat ggg.c
#include <stdio.h>
int main(int argc, char *argv[]) {
    while (--argc) printf("%s\n", argv[argc]);
    return 0;
}

Let's compile it:
Code: [Select]
$ tcc -o ggg ggg.c
and add some spam:
Code: [Select]
$ touch ggh ggk
Then:
Code: [Select]
tc@box:/tmp/glob$ ./g*
./ggk
./ggh
./ggg.c

And:
Code: [Select]
tc@box:/tmp/glob$ ./ggg ./g*
./ggk
./ggh
./ggg.c
./ggg

But:
Code: [Select]
tc@box:/tmp/glob$ ./ggg "./g*"
./g*

:-)
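
The same experiment can be sketched without the C program, assuming a POSIX shell: "set --" stores the expanded words in $1, $2, ... so they can be counted. The temporary directory and the ./zzz* pattern are only for illustration.

```shell
# Work in a throwaway directory so the glob sees only these four files.
dir=$(mktemp -d)
cd "$dir" || exit 1
touch ggg ggg.c ggh ggk

# Unquoted: the shell replaces the pattern with the sorted matching names.
set -- ./g*
unquoted_count=$#
unquoted_first=$1

# Quoted: the pattern is passed through untouched as a single argument.
set -- "./g*"
quoted_count=$#
quoted_first=$1

# No matching file: a plain POSIX shell leaves the pattern as it is
# (bash options such as nullglob or failglob change this behaviour).
set -- ./zzz*
nomatch_first=$1

echo "unquoted: $unquoted_count words, first is $unquoted_first"
echo "quoted:   $quoted_count word, it is $quoted_first"
echo "no match: $nomatch_first"

cd / && rm -rf "$dir"
```

The last case is the unlucky one: the program receives the pattern itself instead of file names.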

As far as I know, regexes are used by ed, grep, sed, find, vi (maybe some others?). They may be basic, extended, Perl-style, Lua-style, and maybe some others.

Have a nice regex!


Offline Rich

Re: grep.tcz vs busybox grep speed
« Reply #20 on: October 21, 2023, 09:43:09 PM »
Hi jazzbiker
... Then:
Code: [Select]
tc@box:/tmp/glob$ ./g*
./ggk
./ggh
./ggg.c
...
That's pretty slick. First  ggg  gets expanded invoking the program, then
the remaining 3 filenames get expanded and passed to the program.
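
A rough sketch of what the shell does with ./g* on its own, assuming a POSIX shell: the pattern is expanded once into a sorted list, the first word becomes the command and the rest become its arguments. "set --" is used here only to imitate that expansion without running anything.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1
touch ggg ggg.c ggh ggk

# The same expansion the shell performs for the command line "./g*".
set -- ./g*
command_word=$1     # would be executed
shift
arguments=$*        # would be handed to the program as its arguments

echo "command:   $command_word"
echo "arguments: $arguments"

cd / && rm -rf "$dir"
```

Since ggg prints its arguments from last to first, the output in the quoted post comes out as ./ggk, ./ggh, ./ggg.c.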

Offline patrikg

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 712
Re: grep.tcz vs busybox grep speed
« Reply #21 on: October 22, 2023, 06:44:10 AM »
Just a thought on this: shell globbing has some limits, such as the maximum length of the command line.

In my shell on Arch, I can get the maximum value by running the command below.
This command is new to me, and with the -a option I get all the configured values.
Learning something new every day. :)
I don't know whether this limit also covers the environment variables.

Code: [Select]
getconf ARG_MAX
2097152
As far as I can see, it's set to 2 MiB.

Code: [Select]
echo $((1024*1024*2))
2097152

If I google this a bit, I also find that you can get it from xargs like this:
Code: [Select]
xargs --no-run-if-empty --show-limits </dev/null
Code: [Select]
Size of command buffer we are actually using: 131072 :(:(:(
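
A short sketch of how the limit is usually worked around, assuming getconf and xargs are available. As far as I know, POSIX defines ARG_MAX as covering the arguments together with the environment, so the environment variables do count. The ten numbers are just stand-ins for a long list of file names.

```shell
# The system limit for arguments plus environment, in bytes.
limit=$(getconf ARG_MAX)
mib=$((limit / 1024 / 1024))
echo "ARG_MAX is $limit bytes (about $mib MiB)"

# xargs splits a long list into several shorter command lines.
# -n 4 forces at most four arguments per command so the effect is visible;
# normally xargs picks the batch size from its own buffer limit.
batches=$(printf '%s\n' 1 2 3 4 5 6 7 8 9 10 | xargs -n 4 echo | wc -l)
echo "10 items were passed in $batches separate commands"
```

So with a very large directory, something like a file list piped into xargs should keep working where a bare glob on the command line fails with "Argument list too long".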

Happy hacking on your keys.
« Last Edit: October 22, 2023, 06:55:45 AM by patrikg »