grep.tcz vs busybox grep speed

jazzbiker:

--- Quote from: Leee on October 18, 2023, 06:23:02 AM ---
--- Code: ---grep -nE "^${X}.$" ${INPUTFILE} |tail -1
--- End code ---

--- End quote ---
A very simple pattern, and it is anchored at both the beginning and the end. I took a brief look at the busybox grep source, and in my opinion it is a straightforward use of the standard libc regex functions. So the advantage of the separate grep utility probably comes from grep-specific optimizations. In other words, it is not that busybox grep is slow, but that GNU grep is fast :-) I guess Lua is faster.

patrikg:
Hello @Leee

I know you can sometimes use awk instead of grep. Have you tested awk performance in busybox?

Happy hacking.

Rich:
Hi Leee
I've found  awk  can be significantly faster than  grep.

Something like this should work:

--- Code: ---awk 'BEGIN {RS="\n"} /'"^$X\.$"'/{ print $0 }' "$INPUTFILE" | tail -n 1
--- End code ---

GNU awk (gawk.tcz) should be faster than the busybox awk.
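
If the goal is only the last matching line together with its line number (what grep -n piped to tail gives), awk can keep track of that itself. The following is a minimal sketch, not a tested replacement: last_match is a hypothetical helper name, and it assumes $X is meant as literal text followed by exactly one arbitrary character, so it compares strings instead of building a regex. No claim is made here about its speed.

```shell
#!/bin/sh
# Hypothetical helper: print "LINENO:LINE" for the last line that consists of
# PREFIX followed by exactly one arbitrary character (like "^${X}.$", but with
# the prefix treated as literal text rather than as a regex).
last_match() {
    # $1 = literal prefix, $2 = input file
    PREFIX="$1" awk '
        BEGIN { p = ENVIRON["PREFIX"]; n = length(p) }
        # Length check plus substr comparison stands in for the anchored regex.
        length($0) == n + 1 && substr($0, 1, n) == p { hit = NR ":" $0 }
        END { if (hit != "") print hit }
    ' "$2"
}

# Demo on a throwaway sample file (made-up data).
sample=$(mktemp)
printf '%s\n' 'abc1' 'xyz' 'abc2' 'abcde' > "$sample"
last_match 'abc' "$sample"
rm -f "$sample"
```

Passing the prefix through the environment rather than splicing it into the awk program means slashes and dots in it need no escaping.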

jazzbiker:

--- Quote from: Rich on October 21, 2023, 12:24:45 PM ---I've found  awk  can be significantly faster than  grep.

--- End quote ---

busybox awk uses the same regex functions as busybox grep does, so the only bottleneck left is reading the input file ...

Rich:
Hi jazzbiker
Actually, I should have said awk is faster in more complex operations:

--- Quote from: Rich on March 22, 2023, 10:24:08 PM ---Hi GNUser

--- Quote from: GNUser on March 22, 2023, 08:18:15 PM --- ... Given how quickly GNU awk is able to sort provides.db, I'd say this problem is more than solved. The problem is crushed.
--- End quote ---
There's a reason roberts liked to inject awk snippets into his scripts. When it
comes to data manipulation, it can be wicked fast.

I've had a few instances where I found the execution time of a script unacceptable
and was forced to add an awk function. None of my techniques could even touch
the speed of awk.
--- End quote ---

Since this appears to be a fairly simple search, I decided to run
some benchmarks. The backslashes in the search term are to
escape the forward slashes so awk does not throw an error. The
search term is a few entries before the end of the provides file.

--- Code: ---tc@E310:~/onboot$ export X="usr\/local\/bin\/zvbi-atsc-cc"
tc@E310:~/onboot$ time busybox grep "^$X$" ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc
real    0m 0.53s
user    0m 0.44s
sys     0m 0.02s
tc@E310:~/onboot$ time busybox awk 'BEGIN {RS="\n"} /'"^$X$"'/{ print $0 }' ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc
real    0m 0.63s
user    0m 0.48s
sys     0m 0.12s
tc@E310:~/onboot$ time grep "^$X$" ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc
real    0m 0.17s
user    0m 0.05s
sys     0m 0.03s
tc@E310:~/onboot$ time awk 'BEGIN {RS="\n"} /'"^$X$"'/{ print $0 }' ../Scripting/LddCheck/provides-10.x-x86.db
usr/local/bin/zvbi-atsc-cc
real    0m 0.53s
user    0m 0.35s
sys     0m 0.10s
tc@E310:~/onboot$
--- End code ---
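So it appears that for a simple search like this, grep is faster: the non-busybox grep took 0.17s real, while busybox grep and both awk runs took between 0.53s and 0.63s.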
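
Since the search above is for one complete line, the escaping of the forward slashes can be avoided by comparing strings instead of matching a regex. This is only a sketch under that assumption: exact_line is a hypothetical helper name and the sample data is made up. It was not part of the timings above, so it says nothing about which variant is faster.

```shell
#!/bin/sh
# Hypothetical helper: print every line of FILE that is exactly equal to TARGET.
exact_line() {
    # $1 = literal target (slashes need no escaping), $2 = input file
    # ($0 "") forces a string comparison even if the line looks like a number.
    TARGET="$1" awk 'BEGIN { t = ENVIRON["TARGET"] } ($0 "") == t' "$2"
}

# Demo on a throwaway sample file (made-up data).
sample=$(mktemp)
printf '%s\n' 'usr/local/bin/zvbi-atsc-cc' 'usr/local/bin/zvbi-atsc-cc2' > "$sample"
exact_line 'usr/local/bin/zvbi-atsc-cc' "$sample"
# The grep counterpart: -F takes the pattern as a fixed string, -x requires a whole-line match.
grep -Fx 'usr/local/bin/zvbi-atsc-cc' "$sample"
rm -f "$sample"
```

Both commands print only the first sample line; the second line is longer and is therefore not an exact match.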
