
[Solved]: how to sort provides.db?


Rich:
Hi CNK

--- Quote from: CNK on March 22, 2023, 09:28:54 PM --- ... But I'll at least claim bonus points for using only Busybox tools. ...
--- End quote ---
Sorry, busybox tools includes awk:

--- Code: ---tc@E310:~$ busybox
BusyBox v1.29.3 (2018-12-19 15:29:37 UTC) multi-call binary.
BusyBox is copyrighted by many authors between 1998-2015.
Licensed under GPLv2. See source distribution for detailed
copyright notices.

 ----- Snip -----

Currently defined functions:
        [, [[, addgroup, adduser, adjtimex, ar, arp, arping, ash, awk,
 ----- Snip -----
--- End code ---

Bonus points denied.  ::)

CNK:

--- Quote from: Rich on March 22, 2023, 10:39:17 PM --- ... Bonus points denied.  ::)
--- End quote ---

Ah, but I made sure to check that mine was compatible with the Busybox versions of all the tools, whereas:

--- Quote ---The asorti function (available in GNU awk) was the key.

--- End quote ---

Bonus points reinstated?

GNUser:
I like both solutions.

Rich's helped me solve a problem I've found daunting for a long time. Very satisfying, especially considering how fast awk dispatches the job.

I like your script, CNK, especially these three ideas:
1. use only busybox tools
2. put the .list file contents in the correct place to begin with (rather than append then sort)
3. grep -A 1

CNK, I want to add two ideas:
1. If you have a local mirror and have already added the new extension to info.lst and sorted info.lst, there is no need for this:
headings="`grep -n '^[^/]*\.tcz$' $provides`"
2. I think sed's r command would work nicely here.

Here is my variation on your idea:


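--- Code: ---#!/bin/sh
# Usage: insert_provides.sh [extension]
# Creates new_provides.db
# Example usage: insert_provides.sh foo.tcz
# This script assumes info.lst already contains foo.tcz in the correct place
# (and not as the last entry), and that foo.tcz.list is in the current directory

extension=$1
info=/path/to/info.lst
provides=/path/to/provides.db

# -x -F: match the whole line literally, so "." is not a regex wildcard
next_tcz=$(grep -x -F -A 1 "$extension" "$info" | tail -n 1)
next_tcz_line=$(grep -n -x -F "$next_tcz" "$provides" | cut -d: -f1)
# the blank line just above the next heading
target_line=$(( next_tcz_line - 1 ))

# new block: heading, file list, blank separator
echo "$extension" >/tmp/insertion.txt
cat "$extension.list" >>/tmp/insertion.txt
echo "" >>/tmp/insertion.txt

# "N r file" appends the file's contents after line N
sed "$target_line r /tmp/insertion.txt" "$provides" >new_provides.db

--- End code ---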
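A self-contained run of the same insertion idea shows where sed's `r` command puts the new block: `N r file` appends the file's contents after line N, here the blank line just above the next heading. This is a sketch on made-up data: the extensions aaa.tcz, bbb.tcz and ccc.tcz are hypothetical, and the list file is assumed to be named bbb.tcz.list.

```shell
#!/bin/sh
# Demo of inserting one extension's block into an already sorted provides.db.
# All file contents and names (aaa/bbb/ccc.tcz) are hypothetical sample data.

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
cd "$work" || exit 1

# info.lst already lists the new extension (bbb.tcz) in sorted position
printf '%s\n' aaa.tcz bbb.tcz ccc.tcz > info.lst
# provides.db does not contain bbb.tcz yet
printf 'aaa.tcz\n/usr/local/bin/aaa\n\nccc.tcz\n/usr/local/bin/ccc\n' > provides.db
# assumed list-file name: <extension>.list
printf '/usr/local/bin/bbb\n' > bbb.tcz.list

ext=bbb.tcz
# entry after the new extension in info.lst (-x -F: whole-line, literal match)
next_tcz=$(grep -x -F -A 1 "$ext" info.lst | tail -n 1)
# its heading line in provides.db; the blank line above it is the target
next_line=$(grep -n -x -F "$next_tcz" provides.db | cut -d: -f1)
target=$((next_line - 1))

# new block: heading, file list, blank separator
{ echo "$ext"; cat "$ext.list"; echo; } > insertion.txt
# "N r file" appends the file's contents after line N
sed "$target r insertion.txt" provides.db > new_provides.db
cat new_provides.db
```

If the new extension were the last entry in info.lst, `grep -A 1` would return the extension itself, the lookup in provides.db would find nothing, and that case would need separate handling (for example, appending the block to the end of the file).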

Rich:


Hi CNK
While busybox awk does not natively include a sort function, it can do
the heavy lifting of converting record and field separators to a format
that sort can handle and then back to their original values after sorting
has completed:

--- Code: ---#!/bin/busybox ash

. /etc/init.d/tc-functions
useBusybox

PATH="/bin:/sbin:/usr/bin:/usr/sbin"
export PATH

# busybox sort
awk 'BEGIN {FS="\n";RS=""} {OFS=";"; $1=$1; print $0}' provides-10.x-x86.db | sort -fd | awk 'BEGIN {FS=";";RS="\n"} {OFS="\n"; ORS="\n\n"; $1=$1; print $0}' > tmp.txt

# coreutils sort
#awk 'BEGIN {FS="\n";RS=""} {OFS=";"; $1=$1; print $0}' provides-10.x-x86.db | /usr/local/bin/sort -fd | awk 'BEGIN {FS=";";RS="\n"} {OFS="\n"; ORS="\n\n"; $1=$1; print $0}' > tmp.txt

# awk without sort
#awk 'BEGIN {FS="\n";RS=""} {OFS=";"; $1=$1; print $0}' provides-10.x-x86.db | awk 'BEGIN {FS=";";RS="\n"} {OFS="\n"; ORS="\n\n"; $1=$1; print $0}' > tmp.txt
--- End code ---
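The flatten, sort, unflatten round trip can be tried on a tiny sample before running it on the full database. This sketch wraps the same two awk programs in a hypothetical `sort_provides` function, keeps the `-fd` flags but pins `LC_ALL=C` so the result does not depend on the locale, and assumes no file name contains ";" (the temporary separator). The sample records are made up.

```shell
#!/bin/sh
# Flatten -> sort -> unflatten round trip on a tiny made-up provides.db
# excerpt. Assumes no file name contains ";" (the temporary separator).

sort_provides() {
  # 1st awk: paragraph mode (RS=""), one field per line; $1=$1 rebuilds
  #          each record on a single line joined with ";".
  # sort:    same flags as the post; LC_ALL=C pins the collation.
  # 2nd awk: split on ";" and restore one line per field plus a blank line.
  awk 'BEGIN {FS="\n"; RS=""} {OFS=";"; $1=$1; print $0}' "$1" |
    LC_ALL=C sort -fd |
    awk 'BEGIN {FS=";"; RS="\n"} {OFS="\n"; ORS="\n\n"; $1=$1; print $0}'
}

sample=$(mktemp)
trap 'rm -f "$sample"' EXIT
# hypothetical sample: two records, deliberately out of order
printf 'zlib.tcz\n/usr/local/lib/libz.so\n\nalsa.tcz\n/usr/local/bin/alsamixer\n/usr/local/bin/aplay\n' > "$sample"

sort_provides "$sample"
```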

This is the timing using busybox sort:

--- Code: ---tc@E310:~/awksort$ time ./awksort.sh
real    0m 4.87s
user    0m 4.52s
sys     0m 0.56s
--- End code ---

This is the timing using coreutils sort:

--- Code: ---tc@E310:~/awksort$ time ./awksort.sh
real    0m 1.44s
user    0m 0.97s
sys     0m 0.52s
--- End code ---

This is the timing without sort in the pipeline:

--- Code: ---tc@E310:~/awksort$ time ./awksort.sh
real    0m 1.06s
user    0m 1.16s
sys     0m 0.35s
--- End code ---

Running diff on the two files shows two small differences:
libev-dev.tcz came before libevdev.tcz in the original, but sort swapped them.
gst-ffmpeg.tcz had two blank lines preceding it in the original instead of one.
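Both differences can be reproduced in isolation. This sketch assumes sort behaves as POSIX specifies: with `-d` only blanks and alphanumerics are compared, so the hyphen in libev-dev.tcz is ignored and the order is decided by whatever follows (in the flattened records, presumably the file lists), whereas a plain byte-order sort in the C locale keeps the hyphenated name first. The lost blank line before gst-ffmpeg.tcz follows from awk's paragraph mode (`RS=""`), which treats any run of blank lines as a single record separator.

```shell
#!/bin/sh
# Reproduce the two diff differences in isolation.

# Byte order (C locale): '-' (0x2D) sorts before 'd' (0x64),
# so the hyphenated name comes first.
byte_order=$(printf 'libevdev.tcz\nlibev-dev.tcz\n' | LC_ALL=C sort)
echo "$byte_order"

# Paragraph mode: the two blank lines between the records act as a single
# separator, so awk sees exactly two records and the extra blank line is lost.
records=$(printf 'a.tcz\n\n\nb.tcz\n' | awk 'BEGIN {RS=""} END {print NR}')
echo "records: $records"
```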


--- Quote from: CNK on March 22, 2023, 10:52:36 PM --- ... Bonus points reinstated?
--- End quote ---
Nope, reallocated to myself.  :P

Rich:
Hi CNK
Here are a couple of quicksort examples for busybox awk:

--- Code: ---# http://www.victor-notes.com/qppro/awk.html

BEGIN { RS = ""; FS = "\n" }

      { A[NR] = $0 }

END   {
        qsort(A, 1, NR)
        for (i = 1; i <= NR; i++) {
          print A[i]
          if (i == NR) break
          print ""
        }
      }

# QuickSort
# Source: "The AWK Programming Language", by Aho, et al., p.161
function qsort(A, left, right,   i, last) {
  if (left >= right)
    return
  swap(A, left, left+int((right-left+1)*rand()))
  last = left
  for (i = left+1; i <= right; i++)
    if (A[i] < A[left])
      swap(A, ++last, i)
  swap(A, left, last)
  qsort(A, left, last-1)
  qsort(A, last+1, right)
}
function swap(A, i, j,   t) {
  t = A[i]; A[i] = A[j]; A[j] = t
}
--- End code ---

Timing:

--- Code: ---tc@E310:~/awksort$ time /usr/bin/awk -f quicksort.sh provides-10.x-x86.db > sorted.txt
real    0m 1.74s
user    0m 1.22s
sys     0m 0.52s
--- End code ---

And a faster version:

--- Code: ---# http://www.victor-notes.com/qppro/awk.html

BEGIN { RS = ""; FS = "\n" }

      { A[NR] = $0 }

END   {
        qsort(A, 1, NR)
        for (i = 1; i <= NR; i++) {
          print A[i]
          if (i == NR) break
          print ""
        }
      }


# https://community.hpe.com/t5/languages-and-scripting/sort-in-awk-is-very-slow/m-p/2720834#M2904
# qsort expects 3 arguments
# 1 - the array to be sorted
# 2 - the array lower bound (1)
# 3 - the array upper bound (the number of elements when the lower bound is 1)
# Note : The other arguments are local variables
#        and are not passed as parameters

function qsort(arry,lo,hi,           i,j,tmp_x,tmp_x2)
{
  i = lo + 0
  j = hi + 0
  tmp_x = arry[int(int(lo + hi) / 2)]
  do
    {
      while (arry[i] < tmp_x) ++i
      while (tmp_x < arry[j]) --j
      if (i <= j)
        {
          tmp_x2 = arry[i]
          arry[i] = arry[j]
          arry[j] = tmp_x2
          ++i
          --j
        }
    }
  while (i <= j)
  if (lo < j) qsort(arry,lo,j)
  if (i < hi) qsort(arry,i,hi)
  return(0)
} # qsort
--- End code ---

And timing:

--- Code: ---tc@E310:~/awksort$ time /usr/bin/awk -f quicksort2.sh provides-10.x-x86.db > sorted2.txt
real    0m 1.03s
user    0m 0.76s
sys     0m 0.27s
--- End code ---
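To confirm that either quicksort (or the sort pipeline) produced ordered output, a small awk pass can compare each record with the previous one using the same whole-record `<` comparison the quicksorts use. A sketch: the `check_sorted` name and the sample data are hypothetical, and `LC_ALL=C` is set so the comparison is plain byte order.

```shell
#!/bin/sh
# Verify that blank-line-separated records are in non-decreasing order,
# using the whole-record "<" comparison the awk quicksorts rely on.
# check_sorted is a hypothetical helper name.

check_sorted() {
  # Exit status 0 if ordered; otherwise print the first offending heading
  # and exit 1. "exit" in the main block still runs END, which sets the status.
  LC_ALL=C awk 'BEGIN { RS = ""; FS = "\n" }
    NR > 1 && $0 < prev { print "out of order: " $1; bad = 1; exit }
    { prev = $0 }
    END { exit bad }' "$1"
}

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
# hypothetical sample, already sorted
printf 'alsa.tcz\n/usr/local/bin/aplay\n\nzlib.tcz\n/usr/local/lib/libz.so\n' > "$tmp"
check_sorted "$tmp" && echo "sorted"
```

Run against real output, it would be called as `check_sorted sorted2.txt`.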
