[Solved]: how to sort provides.db?
Rich:
Hi CNK
--- Quote from: CNK on March 22, 2023, 09:28:54 PM --- ... But I'll at least claim bonus points for using only Busybox tools. ...
--- End quote ---
Sorry, busybox tools include awk:
--- Code: ---tc@E310:~$ busybox
BusyBox v1.29.3 (2018-12-19 15:29:37 UTC) multi-call binary.
BusyBox is copyrighted by many authors between 1998-2015.
Licensed under GPLv2. See source distribution for detailed
copyright notices.
----- Snip -----
Currently defined functions:
[, [[, addgroup, adduser, adjtimex, ar, arp, arping, ash, awk,
----- Snip -----
--- End code ---
Bonus points denied. ::)
CNK:
--- Quote from: Rich on March 22, 2023, 10:39:17 PM --- ... Sorry, busybox tools include awk ... Bonus points denied. ::)
--- End quote ---
Ah but I made sure to check that mine was compatible with the Busybox versions of all the tools, whereas:
--- Quote ---The asorti function (available in GNU awk) was the key.
--- End quote ---
Bonus points reinstated?
GNUser:
I like both solutions.
Rich's helped me solve a problem I've found daunting for a long time. Very satisfying, especially considering how fast awk dispatches the job.
I like your script, CNK, especially these three ideas:
1. use only busybox tools
2. put the .list file contents in the correct place to begin with (rather than append then sort)
3. grep -A 1
CNK, I want to add two ideas:
1. If you have a local mirror and have already added the new extension to info.lst and sorted info.lst, there is no need for this:
headings="`grep -n '^[^/]*\.tcz$' $provides`"
2. The r command to sed I think would work nicely here.
Here is my variation on your idea:
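--- Code: ---#!/bin/sh
# Usage: insert_provides.sh [extension]
# Creates new_provides.db
# Example usage: insert_provides.sh foo.tcz
# This script assumes info.lst already contains foo.tcz in the correct place
# and that foo.tcz.list is in the current directory
extension=$1
info=/path/to/info.lst
provides=/path/to/provides.db
# Entry that follows $extension in info.lst (-x -F: whole-line, literal match)
next_tcz=$(grep -x -F -A 1 "$extension" "$info" | tail -1)
# Line number of that entry's heading in provides.db
next_tcz_line=$(grep -n -x -F "$next_tcz" "$provides" | cut -d: -f1)
# The blank line just above that heading
target_line=$(( next_tcz_line - 1 ))
# Build the new block: heading, file list, blank separator line
echo "$extension" >/tmp/insertion.txt
cat "$extension.list" >>/tmp/insertion.txt
echo "" >>/tmp/insertion.txt
# Copy provides.db, reading the new block in after the target line
sed "$target_line r /tmp/insertion.txt" "$provides" >new_provides.db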
--- End code ---
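To see what the r command does in isolation, here is a minimal sketch on throwaway data. Everything in it (demo.db, block.txt, and the aaa/bbb/ccc package names) is made up for illustration; it is not taken from the real provides.db.

```shell
#!/bin/sh
# Hypothetical sample data: a tiny provides.db-style file with two blocks.
demo_dir=$(mktemp -d)
printf 'aaa.tcz\nusr/local/bin/aaa\n\nccc.tcz\nusr/local/bin/ccc\n\n' >"$demo_dir/demo.db"
# The block to insert: heading, file list, blank separator line.
printf 'bbb.tcz\nusr/local/bin/bbb\n\n' >"$demo_dir/block.txt"

# Line number of the heading that must follow the new block.
next_line=$(grep -n -x -F 'ccc.tcz' "$demo_dir/demo.db" | cut -d: -f1)

# "N r file" queues the file for output right after line N is printed,
# so N has to be the blank line just above that heading.
sed "$(( next_line - 1 )) r $demo_dir/block.txt" "$demo_dir/demo.db" >"$demo_dir/new.db"

# Collect only the headings of the result to show the new order.
headings=$(grep '\.tcz$' "$demo_dir/new.db" | xargs)
echo "$headings"
rm -r "$demo_dir"
```

The headings come out as aaa.tcz, bbb.tcz, ccc.tcz, with the blank separator lines intact on both sides of the new block.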
Rich:
Hi CNK
While busybox awk does not natively include a sort function, it can do
the heavy lifting of converting record and field separators to a format
that sort can handle and then back to their original values after sorting
has completed:
--- Code: ---#!/bin/busybox ash
. /etc/init.d/tc-functions
useBusybox
PATH="/bin:/sbin:/usr/bin:/usr/sbin"
export PATH
# busybox sort
awk 'BEGIN {FS="\n";RS=""} {OFS=";"; $1=$1; print $0}' provides-10.x-x86.db | sort -fd | awk 'BEGIN {FS=";";RS="\n"} {OFS="\n"; ORS="\n\n"; $1=$1; print $0}' > tmp.txt
# coreutils sort
#awk 'BEGIN {FS="\n";RS=""} {OFS=";"; $1=$1; print $0}' provides-10.x-x86.db | /usr/local/bin/sort -fd | awk 'BEGIN {FS=";";RS="\n"} {OFS="\n"; ORS="\n\n"; $1=$1; print $0}' > tmp.txt
# awk without sort
#awk 'BEGIN {FS="\n";RS=""} {OFS=";"; $1=$1; print $0}' provides-10.x-x86.db | awk 'BEGIN {FS=";";RS="\n"} {OFS="\n"; ORS="\n\n"; $1=$1; print $0}' > tmp.txt
--- End code ---
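The round trip is easier to follow on a small input. The sketch below wraps the same three stages in a function; the name sort_blocks and the three sample packages are made up, and like the pipeline above it assumes no line in the file contains a semicolon.

```shell
#!/bin/sh
# sort_blocks (hypothetical name): sort blank-line-separated blocks from stdin.
# Assumes no line contains ";", which is used as the temporary separator.
#
# Stage 1: paragraph mode (RS="") reads one block per record; $1=$1 makes awk
#          rebuild the record with OFS, joining the block's lines into one.
# Stage 2: sort sees one line per block.
# Stage 3: split on ";" again and print each block followed by a blank line.
sort_blocks() {
    awk 'BEGIN {RS=""; FS="\n"; OFS=";"} {$1=$1; print}' |
    sort -fd |
    awk 'BEGIN {FS=";"; OFS="\n"; ORS="\n\n"} {$1=$1; print}'
}

# Made-up sample with three unsorted blocks.
sorted=$(printf 'zlib.tcz\nusr/lib/libz.so\n\nBash.tcz\nbin/bash\n\nawk.tcz\nbin/awk\n' | sort_blocks)
printf '%s\n' "$sorted"
```

With -f folding case, the blocks come back in the order awk.tcz, Bash.tcz, zlib.tcz.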
This is the timing using busybox sort:
--- Code: ---tc@E310:~/awksort$ time ./awksort.sh
real 0m 4.87s
user 0m 4.52s
sys 0m 0.56s
--- End code ---
This is the timing using coreutils sort:
--- Code: ---tc@E310:~/awksort$ time ./awksort.sh
real 0m 1.44s
user 0m 0.97s
sys 0m 0.52s
--- End code ---
This is the timing without sort in the pipeline:
--- Code: ---tc@E310:~/awksort$ time ./awksort.sh
real 0m 1.06s
user 0m 1.16s
sys 0m 0.35s
--- End code ---
Running a diff on the original file and the sorted output shows 2 small differences:
libev-dev.tcz came before libevdev.tcz in the original but sort swapped them.
gst-ffmpeg.tcz had 2 blank lines preceding it in the original instead of 1.
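Both differences look consistent with how the tools behave, as far as I can tell: sort -d compares only blanks and alphanumerics, so the hyphen in libev-dev is skipped and whatever follows the name breaks the tie, and awk's paragraph mode treats any run of blank lines as a single separator. A small sketch, using made-up file lists rather than the real contents of those two extensions:

```shell
#!/bin/sh
# Made-up one-line records in the flattened "name;file" form.
records='libev-dev.tcz;usr/local/include/ev.h
libevdev.tcz;usr/local/bin/evtest'

# Byte order: "-" sorts before "d", so libev-dev.tcz comes first.
plain=$(printf '%s\n' "$records" | LC_ALL=C sort | cut -d';' -f1 | xargs)

# Dictionary order: punctuation is skipped, both names compare as
# "libevdevtcz", and the rest of the line decides the order.
dict=$(printf '%s\n' "$records" | LC_ALL=C sort -d | cut -d';' -f1 | xargs)

echo "sort:    $plain"
echo "sort -d: $dict"

# Paragraph mode: two blank lines between blocks still give two records,
# so the extra blank line is lost when the records are printed again.
count=$(printf 'a.tcz\nx\n\n\nb.tcz\ny\n' | awk 'BEGIN {RS=""} END {print NR}')
echo "records: $count"
```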
--- Quote from: CNK on March 22, 2023, 10:52:36 PM --- ... Bonus points reinstated?
--- End quote ---
Nope, reallocated to myself. :P
Rich:
Hi CNK
Here are a couple of quicksort examples for busybox awk:
--- Code: ---# http://www.victor-notes.com/qppro/awk.html
BEGIN { RS = ""; FS = "\n" }
{ A[NR] = $0 }
END {
    qsort(A, 1, NR)
    for (i = 1; i <= NR; i++) {
        print A[i]
        if (i == NR) break
        print ""
    }
}
# QuickSort
# Source: "The AWK Programming Language", by Aho, et al., p.161
function qsort(A, left, right,   i, last) {
    if (left >= right)
        return
    swap(A, left, left+int((right-left+1)*rand()))
    last = left
    for (i = left+1; i <= right; i++)
        if (A[i] < A[left])
            swap(A, ++last, i)
    swap(A, left, last)
    qsort(A, left, last-1)
    qsort(A, last+1, right)
}
function swap(A, i, j,   t) {
    t = A[i]; A[i] = A[j]; A[j] = t
}
--- End code ---
Timing:
--- Code: ---tc@E310:~/awksort$ time /usr/bin/awk -f quicksort.sh provides-10.x-x86.db > sorted.txt
real 0m 1.74s
user 0m 1.22s
sys 0m 0.52s
--- End code ---
And a faster version:
--- Code: ---# http://www.victor-notes.com/qppro/awk.html
BEGIN { RS = ""; FS = "\n" }
{ A[NR] = $0 }
END {
    qsort(A, 1, NR)
    for (i = 1; i <= NR; i++) {
        print A[i]
        if (i == NR) break
        print ""
    }
}
# https://community.hpe.com/t5/languages-and-scripting/sort-in-awk-is-very-slow/m-p/2720834#M2904
# qsort expects 3 arguments
# 1 - the array to be sorted
# 2 - the array lower bound (1)
# 3 - the number of elements in the array
# Note : The other arguments are local variables
# and are not passed as parameters
function qsort(arry,lo,hi,   i,j,tmp_x,tmp_x2)
{
    i = lo + 0
    j = hi + 0
    tmp_x = arry[int(int(lo + hi) / 2)]
    do
    {
        while (arry[i] < tmp_x) ++i
        while (tmp_x < arry[j]) --j
        if (i <= j)
        {
            tmp_x2 = arry[i]
            arry[i] = arry[j]
            arry[j] = tmp_x2
            ++i
            --j
        }
    }
    while (i <= j)
    if (lo < j) qsort(arry,lo,j)
    if (i < hi) qsort(arry,i,hi)
    return(0)
} # qsort
--- End code ---
And timing:
--- Code: ---tc@E310:~/awksort$ time /usr/bin/awk -f quicksort2.sh provides-10.x-x86.db > sorted2.txt
real 0m 1.03s
user 0m 0.76s
sys 0m 0.27s
--- End code ---
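One way to sanity-check the output of either quicksort might be to pull out the heading of each block and let sort -c verify the order. This is only a sketch: headings_sorted is a hypothetical helper, the sample files are made up, and it assumes awk's < compares strings byte by byte, which is not the same ordering as sort -fd.

```shell
#!/bin/sh
# headings_sorted (hypothetical helper): succeed if the first line of every
# blank-line-separated block in file $1 is in ascending byte order.
headings_sorted() {
    awk 'BEGIN {RS=""; FS="\n"} {print $1}' "$1" | LC_ALL=C sort -c 2>/dev/null
}

# Made-up sample files standing in for a sorted and an unsorted database.
tmp=$(mktemp -d)
printf 'a.tcz\nbin/a\n\nb.tcz\nbin/b\n' >"$tmp/good.db"
printf 'b.tcz\nbin/b\n\na.tcz\nbin/a\n' >"$tmp/bad.db"

# The pipeline's exit status is that of sort -c: 0 if in order, non-zero if not.
if headings_sorted "$tmp/good.db"; then good=sorted; else good=unsorted; fi
if headings_sorted "$tmp/bad.db"; then bad=sorted; else bad=unsorted; fi
echo "good.db: $good, bad.db: $bad"
rm -r "$tmp"
```

Against the real files this would be called as headings_sorted sorted.txt or headings_sorted sorted2.txt.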