[Solved]: how to sort provides.db?
GNUser:
Hello, my smart friends and Rich the wizard.
I need help solving a shell scripting problem I've been mulling over:
If I append a custom package name (and list of its contents) to the bottom of provides.db, how to then sort provides.db so that what I appended is moved to the correct place in the file? (Using awk-speak, how to sort provides.db by the first field--namely, by extension name--where each record is an extension name + its contents?)
I can think of various kludges (e.g., concatenating each extension name and its contents onto a single line, then sorting the lines by first column) but not an efficient and elegant (i.e., one-liner) solution.
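For reference, the concatenate-then-sort kludge described above might look roughly like this. It is only a sketch: sort_records is a made-up helper name, and it assumes that no line in provides.db contains a tab character, since a tab is used as the temporary joiner.

```shell
# sort_records FILE  (hypothetical helper, not part of Tiny Core)
# Sorts blank-line-separated records by their first line.
sort_records() {
    # Step 1: paragraph mode (RS="") reads one record per block;
    #         "$1 = $1" rebuilds the record with OFS, joining its lines with tabs.
    # Step 2: sort the one-line records, folding case (-f) in the C locale.
    # Step 3: split on tabs again and print a blank line after each record.
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" } { $1 = $1; print }' "$1" |
        LC_ALL=C sort -f |
        awk 'BEGIN { FS = "\t"; OFS = "\n" } { $1 = $1; print; print "" }'
}
```

Usage would be something like `sort_records provides.db > provides.db-sorted`. It needs three processes instead of one, which is why it feels like a kludge.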
Rich:
Hi GNUser
I think awk is probably the right tool for this.
This may contain some clues:
https://stackoverflow.com/questions/17048188/how-to-use-awk-sort-by-column-3
They talk about lines (or rows) and columns. That's the same as records and fields
just laid out differently and with different delimiters.
I think you want to sort on field 1 ($1) and print records ($0). I think some versions
of awk have asorti which does a lot of the work for you:
https://opensource.com/article/19/11/how-sort-awk
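To illustrate the records-and-fields idea, here is a small sketch that treats each blank-line-separated block as a record and each line within it as a field. The helper name list_records is hypothetical, and the output format is chosen only for illustration.

```shell
# list_records FILE  (hypothetical helper)
# Prints field 1 of every record (the extension name) followed by the
# number of remaining fields (the lines listed under that name).
list_records() {
    # RS=""  : records are separated by blank lines (paragraph mode)
    # FS="\n": every line inside a record is one field
    awk 'BEGIN { RS = ""; FS = "\n" } { print $1 " " (NF - 1) }' "$1"
}
```

Here $1 is the extension name and $0 is the whole record, which is the pair a sort by field 1 needs.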
GNUser:
Hi Rich. As usual, you're the man with the answers. The asorti function (available in GNU awk) was the key.
--- Code: ---
$ tce-load -wi gawk
...
$ cat sort.awk
#!/usr/local/bin/gawk -f
# usage: ./sort.awk -v var=FIELD FILE
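BEGIN {
    FS="\n";        # each line of a record is one field
    RS="";          # paragraph mode: records are separated by blank lines
    IGNORECASE=1;   # makes asorti() sort case-insensitively
}
{   # store each whole record, keyed by the chosen field
    ARRAY[$var] = $0;
}
END {
    # SARRAY receives the keys of ARRAY in sorted order
    asorti(ARRAY,SARRAY);
    # get length
    j = length(SARRAY);
    for (i = 1; i <= j; i++) {
        printf("%s\n\n", ARRAY[SARRAY[i]])
    }
}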
$ time ./sort.awk -v var=1 ./provides.db >provides.db-sorted
real 0m 0.17s
user 0m 0.12s
sys 0m 0.06s
--- End code ---
Given how quickly GNU awk is able to sort provides.db, I'd say this problem is more than solved. The problem is crushed.
CNK:
Ahh rats, you got there before me, and the Awk solution is much faster than the 1-2s that mine takes on my system.
Also I forgot about the appended entries and assumed the task was adding a separate extension.tcz.list to the standard provides.db.
But I'll at least claim bonus points for using only Busybox tools.
--- Code: ---
#!/bin/ash
# Usage: insert_provides.sh [extension]
# Creates new_provides.db
extension=$1
provides=/etc/sysconfig/tcedir/provides.db
headings="`grep -n '^[^/]*\.tcz$' $provides`"
line="`echo -e \"$headings\n:$extension\" | sort -t : -k 2 -f | grep -A 1 :$extension | tail -n 1 | cut -d : -f 1`"
sed -e ${line}i\\"$extension\\n`sed 's/$/\\\/g' $extension.list`"$'\n' $provides > new_provides.db
--- End code ---
Oh, the forum's actually going to let me post a script with all those slashes in it though, so that's a win!
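The insert-in-place idea (put the new entry in front of the first heading that sorts after it, rather than re-sorting the whole file) can also be sketched with plain awk. This is not the script above, only an illustration under stated assumptions: insert_record is a made-up helper name, the database is assumed to be already sorted case-insensitively by extension name, and the list file is assumed to hold one path per line.

```shell
# insert_record NAME LISTFILE DBFILE  (hypothetical helper)
# Writes DBFILE to stdout with a new record (NAME plus the lines of
# LISTFILE) inserted before the first record whose name sorts after NAME.
insert_record() {
    awk -v name="$1" -v list="$2" '
        function emit_new(   line) {
            print name
            while ((getline line < list) > 0) print line
            close(list)
            print ""
            done = 1
        }
        BEGIN { RS = ""; FS = "\n" }
        # First existing record that sorts after the new name: emit the new one first.
        !done && tolower($1) > tolower(name) { emit_new() }
        # Copy every existing record through unchanged, followed by a blank line.
        { print; print "" }
        # No later record was found, so the new record goes at the end.
        END { if (!done) emit_new() }
    ' "$3"
}
```

A call might look like `insert_record myext.tcz myext.tcz.list provides.db > new_provides.db`, where the file names are placeholders.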
Rich:
Hi GNUser
--- Quote from: GNUser on March 22, 2023, 08:18:15 PM --- ... Given how quickly GNU awk is able to sort provides.db, I'd say this problem is more than solved. The problem is crushed.
--- End quote ---
There's a reason roberts liked to inject awk snippets into his scripts. When it
comes to data manipulation, it can be wicked fast.
I've had a few instances where I found the execution time of a script unacceptable
and was forced to add an awk function. None of my techniques could even touch
the speed of awk.