[Solved]: how to sort provides.db?

GNUser:
Hello, my smart friends and Rich the wizard.

I need help solving a shell scripting problem I've been mulling over:

If I append a custom package name (and the list of its contents) to the bottom of provides.db, how can I then sort provides.db so that the appended entry moves to its correct place in the file? (In awk-speak: how do I sort provides.db by the first field, namely the extension name, where each record is an extension name plus its contents?)

I can think of various kludges (e.g., concatenating each extension name and its contents onto a single line, then sorting the lines by first column) but not an efficient and elegant (i.e., one-liner) solution.
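For reference, the "join, sort, split" kludge could look roughly like the sketch below. It assumes provides.db is made of blank-line-separated records with the extension name on the first line, and that no line contains a tab. The function name sort_records is made up for illustration, and only POSIX awk and sort are used.

```shell
#!/bin/sh
# Sketch of the "join, sort, split" kludge. Assumptions (not checked against
# a real provides.db): records are separated by blank lines, the first line
# of each record is the extension name, and no line contains a tab.
# sort_records is a hypothetical helper name.

sort_records() {
    # Stage 1: paragraph mode (RS="") reads one blank-line-separated block
    #          per record; "$1 = $1" rebuilds the record with OFS (a tab),
    #          so every record ends up on a single line.
    # Stage 2: sort those lines by the first tab-separated column,
    #          folding case (-f).
    # Stage 3: turn the tabs back into newlines and print a blank line
    #          after each record.
    awk 'BEGIN { RS = ""; FS = "\n"; OFS = "\t" } { $1 = $1; print }' "$1" |
        LC_ALL=C sort -f -t "$(printf '\t')" -k 1,1 |
        awk '{ gsub(/\t/, "\n"); printf "%s\n\n", $0 }'
}
```

Something like `sort_records provides.db > provides.db-sorted` would then write a sorted copy. It works, but it takes three processes and depends on a tab never appearing in the data, which is why it feels like a kludge.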

Rich:
Hi GNUser
I think  awk  is probably the right tool for this.
This may contain some clues:
https://stackoverflow.com/questions/17048188/how-to-use-awk-sort-by-column-3
They talk about  lines (or rows)  and  columns. That's the same as  records  and  fields,
just laid out differently and with different delimiters.

I think you want to sort on field 1 ($1) and print records ($0). I think some versions
of awk have  asorti  which does a lot of the work for you:
https://opensource.com/article/19/11/how-sort-awk
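
A tiny sketch of the records-and-fields idea, assuming provides.db uses blank lines between extensions and puts the extension name on the first line of each block. The helper name list_names is made up for illustration. Setting RS to the empty string makes awk read one blank-line-separated block per record, and setting FS to a newline makes each line a field, so $1 is the extension name:

```shell
#!/bin/sh
# Sketch: one record per blank-line-separated block, one field per line.
# Assumption: the first line of each block is the extension name.
# list_names is a hypothetical helper name.

list_names() {
    # RS = ""   -> paragraph mode: blank lines separate records
    # FS = "\n" -> each line of a record is a field
    # Prints the extension name ($1) and the number of remaining lines.
    awk 'BEGIN { RS = ""; FS = "\n" } { printf "%s\t%d\n", $1, NF - 1 }' "$1"
}
```

Running `list_names provides.db` should print one line per extension, which is a quick way to check that awk is splitting the file the way you expect before you try to sort it.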

GNUser:
Hi Rich. As usual, you're the man with the answers. The asorti function (available in GNU awk) was the key.


--- Code: ---
$ tce-load -wi gawk
...

$ cat sort.awk
#!/usr/local/bin/gawk -f
# usage: ./sort.awk -v var=FIELD FILE

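BEGIN {
    FS = "\n";       # each line of a record is a field
    RS = "";         # records are separated by blank lines
    IGNORECASE = 1;  # make asorti() sort case-insensitively
}

{ # store each whole record, keyed by the chosen field
    ARRAY[$var] = $0;
}

END {
    # sort the keys of ARRAY into SARRAY; asorti() returns the element count
    j = asorti(ARRAY, SARRAY);

    for (i = 1; i <= j; i++) {
        printf("%s\n\n", ARRAY[SARRAY[i]])
    }
}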

$ time ./sort.awk -v var=1 ./provides.db >provides.db-sorted
real 0m 0.17s
user 0m 0.12s
sys 0m 0.06s

--- End code ---

Given how quickly GNU awk is able to sort provides.db, I'd say this problem is more than solved. The problem is crushed.

CNK:
Ahh rats, you got there before me, and the Awk solution is much faster than the 1-2s that mine takes on my system.

Also I forgot about the appended entries and assumed the task was adding a separate extension.tcz.list to the standard provides.db.

But I'll at least claim bonus points for using only Busybox tools.


--- Code: ---
#!/bin/ash
# Usage: insert_provides.sh [extension]
# Creates new_provides.db

extension=$1
provides=/etc/sysconfig/tcedir/provides.db
headings="`grep -n '^[^/]*\.tcz$' $provides`"
line="`echo -e \"$headings\n:$extension\" | sort -t : -k 2 -f | grep -A 1 :$extension | tail -n 1 | cut -d : -f 1`"
sed -e ${line}i\\"$extension\\n`sed 's/$/\\\/g' $extension.list`"$'\n' $provides > new_provides.db

--- End code ---

Oh, the forum's actually going to let me post a script with all those slashes in it though, so that's a win!
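
For comparison, the same "insert at the right spot" idea can be sketched with a single awk pass, which Busybox also provides. This is not the script above, just an alternative under stated assumptions: provides.db is already sorted case-insensitively, records are separated by blank lines, and the list file contains no blank lines. The name insert_record and its three arguments are made up for illustration.

```shell
#!/bin/sh
# Sketch: insert one new record into an already sorted provides.db.
# Usage (hypothetical helper): insert_record DB NAME LISTFILE
# Assumptions: DB is sorted case-insensitively by extension name, records
# are separated by blank lines, and LISTFILE has no blank lines.

insert_record() {
    awk -v name="$2" -v list="$3" '
        BEGIN { RS = ""; FS = "\n" }

        # Print the new record: its name, then the contents of the list file.
        # In paragraph mode getline returns the whole list as one record.
        function emit_new(   line) {
            print name
            while ((getline line < list) > 0) print line
            close(list)
            print ""
            done = 1
        }

        # The first existing record that sorts after the new name marks the
        # insertion point.
        !done && tolower($1) > tolower(name) { emit_new() }

        # Copy every existing record through, followed by a blank line.
        { print; print "" }

        # If nothing sorted after the new name, it belongs at the end.
        END { if (!done) emit_new() }
    ' "$1"
}
```

Something like `insert_record provides.db myapp.tcz myapp.tcz.list > new_provides.db` would then write the merged file. It avoids a full sort and the nested quoting, at the cost of trusting that provides.db is already in order.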

Rich:
Hi GNUser

--- Quote from: GNUser on March 22, 2023, 08:18:15 PM --- ... Given how quickly GNU awk is able to sort provides.db, I'd say this problem is more than solved. The problem is crushed.
--- End quote ---
There's a reason roberts liked to inject awk snippets into his scripts. When it
comes to data manipulation, it can be wicked fast.

I've had a few instances where I found the execution time of a script unacceptable
and was forced to add an awk function. None of my techniques could even touch
the speed of awk.
