
Topic: script that generates .tree files

GNUser
« Reply #15 on: April 06, 2026, 09:11:39 PM »
Hi, Rich. That was a good thread! It's what got me interested in awk. "Programming in AWK" has since become my favorite programming book :)

I just never did any actual awk vs. shell benchmarking until now. The result is more dramatic than I expected.

I will see what GNU awk can do.
« Last Edit: April 06, 2026, 09:14:09 PM by GNUser »

GNUser
« Reply #16 on: April 07, 2026, 11:29:07 AM »
Hi Rich. I changed the sole print statement in the script to printf and got a small but real increase in speed.

Also, I discovered that a set of parentheses around the concatenation operation at the getline step is required for the script to work with gawk.

So here is the final version of the script:

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v1.2 (April 7, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    LEVEL = 0
    LINUX_VERSION = "6.18.2-tinycore64"
    MIRROR_PATH = "/path/to/your/local/mirror/tinycorelinux/17.x/x86_64/tcz/"
    get_dependencies(ARGV[1])
}

function get_dependencies(app,    depapp) {
    gsub(/KERNEL/,LINUX_VERSION,app)
    for (i = 0; i < LEVEL; i++)
        printf("   ")
    printf("%s\n", app)
    while (getline depapp <(MIRROR_PATH app ".dep") > 0) {
        if (depapp ~ /\.tcz/) { # because some .dep files have blank lines
            LEVEL++
            get_dependencies(depapp)
        }
    }
    close(MIRROR_PATH app ".dep")
    LEVEL--
}

To use GNU awk rather than Busybox awk, one simply needs to load gawk and change the shebang from #!/usr/bin/awk -f to #!/usr/local/bin/gawk -f.

I don't have a lot of time to tinker today, so for the Busybox awk vs. GNU awk benchmarking I simply ran this command 10 times for each awk version:

Code: [Select]
$ time treegen.awk vlc-dev.tcz
then took the mean of the real time shown in the output. Here is the somewhat surprising result:

Busybox awk: 1.52 sec
GNU awk: 1.61 sec


I'm not sure whether the difference is statistically significant. I generally prefer to do as much as possible with just what's included in the base system, so I'm happy that Busybox is at least as fast as GNU awk if not slightly faster for this particular task!
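
Whether a gap like this is real or just noise is easier to judge when the spread of the ten timings is known along with the mean. Here is a minimal sketch, assuming the real times have been saved one per line (in seconds) and are piped to a hypothetical helper named mean_sd. The helper is an illustration only and was not part of the benchmark above:

```sh
#!/bin/sh
# mean_sd (hypothetical helper): read one timing in seconds per line on
# stdin and print the sample count, the mean, and the sample standard
# deviation. Blank lines are ignored.
mean_sd() {
    awk '
    NF { n++; sum += $1; sumsq += $1 * $1 }
    END {
        if (n < 2) {
            print "need at least two samples" > "/dev/stderr"
            exit 1
        }
        mean = sum / n
        var = (sumsq - n * mean * mean) / (n - 1)
        if (var < 0) var = 0    # guard against rounding error
        printf("n=%d mean=%.3f sd=%.3f\n", n, mean, sqrt(var))
    }'
}
```

Run it once per awk version, for example mean_sd < busybox_times.txt (the file name is an assumption). As a rough rule of thumb rather than a formal test, two means that differ by less than about one standard deviation are probably not distinguishable.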

P.S. treegen.awk is now in the public domain. It would be trivial to use a variable to count the number of times the get_dependencies function has been called, and bail out if that variable exceeds some appropriately large number. This could be used as a poor man's test for circular dependencies.
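
The call-counting idea could look roughly like the sketch below. It is a separate illustration, not treegen.awk itself: the wrapper name treegen_guarded, the variable names, and the default limit of 100000 calls are all assumptions, and the KERNEL substitution is left out for brevity.

```sh
#!/bin/sh
# treegen_guarded REPO_DIR APP [MAX_CALLS]  (hypothetical wrapper)
# Prints the dependency tree of APP from the .dep files in REPO_DIR and
# exits with status 1 once walk() has been called more than MAX_CALLS
# times. The default limit is an arbitrary illustrative value.
treegen_guarded() {
    awk -v repo="$1" -v root="$2" -v max="${3:-100000}" '
    function walk(app, level,    dep, file, i) {
        if (++calls > max) {
            print "call limit exceeded; possible circular dependency" > "/dev/stderr"
            exit 1
        }
        for (i = 0; i < level; i++)
            printf("   ")
        printf("%s\n", app)
        file = repo app ".dep"
        while ((getline dep < file) > 0)
            if (dep ~ /\.tcz/)    # skip blank lines
                walk(dep, level + 1)
        close(file)
    }
    BEGIN { walk(root, 0) }'
}
```

Example: treegen_guarded /path/to/tcz/ labwc.tcz 200000. The limit has to be larger than the number of lines in the biggest legitimate tree, otherwise a large but valid tree would be reported as circular.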
« Last Edit: April 07, 2026, 11:41:40 AM by GNUser »

Rich
« Reply #17 on: April 07, 2026, 02:55:29 PM »
Hi GNUser
I reread the thread I referenced and realized I had misremembered.
What I thought was a faster GNU awk was actually a much faster
GNU sort that I mentioned in that thread.

I think the minor speed difference between awk and gawk is just noise.

Quote from GNUser: "... count the number of times the get_dependencies function has been called, and bail out if that variable exceeds some appropriately large number. This could be used as a poor man's test for circular dependencies."
A better way is to build a flattened version as you go:
Code: [Select]
A.tcz,B.tcz,C.tcz,D.tcz,E.tcz
Before appending the next tcz to the flattened list, check whether it is already in the list.
In this example, B.tcz creates a circular dependency so you stop:
Code: [Select]
A.tcz,B.tcz,C.tcz,D.tcz,E.tcz,B.tcz,
As you get to the end of each .dep file, you start unwinding the
flattened list by removing the last entry in the list.
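
The flattened-list check described above can be sketched in plain shell. This is only an illustration under stated assumptions and not the attached TreeGen.sh: the function name walk, the REPO variable, and the marker text are hypothetical. The function body runs in a subshell, so each call gets its own copy of the list and the unwinding step happens implicitly when the call returns.

```sh
#!/bin/sh
# Assumed location of the .dep files; adjust to your local mirror.
REPO="${REPO:-DepFiles/}"

# walk APP PATH INDENT  (hypothetical function)
# PATH is the comma-terminated flattened list of the extensions currently
# being expanded, e.g. "A.tcz,B.tcz,". The ( ... ) body is a subshell, so
# variables are private to each call.
walk() (
    app=$1 path=$2 indent=$3
    case ",$path" in
        *",$app,"*)
            # app is already in the flattened list: stop descending
            printf '%s%s  <-- circular dependency\n' "$indent" "$app"
            exit 0 ;;
    esac
    printf '%s%s\n' "$indent" "$app"
    [ -f "$REPO$app.dep" ] || exit 0
    # read strips leading and trailing blanks; empty lines are skipped
    while read -r dep || [ -n "$dep" ]; do
        [ -n "$dep" ] && walk "$dep" "$path$app," "$indent   "
    done < "$REPO$app.dep"
)
```

Start it with an empty list and no indent, for example: walk vlc-dev.tcz "" "". An extension that appears in several branches is not flagged, because the list only holds the ancestors of the current entry. Forking a subshell per entry is slow on large trees, so this favors clarity over speed.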

GNUser
« Reply #18 on: April 08, 2026, 08:34:48 AM »
Up to and including version 1.2, treegen.awk tried to close every .dep file it had tried to open, whether or not that .dep file existed.

This updated version only tries to close .dep files that were actually opened. It's not any faster, just more correct. I also moved LEVEL++ to a more human-friendly place.

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v2.0 (April 8, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    LEVEL = 0
    LINUX_VERSION = "6.18.2-tinycore64"
    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    get_dependencies(ARGV[1])
}

function get_dependencies(app,    depapp) {
    LEVEL++
    gsub(/KERNEL/,LINUX_VERSION,app)
    for (i = 1; i < LEVEL; i++)
        printf("   ")
    printf("%s\n", app)
    if (getline depapp <(MIRROR_PATH app ".dep") > 0) {
        do {
            if (depapp ~ /\.tcz/) # because some .dep files have blank lines
                get_dependencies(depapp)
        } while (getline depapp <(MIRROR_PATH app ".dep") > 0)
        close(MIRROR_PATH app ".dep")
    }
    LEVEL--
}

Sorry for spamming the forum with this, but now that it's out there I feel responsible for it being correct and beautiful :)

I think this is as good as I can make it. If anyone finds a bug or an optimization that I missed, please do share. Happy hacking!                             
« Last Edit: April 08, 2026, 08:40:42 AM by GNUser »

patrikg
« Reply #19 on: April 08, 2026, 10:07:08 AM »
You're not "spamming" the forum; you're teaching the community a lot.
You could also turn this into a new page in the wiki and link your thread to it.

Somebody may find your script useful, including for learning what scripting can do on Linux.
You could also test this with other scripting languages such as Lua, Ruby, or even Python.
But then you would also have to add that language to the dependency list :(

Rich
« Reply #20 on: April 23, 2026, 02:57:10 PM »
Hi GNUser
Here's final tally of busybox ash vs. busybox awk vs. C++ implementations of treegen. All three implementations parse .dep files present on the local mirror (without using network/wget):

                                      busybox ash     busybox awk          C++
time to generate labwc-dev.tcz.tree:    11 sec          0.43 sec         0.06 sec
time to generate vlc-dev.tcz.tree:      49 sec          1.86 sec         0.31 sec
...
I decided to try my hand at a busybox ash version.

Using your busybox awk as a benchmark, I got:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 8.49s
user    0m 3.56s
sys     0m 4.92s

Using my busybox ash version, I got:
Code: [Select]
tc@E310:~/TreeGen$ time ./TreeGen.sh vlc-dev
6.25
real    0m 6.27s
user    0m 3.71s
sys     0m 2.52s

That surprised me, but not as much as this did:
Code: [Select]
tc@E310:~/TreeGen$ ls -l vlc*
-rw-r--r-- 1 tc staff 1879760 Apr 23 00:56 vlc-dev.tcz.tree <------ ash version
-rw-r--r-- 1 tc staff 1839796 Mar 30 13:48 vlc-dev.tcz.tree.bak <-- repo version
-rw-r--r-- 1 tc staff 1839795 Apr 22 22:53 vlc-dev.tcz.tree2 <----- awk version
The repo version is 1 byte longer than yours. It has an extra newline at the end of the file.
But mine is almost 40,000 bytes larger.

And about 900 lines longer:
Code: [Select]
tc@E310:~/TreeGen$ wc -l vlc-dev.tcz.tree*
  41437 vlc-dev.tcz.tree <------ ash version
  40528 vlc-dev.tcz.tree.bak <-- repo version
  40527 vlc-dev.tcz.tree2 <----- awk version

At first I figured there was a bug in my version because yours agreed with the repo version.
Then I found this:

Repo version:
Code: [Select]
   3498                   libsndfile-dev.tcz
   3499                      libsndfile.tcz
   3500                         flac.tcz
   3501                            libogg.tcz
   3502                         libvorbis.tcz
   3503                            libogg.tcz
   3504                         opus.tcz
   3505                         libmpg123.tcz
   3506                         lame.tcz
   3507                      flac-dev.tcz
   3508                      libvorbis-dev.tcz   
   3509                   libbluetooth-dev.tcz

My version:
Code: [Select]
   3498                   libsndfile-dev.tcz
   3499                      libsndfile.tcz
   3500                         flac.tcz
   3501                            libogg.tcz
   3502                         libvorbis.tcz
   3503                            libogg.tcz
   3504                         opus.tcz
   3505                         libmpg123.tcz
   3506                         lame.tcz
   3507                      flac-dev.tcz
   3508                         flac.tcz
   3509                            libogg.tcz
   3510                         libogg-dev.tcz
   3511                            libogg.tcz
   3512                      libvorbis-dev.tcz
   3513                         libvorbis.tcz
   3514                            libogg.tcz
   3515                         libogg-dev.tcz
   3516                            libogg.tcz
   3517                   libbluetooth-dev.tcz

The C++ and awk versions are not parsing flac-dev.tcz.dep and libvorbis-dev.tcz.dep

Here's the cause:
Code: [Select]
tc@E310:~/TreeGen$ grep " " DepFiles/libsndfile-dev.tcz.dep | tr " " "_"
flac-dev.tcz_
libvorbis-dev.tcz___
The trailing spaces on those 2 entries are tripping up the C++ and awk programs.

Checking the rest of the .dep files related to vlc-dev turns up this:
Code: [Select]
tc@E310:~/TreeGen$ grep " " DepFiles/* | tr " " "_"
DepFiles/acl.tcz.dep:_
DepFiles/libsane.tcz.dep:libavahi.tcz_____
DepFiles/libsndfile-dev.tcz.dep:flac-dev.tcz_
DepFiles/libsndfile-dev.tcz.dep:libvorbis-dev.tcz___
DepFiles/libsndfile.tcz.dep:libmpg123.tcz_
DepFiles/sane-dev.tcz.dep:avahi-dev.tcz__
DepFiles/taglib-dev.tcz.dep:taglib.tcz___________________________________
There are a handful of entries with trailing spaces.

So running treegen.awk against libsndfile-dev.tcz:
Code: [Select]
tc@E310:~/TreeGen$ ./treegen.awk libsndfile-dev.tcz
libsndfile-dev.tcz
   libsndfile.tcz
      flac.tcz
         libogg.tcz
      libvorbis.tcz
         libogg.tcz
      opus.tcz
      libmpg123.tcz
      lame.tcz
   flac-dev.tcz
   libvorbis-dev.tcz
It reports flac-dev.tcz and libvorbis-dev.tcz have no dependencies.

Running these commands suggests they do:
Code: [Select]
tc@E310:~/TreeGen$ ./treegen.awk flac-dev.tcz
flac-dev.tcz
   flac.tcz
      libogg.tcz
   libogg-dev.tcz
      libogg.tcz
tc@E310:~/TreeGen$ ./treegen.awk libvorbis-dev.tcz
libvorbis-dev.tcz
   libvorbis.tcz
      libogg.tcz
   libogg-dev.tcz
      libogg.tcz
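One way to make a tree generator tolerate such entries is to strip trailing whitespace from each .dep line before it is used as a file name. Here is a minimal sketch, assuming a hypothetical helper named clean_dep that is not part of treegen.awk or TreeGen.sh:

```sh
#!/bin/sh
# clean_dep FILE  (hypothetical helper)
# Print the entries of a .dep file with trailing spaces, tabs, and
# carriage returns removed, skipping lines that do not end in .tcz
# (for example blank lines or lines holding only spaces).
clean_dep() {
    awk '{ sub(/[ \t\r]+$/, "") } /\.tcz$/' "$1"
}
```

Example: clean_dep DepFiles/libsndfile-dev.tcz.dep. Inside treegen.awk, the same sub(/[ \t\r]+$/, "", depapp) call placed just before the recursive get_dependencies(depapp) call should have the same effect, though that change is untested here.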

I've attached my version of TreeGen.sh.

The only variables you need to adjust are kernel version and
path to your local repo.
Code: [Select]
Kernel="6.18.2-tinycore64"
Repo="DepFiles/"

Run the program like this:
Code: [Select]
./TreeGen.sh Filename
The script adds .tcz automatically if you omit it.
The script automatically saves results to Filename.tcz.tree in
the current directory.
