Tiny Core Linux

General TC => General TC Talk => Topic started by: GNUser on April 02, 2026, 09:48:09 AM

Title: [Solved] script that generates .tree files
Post by: GNUser on April 02, 2026, 09:48:09 AM
Would one of the developers be able to please share the script that generates the .tree files on the official repo? While I could reimplement it myself, I'm trying to save some time and energy.

P.S. When submitting new extensions for the official repo, I sometimes also have to create and submit build dependencies. Being able to generate the .tree files of all of these new extensions on my local mirror--before submitting anything--would be useful to me. For example, tce-size, which depends on the .tree files, gives valuable information that can help me decide if an extension is too bloated to be worth using/submitting.
Title: Re: script that generates .tree files
Post by: GNUser on April 02, 2026, 10:36:01 AM
I found the script in one of my own posts ::)
https://forum.tinycorelinux.net/index.php/topic,26040.msg168124.html#msg168124

When I use the  deptree  script with the  -d  flag, the output exactly matches the .tree format.

Sorry for the noise.
Title: Re: script that generates .tree files
Post by: Paul_123 on April 02, 2026, 05:42:59 PM
We use a C program now. Most of the recent changes have been in circular dependency monitoring, so it doesn't fill up the disk. But otherwise it doesn't do much more than the shell script.
Title: Re: script that generates .tree files
Post by: GNUser on April 02, 2026, 06:35:45 PM
Hi Paul_123. May I please have a copy of the C program? I'd prefer it over the shell script.
Title: Re: script that generates .tree files
Post by: Paul_123 on April 03, 2026, 08:14:24 AM
Sure... I'll get it for you next time I can log in to the repo server.
Title: Re: script that generates .tree files
Post by: GNUser on April 03, 2026, 08:30:13 AM
Thank you :)
Title: Re: script that generates .tree files
Post by: Paul_123 on April 03, 2026, 06:24:58 PM
Sorry, it's cpp. Here you go. I also included the script that calls treegen.
Title: Re: script that generates .tree files
Post by: Rich on April 03, 2026, 08:03:48 PM
Hi GNUser
I modified the attached  gen_tcz_trees_17.x  file so the path
on the server would not be exposed.

You'll need to modify this to match your system if you want to run it:
Code: [Select]
cd $REPO || exit 1
There are also lines that call out the kernel version for TC17.
Title: Re: script that generates .tree files
Post by: GNUser on April 03, 2026, 08:06:37 PM
Thank you very much, Paul_123. I'm in your debt.

Got it, Rich. Thanks.
Title: Re: script that generates .tree files
Post by: GNUser on April 03, 2026, 08:45:42 PM
I did a little test. Not surprisingly, the cpp version is over 500x faster than the shell script!

Code: [Select]
$ time sh -c "for i in labwc-config labwc-dev labwc-menu-generator labwc sfwbar foot appindicator-broker dino; do treegen $i $(uname -r) >$i.tcz.tree; done"
real 0m 0.03s
user 0m 0.01s
sys 0m 0.00s

$ time sh -c "for i in labwc-config labwc-dev labwc-menu-generator labwc sfwbar foot appindicator-broker dino; do deptree -d $i >$i.tcz.tree; done"
real 0m 16.11s
user 0m 11.03s
sys 0m 4.37s
Title: Re: script that generates .tree files
Post by: GNUser on April 06, 2026, 01:13:16 PM
I wrote my own tiny versions, just for my edification. First I wrote one in shell, then rewrote it in awk.

The awk version is ~40x faster than the shell version, so I won't bother to post the shell version.

Here's my  treegen.awk  in case anybody finds it helpful. Caveat: It is extremely minimalistic, so there are no sanity checks for circular dependencies or the like.

Code: [Select]
#!/usr/bin/awk -f

# usage example: $ treegen.awk labwc.tcz

BEGIN {
LEVEL = 0
LINUX_VERSION = "6.18.2-tinycore64"
MIRROR_PATH = "/path/to/your/local/tinycorelinux/17.x/x86_64/tcz/"
get_dependencies(ARGV[1])
}

function get_dependencies(app,    depapp) {
gsub(/KERNEL/,LINUX_VERSION,app)
for (i = 0; i < LEVEL; i++)
printf("   ")
print app
while (getline depapp < MIRROR_PATH app ".dep" > 0) {
# if (depapp ~ /\.tcz/) is there because some .dep files have blank lines.
if (depapp ~ /\.tcz/) {
LEVEL++
get_dependencies(depapp)
}
}
close(MIRROR_PATH app ".dep")
LEVEL--
}

    [Edit]: Added comment from next post to code.  Rich
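For comparison, here is a minimal C++ sketch of the same recursion treegen.awk performs: substitute KERNEL, print the name indented three spaces per level, then recurse into every .dep entry that contains ".tcz". It is not the repo's C++ treegen (that source is not reproduced in this thread); `print_tree`, `expand_kernel`, `load_dep_file`, and `DepLoader` are hypothetical names, and like the awk script at this stage it does no cycle detection and no whitespace trimming.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <functional>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical loader: returns the raw lines of <name>.dep, or nothing if
// the extension has no .dep file.
using DepLoader = std::function<std::vector<std::string>(const std::string&)>;

// Replace every "KERNEL" in an extension name with the kernel version,
// like gsub(/KERNEL/, LINUX_VERSION, app) in the awk script.
std::string expand_kernel(std::string name, const std::string& kernel) {
    const std::string key = "KERNEL";
    std::size_t pos = name.find(key);
    while (pos != std::string::npos) {
        name.replace(pos, key.size(), kernel);
        pos = name.find(key, pos + kernel.size());
    }
    return name;
}

// Read <mirror><name>.dep line by line; a missing file yields no lines,
// much as getline returning -1 ends the awk loop.
std::vector<std::string> load_dep_file(const std::string& mirror,
                                       const std::string& name) {
    std::vector<std::string> lines;
    std::ifstream in(mirror + name + ".dep");
    for (std::string line; std::getline(in, line);) lines.push_back(line);
    return lines;
}

// Print one tree line (3 spaces per level), then recurse into each entry
// containing ".tcz" (blank lines are skipped). No cycle detection and no
// whitespace trimming, matching the awk script at this point in the thread.
void print_tree(const std::string& raw_name, const std::string& kernel,
                const DepLoader& load, std::ostream& out, int indent = 0) {
    const std::string name = expand_kernel(raw_name, kernel);
    out << std::string(static_cast<std::size_t>(indent), ' ') << name << '\n';
    for (const std::string& dep : load(name)) {
        if (dep.find(".tcz") != std::string::npos)
            print_tree(dep, kernel, load, out, indent + 3);
    }
}
```

Against a local mirror the loader would wrap `load_dep_file` with the mirror path and the output would go to `std::cout`. Passing the loader in as a function is a design choice that lets the recursion be checked against an in-memory table, for example by printing into a `std::ostringstream`.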
Title: Re: script that generates .tree files
Post by: GNUser on April 06, 2026, 01:46:58 PM
I should have put in a comment:
if (depapp ~ /\.tcz/) is there because some .dep files have blank lines.
Title: Re: script that generates .tree files
Post by: Rich on April 06, 2026, 03:04:30 PM
Hi GNUser
Comment added to code. :)
Title: Re: script that generates .tree files
Post by: GNUser on April 06, 2026, 08:08:02 PM
Thanks, Rich.

Here's the final tally of busybox ash vs. busybox awk vs. C++ implementations of treegen. All three implementations parse .dep files present on the local mirror (without using network/wget):

                                      busybox ash     busybox awk          C++
time to generate labwc-dev.tcz.tree:    11 sec          0.43 sec         0.06 sec
time to generate vlc-dev.tcz.tree:      49 sec          1.86 sec         0.31 sec


The fact that a compiled executable (the C++ implementation) crushes the two scripts doesn't surprise me. What does surprise me is the vast difference in performance between the shell script and the awk script--especially considering that the logic and number of operations in the two scripts are exactly the same.

    [Edit]: Fixed formatting (alignment) of timing table.  Rich
Title: Re: script that generates .tree files
Post by: Rich on April 06, 2026, 08:37:46 PM
Hi GNUser
... What does surprise me is the vast difference in performance between the shell script and the awk script--especially considering that the logic and number of operations in the two scripts is exactly the same.

I'd like to address that by quoting myself from another thread we collaborated on:
... Given how quickly GNU awk is able to sort provides.db, I'd say this problem is more than solved. The problem is crushed.
There's a reason roberts liked to inject awk snippets into his scripts. When it
comes to data manipulation, it can be wicked fast.

I've had a few instances where I found the execution time of a script unacceptable
and was forced to add an awk function. None of my techniques could even touch
the speed of awk.

I would not be surprised if GNU awk was even faster.
Title: Re: script that generates .tree files
Post by: GNUser on April 06, 2026, 09:11:39 PM
Hi, Rich. That was a good thread! It's what got me interested in awk. "Programming in AWK" has since become my favorite programming book :)

I just never did any actual awk vs. shell benchmarking until now. The result is more dramatic than I expected.

I will see what GNU awk can do.
Title: Re: script that generates .tree files
Post by: GNUser on April 07, 2026, 11:29:07 AM
Hi Rich. I changed the sole  print  statement  in the script to  printf  and was able to achieve a small but real increase in speed.

Also, I discovered that adding a set of parentheses around the concatenation operation at the  getline  step is required for the script to work with gawk.

So here is the final version of the script:

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v1.2 (April 7, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
LEVEL = 0
LINUX_VERSION = "6.18.2-tinycore64"
MIRROR_PATH = "/path/to/your/local/mirror/tinycorelinux/17.x/x86_64/tcz/"
get_dependencies(ARGV[1])
}

function get_dependencies(app,    depapp) {
gsub(/KERNEL/,LINUX_VERSION,app)
for (i = 0; i < LEVEL; i++)
printf("   ")
printf("%s\n", app)
while (getline depapp <(MIRROR_PATH app ".dep") > 0) {
if (depapp ~ /\.tcz/) { # because some .dep files have blank lines
LEVEL++
get_dependencies(depapp)
}
}
close(MIRROR_PATH app ".dep")
LEVEL--
}

To use GNU awk rather than Busybox awk, one simply needs to load gawk and change the shebang from  #!/usr/bin/awk -f  to  #!/usr/local/bin/gawk -f

I don't have a lot of time to tinker today, so for the Busybox awk vs. GNU awk benchmarking I simply ran this command 10 times for each awk version:

Code: [Select]
$ time treegen.awk vlc-dev.tcz
then took the mean of the real time shown in the output. Here is the somewhat surprising result:

Busybox awk: 1.52 sec
GNU awk: 1.61 sec


I'm not sure whether the difference is statistically significant. I generally prefer to do as much as possible with just what's included in the base system, so I'm happy that Busybox is at least as fast as GNU awk if not slightly faster for this particular task!

P.S. treegen.awk is now in the public domain. It would be trivial to use a variable to count the number of times the get_dependencies function has been called, and bail out if that variable exceeds some appropriately large number. This could be used as a poor man's test for circular dependencies.
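The counter idea from the P.S. might look like this in C++: every call of the recursive walk bumps a counter, and the walk throws once a fixed budget is exceeded. `walk`, `CallBudget`, and `DepMap` are hypothetical names, and the dependency table is an in-memory map rather than .dep files. The budget has to sit well above the size of a legitimately large tree: each call corresponds to one output line, and the vlc-dev.tcz.tree discussed later in this thread has roughly 40,500 lines.

```cpp
#include <cassert>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

using DepMap = std::map<std::string, std::vector<std::string>>;

// Hypothetical call budget for the walk.
struct CallBudget {
    long limit;      // maximum number of calls allowed
    long calls = 0;  // calls made so far
};

// Depth-first walk over the dependency table. Each call counts against the
// budget; exceeding it is treated as a probable circular dependency.
void walk(const DepMap& deps, const std::string& name, CallBudget& budget) {
    if (++budget.calls > budget.limit)
        throw std::runtime_error("call limit exceeded near " + name +
                                 " (possible circular dependency)");
    const auto it = deps.find(name);
    if (it == deps.end()) return;  // no .dep entry: a leaf
    for (const std::string& dep : it->second) walk(deps, dep, budget);
}
```

The weakness is that the counter cannot tell a deep but legitimate tree from a loop; it only bounds the damage.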
Title: Re: script that generates .tree files
Post by: Rich on April 07, 2026, 02:55:29 PM
Hi GNUser
I reread that thread I referenced and realized I misremembered
what I thought was a faster GNU awk. It was a much faster GNU
 sort  that I mentioned in that thread.

I think the minor speed difference between awk and gawk is just noise.

... count the number of times the get_dependencies function has been called, and bail out if that variable exceeds some appropriately large number. This could be used as a poor man's test for circular dependencies.
A better way is to build a flattened version as you go:
Code: [Select]
A.tcz,B.tcz,C.tcz,D.tcz,E.tcz
Before appending the next tcz to the flattened list, check if it exists.
In this example, B.tcz creates a circular dependency so you stop:
Code: [Select]
A.tcz,B.tcz,C.tcz,D.tcz,E.tcz,B.tcz,
As you get to the end of each .dep file, you start unwinding the
flattened list by removing the last entry in the list.
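The flattened-list check amounts to keeping the chain of ancestors on a stack: push a name before reading its .dep file, stop if the name is already on the stack, and pop it when that .dep file is exhausted. A minimal C++ sketch, with the hypothetical function `find_cycle` working on an in-memory map instead of .dep files. Only ancestors count: a global "already seen" set would be wrong, because a package such as libogg.tcz legitimately appears under many branches of one tree.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <vector>

using DepMap = std::map<std::string, std::vector<std::string>>;

// Depth-first walk that keeps the current chain of ancestors (the
// "flattened list"). Returns the chain ending in the repeated name if a
// cycle is found, otherwise an empty vector. With no cycle, `path` ends up
// fully unwound; after a cycle it still holds the offending chain.
std::vector<std::string> find_cycle(const DepMap& deps, const std::string& name,
                                    std::vector<std::string>& path) {
    if (std::find(path.begin(), path.end(), name) != path.end()) {
        path.push_back(name);  // e.g. A,B,C,D,E,B: stop here
        return path;
    }
    path.push_back(name);  // append before reading this name's .dep file
    const auto it = deps.find(name);
    if (it != deps.end()) {
        for (const std::string& dep : it->second) {
            std::vector<std::string> cycle = find_cycle(deps, dep, path);
            if (!cycle.empty()) return cycle;
        }
    }
    path.pop_back();  // end of this .dep file: unwind the last entry
    return {};
}
```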
Title: Re: script that generates .tree files
Post by: GNUser on April 08, 2026, 08:34:48 AM
Up to version 1.2 of treegen.awk, the script tried to close every .dep file it attempted to open, whether or not that .dep file existed.

This updated version only tries to close .dep files that were actually opened. It's not any faster, just more correct. I also moved LEVEL++ to a more human-friendly place.

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v2.0 (April 8, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    LEVEL = 0
    LINUX_VERSION = "6.18.2-tinycore64"
    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    get_dependencies(ARGV[1])
}

function get_dependencies(app,    depapp) {
    LEVEL++
    gsub(/KERNEL/,LINUX_VERSION,app)
    for (i = 1; i < LEVEL; i++)
        printf("   ")
    printf("%s\n", app)
    if (getline depapp <(MIRROR_PATH app ".dep") > 0) {
        do {
            if (depapp ~ /\.tcz/) # because some .dep files have blank lines
                get_dependencies(depapp)
        } while (getline depapp <(MIRROR_PATH app ".dep") > 0)
        close(MIRROR_PATH app ".dep")
    }
    LEVEL--
}

Sorry for spamming the forum with this, but now that it's out there I feel responsible for it being correct and beautiful :)

I think this is as good as I can make it. If anyone finds a bug or an optimization that I missed, please do share. Happy hacking!                             
Title: Re: script that generates .tree files
Post by: patrikg on April 08, 2026, 10:07:08 AM
You aren't "spamming" the forum; you're also teaching the community a lot.
You could also turn this into a new page in the wiki and link your thread to it.

Somebody may have a use for your script, and it is also good for learning what scripting can do on Linux.
You could also test this with other scripting languages like Lua, Ruby, or even Python.
But then you would also have to add that to the dependency list  :( .
Title: Re: script that generates .tree files
Post by: Rich on April 23, 2026, 02:57:10 PM
Hi GNUser
Here's final tally of busybox ash vs. busybox awk vs. C++ implementations of treegen. All three implementations parse .dep files present on the local mirror (without using network/wget):

                                      busybox ash     busybox awk          C++
time to generate labwc-dev.tcz.tree:    11 sec          0.43 sec         0.06 sec
time to generate vlc-dev.tcz.tree:      49 sec          1.86 sec         0.31 sec
...
I decided to try my hand at a busybox ash version.

Using your busybox awk as a benchmark, I got:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 8.49s
user    0m 3.56s
sys     0m 4.92s

Using my busybox ash version, I got:
Code: [Select]
tc@E310:~/TreeGen$ time ./TreeGen.sh vlc-dev
6.25
real    0m 6.27s
user    0m 3.71s
sys     0m 2.52s

That surprised me, but not as much as this did:
Code: [Select]
tc@E310:~/TreeGen$ ls -l vlc*
-rw-r--r-- 1 tc staff 1879760 Apr 23 00:56 vlc-dev.tcz.tree <------ ash version
-rw-r--r-- 1 tc staff 1839796 Mar 30 13:48 vlc-dev.tcz.tree.bak <-- repo version
-rw-r--r-- 1 tc staff 1839795 Apr 22 22:53 vlc-dev.tcz.tree2 <----- awk version
The repo version is 1 byte longer than yours. It has an extra newline at the end of the file.
But mine is almost 40,000 bytes larger.

And about 900 lines longer:
Code: [Select]
tc@E310:~/TreeGen$ wc -l vlc-dev.tcz.tree*
  41437 vlc-dev.tcz.tree <------ ash version
  40528 vlc-dev.tcz.tree.bak <-- repo version
  40527 vlc-dev.tcz.tree2 <----- awk version

At first I figured there was a bug in my version because yours agreed with the repo version.
Then I found this:

Repo version:
Code: [Select]
   3498                   libsndfile-dev.tcz
   3499                      libsndfile.tcz
   3500                         flac.tcz
   3501                            libogg.tcz
   3502                         libvorbis.tcz
   3503                            libogg.tcz
   3504                         opus.tcz
   3505                         libmpg123.tcz
   3506                         lame.tcz
   3507                      flac-dev.tcz
   3508                      libvorbis-dev.tcz   
   3509                   libbluetooth-dev.tcz

My version:
Code: [Select]
   3498                   libsndfile-dev.tcz
   3499                      libsndfile.tcz
   3500                         flac.tcz
   3501                            libogg.tcz
   3502                         libvorbis.tcz
   3503                            libogg.tcz
   3504                         opus.tcz
   3505                         libmpg123.tcz
   3506                         lame.tcz
   3507                      flac-dev.tcz
   3508                         flac.tcz
   3509                            libogg.tcz
   3510                         libogg-dev.tcz
   3511                            libogg.tcz
   3512                      libvorbis-dev.tcz
   3513                         libvorbis.tcz
   3514                            libogg.tcz
   3515                         libogg-dev.tcz
   3516                            libogg.tcz
   3517                   libbluetooth-dev.tcz

The C++ and awk versions are not parsing flac-dev.tcz.dep and libvorbis-dev.tcz.dep

Here's the cause:
Code: [Select]
tc@E310:~/TreeGen$ grep " " DepFiles/libsndfile-dev.tcz.dep | tr " " "_"
flac-dev.tcz_
libvorbis-dev.tcz___
The trailing spaces on those 2 entries are tripping up the C++ and awk programs.

Checking the rest of the .dep files related to vlc-dev turns up this:
Code: [Select]
tc@E310:~/TreeGen$ grep " " DepFiles/* | tr " " "_"
DepFiles/acl.tcz.dep:_
DepFiles/libsane.tcz.dep:libavahi.tcz_____
DepFiles/libsndfile-dev.tcz.dep:flac-dev.tcz_
DepFiles/libsndfile-dev.tcz.dep:libvorbis-dev.tcz___
DepFiles/libsndfile.tcz.dep:libmpg123.tcz_
DepFiles/sane-dev.tcz.dep:avahi-dev.tcz__
DepFiles/taglib-dev.tcz.dep:taglib.tcz___________________________________
There are a handful of entries with trailing spaces.
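The failure mode is easy to see once the path is built: the entry "flac-dev.tcz " becomes the file name "flac-dev.tcz .dep", which does not exist, so the open fails quietly and the entry is printed as a leaf. A minimal C++ sketch of trimming each entry before building the path; `rtrim` and `dep_path` are hypothetical helpers, and stripping \r along with spaces and tabs is an assumption aimed at .dep files saved with DOS line endings.

```cpp
#include <cassert>
#include <string>

// Strip trailing spaces, tabs and carriage returns from one .dep entry.
// (Treating '\r' as whitespace is an assumption for DOS-style line endings.)
std::string rtrim(const std::string& entry) {
    const std::string::size_type end = entry.find_last_not_of(" \t\r");
    return end == std::string::npos ? std::string() : entry.substr(0, end + 1);
}

// Hypothetical helper: build the .dep path from a trimmed entry, so
// "flac-dev.tcz " maps to ".../flac-dev.tcz.dep" rather than
// ".../flac-dev.tcz .dep".
std::string dep_path(const std::string& mirror, const std::string& entry) {
    return mirror + rtrim(entry) + ".dep";
}
```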

So running treegen.awk against libsndfile-dev.tcz:
Code: [Select]
tc@E310:~/TreeGen$ ./treegen.awk libsndfile-dev.tcz
libsndfile-dev.tcz
   libsndfile.tcz
      flac.tcz
         libogg.tcz
      libvorbis.tcz
         libogg.tcz
      opus.tcz
      libmpg123.tcz
      lame.tcz
   flac-dev.tcz
   libvorbis-dev.tcz
It reports flac-dev.tcz and libvorbis-dev.tcz have no dependencies.

Running these commands suggests they do:
Code: [Select]
tc@E310:~/TreeGen$ ./treegen.awk flac-dev.tcz
flac-dev.tcz
   flac.tcz
      libogg.tcz
   libogg-dev.tcz
      libogg.tcz
tc@E310:~/TreeGen$ ./treegen.awk libvorbis-dev.tcz
libvorbis-dev.tcz
   libvorbis.tcz
      libogg.tcz
   libogg-dev.tcz
      libogg.tcz

I've attached my version of TreeGen.sh.

The only variables you need to adjust are kernel version and
path to your local repo.
Code: [Select]
Kernel="6.18.2-tinycore64"
Repo="DepFiles/"

Run the program like this:
Code: [Select]
./TreeGen.sh Filename
The script adds .tcz automatically if you omit it.
The script automatically saves results to Filename.tcz.tree in
the current directory.
Title: Re: script that generates .tree files
Post by: gadget42 on April 25, 2026, 09:23:00 AM
TIL - Today I Learned - today i learned there is a lot to learn about posix compliant filenames...hahaha...ahem...sigh.

using https://start.duckduckgo.com/lite/ and searching "linux "trailing spaces" in file names causes problems"(outer quotation marks NOT used/searched)

these were the seven results(as of a few minutes before posting):
https://dwheeler.com/essays/fixing-unix-linux-filenames.html
https://askubuntu.com/questions/621007/problem-with-spaces-in-file-names
https://askubuntu.com/questions/668088/remove-leading-or-trailing-spaces-in-file-or-folder-names
https://stackoverflow.com/questions/2304221/what-character-sequence-should-i-not-allow-in-a-filename
https://stackoverflow.com/questions/11210126/bash-find-files-with-trailing-spaces-at-the-end-of-the-lines
https://stackoverflow.com/questions/59065909/detect-remove-leading-space-in-file-or-folder-name
https://web.synametrics.com/syncrify-xtra-spaces.htm
Title: Re: script that generates .tree files
Post by: Rich on April 25, 2026, 03:59:52 PM
Hi GNUser
... I think this is as good as I can make it. If anyone finds a bug ...
Well, I pointed out a bug in my last post (not processing filenames with trailing spaces).
I'm not well versed in awk, but I suspect it may be related to this line:
Code: [Select]
            if (depapp ~ /\.tcz/) # because some .dep files have blank lines
Quote
... or an optimization that I missed, please do share. ...
... I decided to try my hand at a busybox ash version.

Using your busybox awk as a benchmark, I got:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 8.49s
user    0m 3.56s
sys     0m 4.92s

Using my busybox ash version, I got:
Code: [Select]
tc@E310:~/TreeGen$ time ./TreeGen.sh vlc-dev
6.25
real    0m 6.27s
user    0m 3.71s
sys     0m 2.52s
...

I did find a couple of optimizations.
Change this:
Code: [Select]
    for (i = 1; i < LEVEL; i++)
        printf("   ")
To this:
Code: [Select]
    Width=(LEVEL - 1) * 3
        printf("%"Width"s")
And the execution time dropped to:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 2.95s
user    0m 1.71s
sys     0m 1.22s

Taking it a step further, you can now combine the 2 consecutive printf statements.
Change this:
Code: [Select]
    Width=(LEVEL - 1) * 3
        printf("%"Width"s")
    printf("%s\n", app)
To this:
Code: [Select]
    Width=(LEVEL - 1) * 3
        printf("%"Width"s%s\n", "", app)
And you get a further reduction in execution time:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 2.44s
user    0m 1.45s
sys     0m 0.94s
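The gain comes from replacing one printf("   ") call per indentation level with a single call that pads an empty string to the required field width. C's printf family can also take that width as an argument via `*`, so the format string need not be rebuilt by concatenation. A small C++ sketch; `tree_line` is a hypothetical helper, and it formats into a std::string with snprintf only so the result is easy to check.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Format one .tree line: `indent` spaces followed by the name and a newline.
// "%*s" pads the empty string "" to a width taken from the argument list,
// so there is no loop and no format string built by concatenation.
std::string tree_line(int indent, const std::string& name) {
    const int n = std::snprintf(nullptr, 0, "%*s%s\n", indent, "", name.c_str());
    if (n < 0) return std::string();  // encoding error
    std::string out(static_cast<std::size_t>(n) + 1, '\0');
    std::snprintf(&out[0], out.size(), "%*s%s\n", indent, "", name.c_str());
    out.resize(static_cast<std::size_t>(n));  // drop the terminating NUL
    return out;
}
```

Writing straight to stdout, the same idea is a single `std::printf("%*s%s\n", indent, "", name.c_str())` per tree line.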
Title: Re: script that generates .tree files
Post by: Rich on April 25, 2026, 08:22:09 PM
Hi GNUser
Found one more minor optimization.
Changed  LEVEL++  to  LEVEL+=3  and moved it down after
the  printf  statement.
Changed  LEVEL--  to  LEVEL-=3.
Removed  Width=(LEVEL - 1) * 3  and used  LEVEL  in place of  Width.

The final implementation looks like this:
Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v2.0 (April 8, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    LEVEL = 0
    LINUX_VERSION = "6.18.2-tinycore64"
#    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    MIRROR_PATH = "/home/tc/TreeGen/DepFiles/"
    get_dependencies(ARGV[1])
}

function get_dependencies(app,    depapp) {
    gsub(/KERNEL/,LINUX_VERSION,app)
        printf("%"LEVEL"s%s\n", "", app)
    LEVEL+=3
    if (getline depapp <(MIRROR_PATH app ".dep") > 0) {
        do {
            if (depapp ~ /\.tcz/) # because some .dep files have blank lines
                get_dependencies(depapp)
        } while (getline depapp <(MIRROR_PATH app ".dep") > 0)
        close(MIRROR_PATH app ".dep")
    }
    LEVEL-=3
}

Execution appears to be a tiny bit faster:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 2.33s
user    0m 1.47s
sys     0m 0.84s

I think that may be about all I can squeeze out of treegen.awk.
I'll leave updating the version number up to you if you decide
to incorporate any of these changes.
Title: Re: script that generates .tree files
Post by: GNUser on May 01, 2026, 10:44:34 PM
Hi Rich. Sorry, I did not notice these updates to the thread until now. The forum does not always send an email notification when there are new posts.

Thank you very much for finding this bug and the optimizations. I will post an updated version of treegen.awk on Sunday or Monday when I have some time to play with this.
Title: Re: script that generates .tree files
Post by: GNUser on May 02, 2026, 07:17:23 AM
Hi, Rich. I made a few changes:

1. Renamed some variables to (hopefully) make things more clear (e.g., LEVEL renamed to N_SPACES--by LEVEL I had meant recursion level, which is no longer what this variable's value represents).

2. I added a call to gsub to get rid of any trailing whitespace.

We will lose some speed with the call to gsub, but it will be small because gsub is internal to awk.

Version bump to 3.0 :)

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v3.0 (May 2, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    N_SPACES = 0
    LINUX_VERSION = "6.18.2-tinycore64"
#    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    MIRROR_PATH = "/home/tc/TreeGen/DepFiles/"
    get_dependencies(ARGV[1])
}

function get_dependencies(tczname,    line) {
    gsub(/KERNEL/, LINUX_VERSION, tczname)
        printf("%"N_SPACES"s%s\n", "", tczname)
    N_SPACES+=3
    if (getline line <(MIRROR_PATH tczname ".dep") > 0) {
        do {
            if (line ~ /\.tcz/) # because some .dep files have blank lines
                gsub(/\.tcz[ \t]+$/, ".tcz", line) # because some lines in .dep files have trailing whitespace
                get_dependencies(line)
        } while (getline line <(MIRROR_PATH tczname ".dep") > 0)
        close(MIRROR_PATH tczname ".dep")
    }
    N_SPACES-=3
}



Title: Re: script that generates .tree files
Post by: Rich on May 02, 2026, 08:37:10 AM
Hi GNUser
Well, you're almost there. :)
For testing, I saved your new version as treegen3.awk.

... We will lose some speed with the call to gsub, but it will be small because gsub is internal to awk. ...
No big speed penalty:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen2.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 2.35s
user    0m 1.45s
sys     0m 0.89s
tc@E310:~/TreeGen$ time ./treegen3.awk vlc-dev.tcz > vlc-dev.tcz.tree3
real    0m 2.50s
user    0m 1.56s
sys     0m 0.92s

However, your tree file is now about 8000 bytes bigger than mine:
Code: [Select]
tc@E310:~/TreeGen$ ls -l vlc*
-rw-r--r-- 1 tc staff 1879760 Apr 25 14:08 vlc-dev.tcz.tree <------ ash version
-rw-r--r-- 1 tc staff 1839796 Mar 30 13:48 vlc-dev.tcz.tree.bak <-- repo version
-rw-r--r-- 1 tc staff 1839795 May  2 07:35 vlc-dev.tcz.tree2
-rw-r--r-- 1 tc staff 1887792 May  2 07:35 vlc-dev.tcz.tree3 <----- awk version

And 258 lines longer:
Code: [Select]
tc@E310:~/TreeGen$ wc -l vlc-dev.tcz.tree*
  41437 vlc-dev.tcz.tree
  40528 vlc-dev.tcz.tree.bak
  40527 vlc-dev.tcz.tree2
  41695 vlc-dev.tcz.tree3
 164187 total
tc@E310:~/TreeGen$ calc 41695-41437
258

Searching for empty lines:
Code: [Select]
tc@E310:~/TreeGen$ busybox grep -c '^$' vlc-dev.tcz.tree3
0

Searching for lines containing only blanks (spaces or tabs):
Code: [Select]
tc@E310:~/TreeGen$ busybox grep -c '^[[:blank:]]\+$' vlc-dev.tcz.tree3
258

Searching for lines containing only whitespace of any kind:
Code: [Select]
tc@E310:~/TreeGen$ busybox grep -c '^[[:space:]]\+$' vlc-dev.tcz.tree3
258

So some of those "blank" lines contain a bunch of space characters.

My awk skills are weak at best, so I will offer a generic suggestion.
This is a command I use when I want to reduce a tree file to a
sorted list of unique file names:
Code: [Select]
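awk '$1=$1' FileIn | sort -u > FileOut
The interesting part is that  awk '$1=$1'  strips leading and trailing whitespace (and squeezes any inner runs to single spaces), and because the assignment's value is empty on a blank line, such lines are dropped.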

If you can implement that on every line read, all you would
need to test for are lines with a length of zero characters.

File names can't contain whitespace, so they would not be touched.
Leading and trailing whitespace would be removed.
Lines containing only whitespace would become empty (zero length).
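In C++ terms, the normalization described above is close to taking the first whitespace-delimited field of each line: leading and trailing blanks (including a stray \r) disappear, and a blank or whitespace-only line yields an empty string, so the only remaining test is for length zero. A minimal sketch using std::istringstream; `first_field` is a hypothetical name, and it relies on the stated assumption that file names never contain whitespace.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Return the first whitespace-delimited field of a .dep line, or "" when
// the line is blank or holds only whitespace (spaces, tabs, a stray '\r').
// Roughly what awk's $1 gives after field splitting.
std::string first_field(const std::string& line) {
    std::istringstream in(line);
    std::string field;
    in >> field;  // skips leading whitespace, stops at the next whitespace
    return field;
}
```

A caller would then recurse only when `!first_field(line).empty()`, which handles blank lines and trailing whitespace with a single test.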
Title: Re: script that generates .tree files
Post by: GNUser on May 02, 2026, 04:12:46 PM
Hi Rich. Yes, I can reproduce the blank lines in the output of treegen.awk v3.0. I should have tested it more ::)
I'm working on a new version now.
Title: Re: script that generates .tree files
Post by: GNUser on May 02, 2026, 04:43:48 PM
Hi Rich. My idea was correct, but the syntax was wrong (a set of braces was missing). Sorry about that. I think this is what we want:

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v3.1 (May 2, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    N_SPACES = 0
    LINUX_VERSION = "6.18.2-tinycore64"
#    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    MIRROR_PATH = "/home/tc/TreeGen/DepFiles/"
    get_dependencies(ARGV[1])
}

function get_dependencies(tczname,    line) {
    gsub(/KERNEL/, LINUX_VERSION, tczname)
        printf("%"N_SPACES"s%s\n", "", tczname)
    N_SPACES+=3
    if (getline line <(MIRROR_PATH tczname ".dep") > 0) {
        do {
            if (line ~ /\.tcz/) { # because some .dep files have blank lines
                gsub(/\.tcz[ \t]+$/, ".tcz", line) # because some lines in .dep files have trailing whitespace
                get_dependencies(line)
            }
        } while (getline line <(MIRROR_PATH tczname ".dep") > 0)
        close(MIRROR_PATH tczname ".dep")
    }
    N_SPACES-=3
}
Can you please test this?

P.S. The '$1=$1' trick is quite handy but I don't see an obvious way to integrate it with treegen.awk's existing logic.
Title: Re: script that generates .tree files
Post by: Rich on May 02, 2026, 10:19:30 PM
Hi GNUser
I think we have a winner. ;D
I saved this version as treegen4.awk.

Sizes now match:
Code: [Select]
tc@E310:~/TreeGen$ ls -l vlc*
-rw-r--r-- 1 tc staff 1879760 Apr 25 14:08 vlc-dev.tcz.tree <------ ash version
-rw-r--r-- 1 tc staff 1839796 Mar 30 13:48 vlc-dev.tcz.tree.bak <-- repo version
-rw-r--r-- 1 tc staff 1839795 May  2 21:59 vlc-dev.tcz.tree2
-rw-r--r-- 1 tc staff 1887792 May  2 07:35 vlc-dev.tcz.tree3
-rw-r--r-- 1 tc staff 1879760 May  2 21:59 vlc-dev.tcz.tree4 <----- awk version

Number of lines now match:
Code: [Select]
tc@E310:~/TreeGen$ wc -l vlc-dev.tcz.tree*
  41437 vlc-dev.tcz.tree
  40528 vlc-dev.tcz.tree.bak
  40527 vlc-dev.tcz.tree2
  41695 vlc-dev.tcz.tree3
  41437 vlc-dev.tcz.tree4

Execution time still looks good:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen2.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 2.35s
user    0m 1.54s
sys     0m 0.79s
tc@E310:~/TreeGen$ time ./treegen4.awk vlc-dev.tcz > vlc-dev.tcz.tree4
real    0m 2.47s
user    0m 1.59s
sys     0m 0.86s

Your original treegen.awk took 1.86 sec to create vlc-dev.tcz.tree vs. 8.49 sec on my machine.
So you are obviously running much faster hardware than I am.
Out of curiosity, how long does this latest version take on your hardware?

... P.S. The '$1=$1' trick is quite handy but I don't see an obvious way to integrate it with treegen.awk's existing logic.
Yeah, I tried too but got nowhere.
Title: Re: script that generates .tree files
Post by: GNUser on May 03, 2026, 06:55:33 AM
I think we have a winner. ;D
Hi Rich. Good to hear. I'll be using this version myself :)

Out of curiosity, how long does this latest version take on your hardware?
Here you go:

Code: [Select]
$ time treegen vlc-dev.tcz >/tmp/vlc-dev.tcz.tree
real 0m 0.67s
user 0m 0.40s
sys 0m 0.22s
So 1.86 sec before, 0.67 sec now. Your optimizations help a whole lot: The new version gets the job done in less than half the time, even with the call to gsub.

Thank you for your collaboration with this. Always a pleasure.
Title: Re: script that generates .tree files
Post by: Rich on May 03, 2026, 08:39:16 AM
Hi GNUser
Beautiful.

Overall, I'd say your script stacks up quite well against the C++ version:
Code: [Select]
                                      awk            C++
Size                               ~900 bytes     ~25K bytes
Time to generate vlc-dev.tcz.tree   0.67 sec       0.31 sec
Title: Re: script that generates .tree files
Post by: Paul_123 on May 03, 2026, 09:55:18 AM
For processing hundreds of tree files, I’ll take the speed of C++, but was there a robustness change needed for the repo, related to spaces and/or blank lines?
Title: Re: script that generates .tree files
Post by: GNUser on May 03, 2026, 08:01:39 PM
Quote
was there a robustness change needed for the repo, related to spaces and/or blank lines?
Hi Paul_123. Yes, trailing whitespace in .dep files is causing problems for the C++ version of treegen. See Rich's analysis in Reply #20.
Title: Re: script that generates .tree files
Post by: Rich on May 03, 2026, 08:24:40 PM
Hi Paul_123
The C++ version does not recurse on entries that contain spaces.
From the TC17 x86_64 repo.
This is libsndfile-dev.tcz.dep with spaces converted to underscores:
Code:
rich@tcbox:~/libsndfile$ cat libsndfile-dev.tcz.dep | tr " " "_"
libsndfile.tcz
flac-dev.tcz_
libvorbis-dev.tcz___
The two -dev entries have trailing spaces.

This is the output of treegen:
Code:
rich@tcbox:~/libsndfile$ treegen libsndfile-dev.tcz 6.18.2-tinycore64
libsndfile-dev.tcz
   libsndfile.tcz
      flac.tcz
         libogg.tcz
      libvorbis.tcz
         libogg.tcz
      opus.tcz
      libmpg123.tcz
      lame.tcz
   flac-dev.tcz
   libvorbis-dev.tcz
The two dev entries get listed but not their dependencies.

Running treegen on those two entries shows they have dependencies:
Code:
rich@tcbox:~/libsndfile$ treegen flac-dev.tcz 6.18.2-tinycore64
flac-dev.tcz
   flac.tcz
      libogg.tcz
   libogg-dev.tcz
      libogg.tcz

rich@tcbox:~/libsndfile$ treegen libvorbis-dev.tcz 6.18.2-tinycore64
libvorbis-dev.tcz
   libvorbis.tcz
      libogg.tcz
   libogg-dev.tcz
      libogg.tcz
Title: Re: script that generates .tree files
Post by: Paul_123 on May 03, 2026, 10:51:30 PM
That's simple enough to fix. We should just ignore whitespace: leading, trailing, or a blank line (stray tabs or \r as well).

Change the "nukenewline" function to this:

Code:
static void nukenewline_space(char buf[]) {
        unsigned src, dst;

        for (src = 0, dst = 0; buf[src] != '\0'; src++) {
                if (buf[src] == '\n')
                        break;
                if (isspace((unsigned char)buf[src]))
                        continue;
                buf[dst++] = buf[src];
        }
        buf[dst] = '\0';
}

You'll also need to add ctype.h as an include (for isspace()).
Title: Re: script that generates .tree files
Post by: Rich on May 04, 2026, 11:42:46 AM
Hi Paul_123
I think that looks OK.

If I'm reading it right:
1. You first test and break for a newline so that isspace can't remove it.
2. Then isspace is used to skip past any whitespace and increment the src index.
3. Any other character at the src index is copied to the dest index, then the dest index is incremented.
4. Repeat those steps until the src index reaches the string terminator.
5. Write a string terminator at the dest index.

At that point I guess the calling function discards any strings containing only
a newline character.
Title: Re: script that generates .tree files
Post by: Paul_123 on May 04, 2026, 12:22:47 PM
Quote
If I'm reading it right:

Yes. When the newline is encountered, that is the end of processing; no need to waste time with anything else. Only non-whitespace characters are left in the string. Any character that isn't whitespace is copied from the source index to the destination index.


Quote
At that point I guess the calling function discards any strings containing only
a newline character.

Yes. In the calling function, right after the strings are processed for newlines (and now whitespace), there is a check to make sure the string length is not less than 4 (the length of ".tcz").
Title: Re: script that generates .tree files
Post by: Rich on May 04, 2026, 03:10:19 PM
Hi Paul_123
Sounds perfect. :)
Title: Re: script that generates .tree files
Post by: GNUser on May 05, 2026, 06:55:29 AM
Quote
Only non whitespace characters are left in the string.

Both Rich (with the '$1=$1' idea) and Paul_123 hint at a more general solution for whitespace. Noted.

I made a small change to treegen.awk: Rather than eliminating trailing whitespace, it now eliminates all whitespace.

The result: We only process lines that contain ".tcz", and only after we've stripped any whitespace.

Code:
#!/usr/bin/awk -f

# treegen.awk v3.2 (May 5, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    N_SPACES = 0
    LINUX_VERSION = "6.18.2-tinycore64"
    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    get_dependencies(ARGV[1])
}

function get_dependencies(tczname,    line) {
    gsub(/KERNEL/, LINUX_VERSION, tczname)
    printf("%"N_SPACES"s%s\n", "", tczname)
    N_SPACES+=3
    if (getline line <(MIRROR_PATH tczname ".dep") > 0) {
        do {
            if (line ~ /\.tcz/) { # because some .dep files have blank lines
                gsub(/[ \t\r]+/, "", line) # because some lines in .dep files have whitespace
                get_dependencies(line)
            }
        } while (getline line <(MIRROR_PATH tczname ".dep") > 0)
        close(MIRROR_PATH tczname ".dep")
    }
    N_SPACES-=3
}

I did some tests and this version is actually slightly faster than treegen.awk 3.1. I'm on C++'s heels now ;D

Code:
                                    awk           C++
Size                                ~900 bytes    ~25K bytes
Time to generate vlc-dev.tcz.tree   0.45 secs.    0.31 secs.
Title: Re: script that generates .tree files
Post by: Rich on May 05, 2026, 12:00:19 PM
Hi Paul_123
I just ran your updated treegen in my test directory and
it produced the tree file correctly.
Title: Re: script that generates .tree files
Post by: Paul_123 on May 05, 2026, 01:22:05 PM
Thanks for confirming. I did some testing too; it is live in the repo scripts.
Title: Re: script that generates .tree files
Post by: Paul_123 on May 05, 2026, 08:12:27 PM
Okay, I had time to play. Here is what the AI thought of the code. There were a couple of iterations, but I just pasted them into one list. Based on my test, it cut the time in half on your vlc-dev.tcz test case.

Quote
1) Eliminates the N_SPACES += 3 / N_SPACES -= 3 mutations around every recursive call. Depth is now a pure value passed down the call stack — simpler and less error-prone.

2) %*s takes the width from the argument, avoiding the "%"N_SPACES"s" string concatenation on every call. Minor, but cleaner.

3) Use while (getline ...) (avoid the do/while pattern)
Your v2 reads the first line with if (getline ...) and then uses do { ... } while (getline ...), which does an extra getline control-flow dance. A single while loop is simpler and a bit faster.

4) Trim first, then test; avoid regex when possible
Right now you test if (line ~ /\.tcz/) then trim whitespace with a regex gsub. If most lines are blank/whitespace, trimming first lets you do a cheap suffix test.

Also: line ~ /\.tcz/ is “contains .tcz anywhere”. If dependency lines are supposed to end with .tcz, checking the suffix is both stricter and cheaper.

In awk you can do a suffix check without regex:

length(line) >= 4 && substr(line, length(line)-3) == ".tcz"

5) Replace dynamic-width format string building
printf("%"N_SPACES"s%s\n", "", tczname) rebuilds the format string each call. Prefer %*s:

printf("%*s%s\n", N_SPACES, "", tczname)

6) Minor: precompute the dep filename once per call
Don’t concatenate MIRROR_PATH tczname ".dep" multiple times.

7) Add caching: don't keep reading the same depfile over and over.



Code:
#!/usr/bin/awk -f

# treegen.awk v3.4 (May 5, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    LINUX_VERSION = "6.18.2-tinycore64"
    MIRROR_PATH   = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    get_dependencies(ARGV[1], 0)
}

# Cache:
#   loaded[pkg]    = 1 once pkg.dep has been read (even if empty/missing)
#   ndeps[pkg]     = number of deps found
#   deps[pkg, i]   = ith dep (1..ndeps[pkg])

function load_depfile(pkg,    depfile, line, n, L) {
    if (loaded[pkg]) return
    loaded[pkg] = 1

    depfile = MIRROR_PATH pkg ".dep"
    n = 0

    while (getline line < depfile > 0) {
        gsub(/[ \t\r]+/, "", line)
        if (line == "") continue

        # endswith ".tcz" is faster/stricter than /\.tcz/
        L = length(line)
        if (L >= 4 && substr(line, L-3) == ".tcz")
            deps[pkg, ++n] = line
    }

    close(depfile)
    ndeps[pkg] = n
}

function get_dependencies(tczname, depth,    i, n) {
    gsub(/KERNEL/, LINUX_VERSION, tczname)
    printf("%*s%s\n", depth, "", tczname)

    load_depfile(tczname)

    n = ndeps[tczname]
    for (i = 1; i <= n; i++) {
        get_dependencies(deps[tczname, i], depth + 3)
    }
}

The awk version does not have a circular dependency check.
Title: Re: script that generates .tree files
Post by: Rich on May 05, 2026, 10:11:50 PM
Hi Paul_123
Quote
Okay, I had time to play.  Here is what AI thought of the code. There were a couple of iterations, but I just pasted it into one list.  Based on my test, it cut the time in half on your vlc-dev.tcz test case

Quote
----- Snip -----
2) %*s takes the width from the argument, avoiding the "%"N_SPACES"s" string concatenation on every call. Minor, but cleaner.

 ----- Snip -----

5) Replace dynamic-width format string building
printf("%"N_SPACES"s%s\n", "", tczname) rebuilds the format string each call. Prefer %*s:

printf("%*s%s\n", N_SPACES, "", tczname)

 ----- Snip -----
...
The AI may prefer that syntax, but I used the other syntax for a reason.
Not all versions of awk understand %*s; busybox awk in TC10 and TC14, for example, does not:
Code:
tc@E310:~/TreeGen$ time ./treegen6.awk vlc-dev.tcz > vlc-dev.tcz.tree6
awk: ./treegen6.awk:40: %*x formats are not supported
Command exited with non-zero status 1
After changing this:
Code:
printf("%*s%s\n", depth, "", tczname)
To this:
Code:
printf("%"depth"s%s\n", "", tczname)
It ran, and there was a very noticeable speed improvement.

Previous version:
Code:
tc@E310:~/TreeGen$ time ./treegen5.awk vlc-dev.tcz > vlc-dev.tcz.tree5
real    0m 2.47s
user    0m 1.47s
sys     0m 0.96s

New version:
Code:
tc@E310:~/TreeGen$ time ./treegen6.awk vlc-dev.tcz > vlc-dev.tcz.tree6
real    0m 1.43s
user    0m 0.95s
sys     0m 0.47s
Title: Re: script that generates .tree files
Post by: Paul_123 on May 05, 2026, 10:40:13 PM
I saw the dynamic format; I had missed that you purposely removed it. If I'd had more time, I would have tested each change one at a time. The caching route was probably the biggest improvement.

Man, your computer is slow :)
Title: Re: script that generates .tree files
Post by: Rich on May 06, 2026, 12:19:15 AM
Hi Paul_123
Quote
I saw the Dynamic format, I had missed you purposely removed it. ...
Technically, the formatting is still dynamic since it's controlled by
a variable. The implementation is just different to be compatible
with more versions of awk.

Quote
... Man your computer is slow :)
Thank you. It's plenty fast for most of my daily needs. Besides, its
slow speed makes it easier to spot when an optimization is really
effective, and not just noise.

This is a laptop with an external keyboard, monitor, and mouse that
I keep running for when I need a little more horsepower:
Code:
tc@HP-G62:~/TreeGen$ time ./treegen5.awk vlc-dev.tcz > vlc-dev.tcz.tree5
real    0m 1.03s
user    0m 0.74s
sys     0m 0.27s
tc@HP-G62:~/TreeGen$ time ./treegen6.awk vlc-dev.tcz > vlc-dev.tcz.tree6
real    0m 0.63s
user    0m 0.51s
sys     0m 0.11s

This is a desktop a buddy gave me because it was inadequate for newer
versions of Windows:
Code:
tc@box:~/TreeGen$ time ./treegen5.awk vlc-dev.tcz > vlc-dev.tcz.tree5
real    0m 0.61s
user    0m 0.46s
sys     0m 0.14s
tc@box:~/TreeGen$ time ./treegen6.awk vlc-dev.tcz > vlc-dev.tcz.tree6
real    0m 0.35s
user    0m 0.30s
sys     0m 0.04s
Title: Re: script that generates .tree files
Post by: GNUser on May 06, 2026, 10:35:30 AM
I hate to admit it, but the AI-generated version that Paul_123 shared is better than my version, and about twice as fast :-\ Using a cache to avoid reopening .dep files that have already been parsed is pretty clever. But it only worked for me (on TCL17 x86_64 with busybox awk) after I made Rich's edit:

Quote
After changing this:
Code:
printf("%*s%s\n", depth, "", tczname)
To this:
Code:
printf("%"depth"s%s\n", "", tczname)

I think I've squeezed out all the fun there is to be had with this little project. Thank you, Rich and Paul_123, for your helpful feedback.

Rich, the thread can be safely marked as "Solved" :)
Title: Re: [Solved] script that generates .tree files
Post by: Rich on May 06, 2026, 11:35:28 AM
Hi GNUser
Quote
... But it only worked for me (on TCL17 x86_64 with busybox awk) after I made Rich's edit: ...
That's not surprising, considering I had to make those changes for TC10, TC14, and TC16.
The syntax the AI proposed is valid in ash and C. I know because I've used it.

Create this file:
Code:
#!/bin/sh
xyzzy=10
printf "%*s%s\n" "$xyzzy" "" "Hi there!"

Run it and you get:
Code:
tc@E310:~/TreeGen$ ./Printf
          Hi there!
tc@E310:~/TreeGen$

When I tried to use it to reduce the number of times you called printf, awk complained.
A search turned up the way around this: replace the asterisk with the variable's name
and close the quotes around it, so the name isn't part of the format string; its value is.

Quote
... Rich, the thread can be safely marked as "Solved" :)
Done. ;D
Title: Re: [Solved] script that generates .tree files
Post by: GNUser on May 06, 2026, 11:36:12 AM
The AI version considers .dep file lines with length >= 4 to be relevant. I think this is an error. A line in a .dep file with length == 4 (a line that contains, for example, nothing but ".tcz") should be ignored. I think testing for length > 4 is what we want.

Here's what I think is our final version:

Code:
#!/usr/bin/awk -f

# treegen.awk v3.5 (May 6, 2026)
# usage example: $ treegen.awk vlc-dev.tcz

BEGIN {
    LINUX_VERSION = "6.18.2-tinycore64"
    MIRROR_PATH   = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    get_dependencies(ARGV[1], 0)
}

# Cache:
#   loaded[pkg]    = 1 once pkg.dep has been read (even if empty/missing)
#   ndeps[pkg]     = number of deps found
#   deps[pkg, i]   = ith dep (1..ndeps[pkg])

function load_depfile(pkg,    depfile, line, n, L) {
    if (loaded[pkg]) return
    loaded[pkg] = 1

    depfile = MIRROR_PATH pkg ".dep"
    n = 0

    while (getline line < depfile > 0) {
        gsub(/[ \t\r]+/, "", line)
        if (line == "") continue

        # endswith ".tcz" is faster/stricter than /\.tcz/
        L = length(line)
        if (L > 4 && substr(line, L-3) == ".tcz")
            deps[pkg, ++n] = line
    }

    close(depfile)
    ndeps[pkg] = n
}

function get_dependencies(tczname, depth,    i, n) {
    gsub(/KERNEL/, LINUX_VERSION, tczname)
    printf("%"depth"s%s\n", "", tczname)

    load_depfile(tczname)

    n = ndeps[tczname]
    for (i = 1; i <= n; i++) {
        get_dependencies(deps[tczname, i], depth + 3)
    }
}
Title: Re: [Solved] script that generates .tree files
Post by: Paul_123 on May 06, 2026, 12:17:39 PM
These little exercises, while mundane in themselves, get you thinking, and learning. 
Title: Re: [Solved] script that generates .tree files
Post by: GNUser on May 06, 2026, 12:21:35 PM
Quote
These little exercises, while mundane in themselves, get you thinking, and learning.
Agreed. And that's what it's all about :)
Title: Re: [Solved] script that generates .tree files
Post by: GNUser on May 06, 2026, 12:25:37 PM
Paul_123, what AI tool did you use for the treegen.awk cleanup? Just curious.
Title: Re: [Solved] script that generates .tree files
Post by: Paul_123 on May 06, 2026, 12:33:05 PM
It was GitHub Copilot, which kinda mixes up the models used based on the request type.

It was either GPT5.2 or Sonnet 4.6
Title: Re: [Solved] script that generates .tree files
Post by: GNUser on May 06, 2026, 12:45:07 PM
Got it. The cache idea and its implementation were surprisingly good. It's kind of frightening, actually.
Thank you.