
Author Topic: script that generates .tree files  (Read 1623 times)

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1841
Re: script that generates .tree files
« Reply #15 on: April 06, 2026, 09:11:39 PM »
Hi, Rich. That was a good thread! It's what got me interested in awk. "Programming in AWK" has since become my favorite programming book :)

I just never did any actual awk vs. shell benchmarking until now. The result is more dramatic than I expected.

I will see what GNU awk can do.
« Last Edit: April 06, 2026, 09:14:09 PM by GNUser »

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1841
Re: script that generates .tree files
« Reply #16 on: April 07, 2026, 11:29:07 AM »
Hi Rich. I changed the sole  print  statement  in the script to  printf  and was able to achieve a small but real increase in speed.

Also, I discovered that adding a set of parentheses around the concatenation operation at the  getline  step is required for the script to work with gawk.

So here is the final version of the script:

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v1.2 (April 7, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    LEVEL = 0
    LINUX_VERSION = "6.18.2-tinycore64"
    MIRROR_PATH = "/path/to/your/local/mirror/tinycorelinux/17.x/x86_64/tcz/"
    get_dependencies(ARGV[1])
}

function get_dependencies(app,    depapp) {
    gsub(/KERNEL/,LINUX_VERSION,app)
    for (i = 0; i < LEVEL; i++)
        printf("   ")
    printf("%s\n", app)
    while (getline depapp <(MIRROR_PATH app ".dep") > 0) {
        if (depapp ~ /\.tcz/) { # because some .dep files have blank lines
            LEVEL++
            get_dependencies(depapp)
        }
    }
    close(MIRROR_PATH app ".dep")
    LEVEL--
}

To use GNU awk rather than Busybox awk, one simply needs to load gawk and change the shebang from  #!/usr/bin/awk -f  to  #!/usr/local/bin/gawk -f

I don't have a lot of time to tinker today, so for the Busybox awk vs. GNU awk benchmarking I simply ran this command 10 times for each awk version:

Code: [Select]
$ time treegen.awk vlc-dev.tcz
then took the mean of the real time shown in the output. Here is the somewhat surprising result:

Busybox awk: 1.52 sec
GNU awk: 1.61 sec


I'm not sure whether the difference is statistically significant. I generally prefer to do as much as possible with just what's included in the base system, so I'm happy that Busybox is at least as fast as GNU awk if not slightly faster for this particular task!

P.S. treegen.awk is now in the public domain. It would be trivial to use a variable to count the number of times the get_dependencies function has been called, and bail out if that variable exceeds some appropriately large number. This could be used as a poor man's test for circular dependencies.
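
A minimal sketch of that call-counter idea, assuming a throwaway two-file mirror in which a.tcz and b.tcz depend on each other. The names `CALLS` and `MAX_CALLS`, the limit of 50, and the demo files are all hypothetical. The function follows the v1.2 structure (a while loop with an unconditional close), except that the indentation is passed as a string to keep the sketch short.

```shell
#!/bin/sh
# Hypothetical two-file mirror with a cycle: a.tcz -> b.tcz -> a.tcz
demo=$(mktemp -d)
printf 'b.tcz\n' > "$demo/a.tcz.dep"
printf 'a.tcz\n' > "$demo/b.tcz.dep"

status=0
# Keep only stderr (the warning); the tree itself is discarded here.
msg=$(awk -v MIRROR_PATH="$demo/" -v MAX_CALLS=50 '
BEGIN { get_dependencies(ARGV[1], "") }

function get_dependencies(app, indent,    depapp) {
    if (++CALLS > MAX_CALLS) {   # bail out once the call budget is spent
        print "treegen: over " MAX_CALLS " calls, possible circular dependency" > "/dev/stderr"
        exit 1
    }
    print indent app
    while ((getline depapp < (MIRROR_PATH app ".dep")) > 0)
        if (depapp ~ /\.tcz/)
            get_dependencies(depapp, indent "   ")
    close(MIRROR_PATH app ".dep")
}' a.tcz 2>&1 >/dev/null) || status=$?
rm -rf "$demo"

echo "exit status: $status"
echo "$msg"
```

A real limit would have to be far larger than 50, since a legitimate tree such as vlc-dev.tcz already needs tens of thousands of calls.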
« Last Edit: April 07, 2026, 11:41:40 AM by GNUser »

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12731
Re: script that generates .tree files
« Reply #17 on: April 07, 2026, 02:55:29 PM »
Hi GNUser
I reread that thread I referenced and realized I misremembered
what I thought was a faster GNU awk. It was a much faster GNU
 sort  that I mentioned in that thread.

I think the minor speed difference between awk and gawk is just noise.

... count the number of times the get_dependencies function has been called, and bail out if that variable exceeds some appropriately large number. This could be used as a poor man's test for circular dependencies.
A better way is to build a flattened version as you go:
Code: [Select]
A.tcz,B.tcz,C.tcz,D.tcz,E.tcz
Before appending the next tcz to the flattened list, check whether it is already in the list.
In this example, B.tcz creates a circular dependency so you stop:
Code: [Select]
A.tcz,B.tcz,C.tcz,D.tcz,E.tcz,B.tcz,
As you get to the end of each .dep file, you start unwinding the
flattened list by removing the last entry in the list.
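
One way the flattened list might look in awk, offered as a sketch rather than a tested replacement for treegen.awk. The variable `FLAT`, the marker text, the `indent` parameter, and the three demo .dep files are hypothetical, and the KERNEL substitution is left out. The list is kept comma-delimited at both ends so that a name only matches as a whole entry.

```shell
#!/bin/sh
# Hypothetical mirror: a.tcz -> b.tcz -> c.tcz -> b.tcz (c points back at b)
demo=$(mktemp -d)
printf 'b.tcz\n' > "$demo/a.tcz.dep"
printf 'c.tcz\n' > "$demo/b.tcz.dep"
printf 'b.tcz\n' > "$demo/c.tcz.dep"

tree=$(awk -v MIRROR_PATH="$demo/" '
BEGIN {
    FLAT = ","                       # flattened list of the branch being walked
    get_dependencies(ARGV[1], "")
}

function get_dependencies(app, indent,    line, file, saved) {
    print indent app
    saved = FLAT
    FLAT = FLAT app ","              # append this tcz to the flattened list
    file = MIRROR_PATH app ".dep"
    while ((getline line < file) > 0) {
        sub(/[ \t]+$/, "", line)     # ignore trailing whitespace
        if (line == "")
            continue
        if (index(FLAT, "," line ",") > 0) {
            # already on the current branch: report it and do not recurse
            print indent "   " line "  <-- circular dependency, not followed"
            continue
        }
        get_dependencies(line, indent "   ")
    }
    close(file)
    FLAT = saved                     # unwind: drop the entry appended above
}' a.tcz)
rm -rf "$demo"

printf '%s\n' "$tree"
```

Restoring the saved value has the same effect as removing the last entry, and it avoids string surgery on the list.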

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1841
Re: script that generates .tree files
« Reply #18 on: April 08, 2026, 08:34:48 AM »
Up to version 1.2, treegen.awk tried to close every .dep file it had tried to open, whether or not that .dep file existed.

This updated version only tries to close .dep files that were actually opened. It's not any faster, just more correct. I also moved LEVEL++ to a more human-friendly place.

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v2.0 (April 8, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    LEVEL = 0
    LINUX_VERSION = "6.18.2-tinycore64"
    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    get_dependencies(ARGV[1])
}

function get_dependencies(app,    depapp) {
    LEVEL++
    gsub(/KERNEL/,LINUX_VERSION,app)
    for (i = 1; i < LEVEL; i++)
        printf("   ")
    printf("%s\n", app)
    if (getline depapp <(MIRROR_PATH app ".dep") > 0) {
        do {
            if (depapp ~ /\.tcz/) # because some .dep files have blank lines
                get_dependencies(depapp)
        } while (getline depapp <(MIRROR_PATH app ".dep") > 0)
        close(MIRROR_PATH app ".dep")
    }
    LEVEL--
}

Sorry for spamming the forum with this, but now that it's out there I feel responsible for it being correct and beautiful :)

I think this is as good as I can make it. If anyone finds a bug or an optimization that I missed, please do share. Happy hacking!                             
« Last Edit: April 08, 2026, 08:40:42 AM by GNUser »

Offline patrikg

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 839
Re: script that generates .tree files
« Reply #19 on: April 08, 2026, 10:07:08 AM »
You're not "spamming" the forum; you're teaching the community a lot.
You could also put this on a new page in the wiki and link your thread to it.

Somebody may have a use for your script, or for learning about scripting capabilities on Linux.
You could also test this with other scripting languages like Lua, Ruby, or even Python.
But then you would also have to add those to the dependency list  :( .

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12731
Re: script that generates .tree files
« Reply #20 on: April 23, 2026, 02:57:10 PM »
Hi GNUser
Here's final tally of busybox ash vs. busybox awk vs. C++ implementations of treegen. All three implementations parse .dep files present on the local mirror (without using network/wget):

                                      busybox ash     busybox awk          C++
time to generate labwc-dev.tcz.tree:    11 sec          0.43 sec         0.06 sec
time to generate vlc-dev.tcz.tree:      49 sec          1.86 sec         0.31 sec
...
I decided to try my hand at a busybox ash version.

Using your busybox awk as a benchmark, I got:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 8.49s
user    0m 3.56s
sys     0m 4.92s

Using my busybox ash version, I got:
Code: [Select]
tc@E310:~/TreeGen$ time ./TreeGen.sh vlc-dev
6.25
real    0m 6.27s
user    0m 3.71s
sys     0m 2.52s

That surprised me, but not as much as this did:
Code: [Select]
tc@E310:~/TreeGen$ ls -l vlc*
-rw-r--r-- 1 tc staff 1879760 Apr 23 00:56 vlc-dev.tcz.tree <------ ash version
-rw-r--r-- 1 tc staff 1839796 Mar 30 13:48 vlc-dev.tcz.tree.bak <-- repo version
-rw-r--r-- 1 tc staff 1839795 Apr 22 22:53 vlc-dev.tcz.tree2 <----- awk version
The repo version is 1 byte longer than yours. It has an extra newline at the end of the file.
But mine is almost 40,000 bytes larger.

And about 900 lines longer:
Code: [Select]
tc@E310:~/TreeGen$ wc -l vlc-dev.tcz.tree*
  41437 vlc-dev.tcz.tree <------ ash version
  40528 vlc-dev.tcz.tree.bak <-- repo version
  40527 vlc-dev.tcz.tree2 <----- awk version

At first I figured there was a bug in my version because yours agreed with the repo version.
Then I found this:

Repo version:
Code: [Select]
   3498                   libsndfile-dev.tcz
   3499                      libsndfile.tcz
   3500                         flac.tcz
   3501                            libogg.tcz
   3502                         libvorbis.tcz
   3503                            libogg.tcz
   3504                         opus.tcz
   3505                         libmpg123.tcz
   3506                         lame.tcz
   3507                      flac-dev.tcz
   3508                      libvorbis-dev.tcz   
   3509                   libbluetooth-dev.tcz

My version:
Code: [Select]
   3498                   libsndfile-dev.tcz
   3499                      libsndfile.tcz
   3500                         flac.tcz
   3501                            libogg.tcz
   3502                         libvorbis.tcz
   3503                            libogg.tcz
   3504                         opus.tcz
   3505                         libmpg123.tcz
   3506                         lame.tcz
   3507                      flac-dev.tcz
   3508                         flac.tcz
   3509                            libogg.tcz
   3510                         libogg-dev.tcz
   3511                            libogg.tcz
   3512                      libvorbis-dev.tcz
   3513                         libvorbis.tcz
   3514                            libogg.tcz
   3515                         libogg-dev.tcz
   3516                            libogg.tcz
   3517                   libbluetooth-dev.tcz

The C++ and awk versions are not parsing flac-dev.tcz.dep and libvorbis-dev.tcz.dep

Here's the cause:
Code: [Select]
tc@E310:~/TreeGen$ grep " " DepFiles/libsndfile-dev.tcz.dep | tr " " "_"
flac-dev.tcz_
libvorbis-dev.tcz___
The trailing spaces on those 2 entries are tripping up the C++ and awk programs.

Checking the rest of the .dep files related to vlc-dev turns up this:
Code: [Select]
tc@E310:~/TreeGen$ grep " " DepFiles/* | tr " " "_"
DepFiles/acl.tcz.dep:_
DepFiles/libsane.tcz.dep:libavahi.tcz_____
DepFiles/libsndfile-dev.tcz.dep:flac-dev.tcz_
DepFiles/libsndfile-dev.tcz.dep:libvorbis-dev.tcz___
DepFiles/libsndfile.tcz.dep:libmpg123.tcz_
DepFiles/sane-dev.tcz.dep:avahi-dev.tcz__
DepFiles/taglib-dev.tcz.dep:taglib.tcz___________________________________
There are a handful of entries with trailing spaces.
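
A small check of why a trailing space matters, assuming (as the treegen.awk output for flac-dev.tcz suggests) that flac-dev.tcz.dep lists flac.tcz and libogg-dev.tcz. The temporary directory and the variable names are hypothetical. The unanchored match against /\.tcz/ still succeeds, but the name is then used verbatim to build the .dep path, and no file with a space before .dep exists.

```shell
#!/bin/sh
demo=$(mktemp -d)
printf 'flac.tcz\nlibogg-dev.tcz\n' > "$demo/flac-dev.tcz.dep"

result=$(awk -v MIRROR_PATH="$demo/" '
BEGIN {
    clean = "flac-dev.tcz"
    dirty = "flac-dev.tcz "     # one trailing space, as in libsndfile-dev.tcz.dep
    matched = (dirty ~ /\.tcz/)                         # 1: the regex is not anchored
    ok  = (getline line < (MIRROR_PATH clean ".dep"))   # 1: file exists, one line read
    bad = (getline line < (MIRROR_PATH dirty ".dep"))   # -1: "flac-dev.tcz .dep" cannot be opened
    print matched, ok, bad
}')
rm -rf "$demo"

echo "$result"    # prints: 1 1 -1
```

Because the script only recurses while getline returns a value greater than 0, the -1 is silently treated the same as "no dependencies".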

So running treegen.awk against libsndfile-dev.tcz:
Code: [Select]
tc@E310:~/TreeGen$ ./treegen.awk libsndfile-dev.tcz
libsndfile-dev.tcz
   libsndfile.tcz
      flac.tcz
         libogg.tcz
      libvorbis.tcz
         libogg.tcz
      opus.tcz
      libmpg123.tcz
      lame.tcz
   flac-dev.tcz
   libvorbis-dev.tcz
It reports flac-dev.tcz and libvorbis-dev.tcz have no dependencies.

Running these commands suggests they do:
Code: [Select]
tc@E310:~/TreeGen$ ./treegen.awk flac-dev.tcz
flac-dev.tcz
   flac.tcz
      libogg.tcz
   libogg-dev.tcz
      libogg.tcz
tc@E310:~/TreeGen$ ./treegen.awk libvorbis-dev.tcz
libvorbis-dev.tcz
   libvorbis.tcz
      libogg.tcz
   libogg-dev.tcz
      libogg.tcz

I've attached my version of TreeGen.sh.

The only variables you need to adjust are kernel version and
path to your local repo.
Code: [Select]
Kernel="6.18.2-tinycore64"
Repo="DepFiles/"

Run the program like this:
Code: [Select]
./TreeGen.sh Filename
The script adds .tcz automatically if you omit it.
The script automatically saves results to Filename.tcz.tree in
the current directory.

Offline gadget42

  • Hero Member
  • *****
  • Posts: 1034
** WARNING: connection is not using a post-quantum kex exchange algorithm.
** This session may be vulnerable to "store now, decrypt later" attacks.
** The server may need to be upgraded. See https://openssh.com/pq.html
** Also see: post quantum internet 2025 - https://blog.cloudflare.com/pq-2025/

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12731
Re: script that generates .tree files
« Reply #22 on: April 25, 2026, 03:59:52 PM »
Hi GNUser
... I think this is as good as I can make it. If anyone finds a bug ...
Well, I pointed out a bug in my last post (not processing filenames with trailing spaces).
I'm not well versed in awk, but I suspect it may be related to this line:
Code: [Select]
            if (depapp ~ /\.tcz/) # because some .dep files have blank lines
Quote
... or an optimization that I missed, please do share. ...
... I decided to try my hand at a busybox ash version.

Using your busybox awk as a benchmark, I got:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 8.49s
user    0m 3.56s
sys     0m 4.92s

Using my busybox ash version, I got:
Code: [Select]
tc@E310:~/TreeGen$ time ./TreeGen.sh vlc-dev
6.25
real    0m 6.27s
user    0m 3.71s
sys     0m 2.52s
...

I did find a couple of optimizations.
Change this:
Code: [Select]
    for (i = 1; i < LEVEL; i++)
        printf("   ")
To this:
Code: [Select]
    Width=(LEVEL - 1) * 3
    printf("%"Width"s")
And the execution time dropped to:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 2.95s
user    0m 1.71s
sys     0m 1.22s

Taking it a step further, you can now combine the 2 consecutive printf statements.
Change this:
Code: [Select]
    Width=(LEVEL - 1) * 3
    printf("%"Width"s")
    printf("%s\n", app)
To this:
Code: [Select]
    Width=(LEVEL - 1) * 3
    printf("%"Width"s%s\n", "", app)
And you get a further reduction in execution time:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 2.44s
user    0m 1.45s
sys     0m 0.94s
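
The two indentation methods should produce byte-identical padding, which is easy to check in isolation. In this sketch the helper names `pad_loop` and `pad_fmt` are hypothetical, and `sprintf` stands in for `printf` so the results can be compared as strings. An explicit empty-string argument is passed to the format, since some awks may complain about a `%s` with no argument.

```shell
#!/bin/sh
result=$(awk '
# Original approach: one three-space chunk per recursion level above the first.
function pad_loop(level,    i, s) {
    s = ""
    for (i = 1; i < level; i++)
        s = s "   "
    return s
}

# Optimized approach: a single empty string padded to the computed width.
function pad_fmt(level,    width) {
    width = (level - 1) * 3
    return sprintf("%" width "s", "")
}

BEGIN {
    same = 1
    for (n = 1; n <= 6; n++)
        if (pad_loop(n) != pad_fmt(n))
            same = 0
    print (same ? "identical" : "different")
}')

echo "$result"
```

The speedup reported above is plausible on this reading: the loop issues one printf call per level for every line of output, while the format string needs a single call regardless of depth.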

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12731
Re: script that generates .tree files
« Reply #23 on: April 25, 2026, 08:22:09 PM »
Hi GNUser
Found one more minor optimization.
Changed  LEVEL++  to  LEVEL+=3  and moved it down after
the  printf  statement.
Changed  LEVEL--  to  LEVEL-=3.
Removed  Width=(LEVEL - 1) * 3  and used  LEVEL  in place of  Width.

The final implementation looks like this:
Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v2.0 (April 8, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    LEVEL = 0
    LINUX_VERSION = "6.18.2-tinycore64"
#    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    MIRROR_PATH = "/home/tc/TreeGen/DepFiles/"
    get_dependencies(ARGV[1])
}

function get_dependencies(app,    depapp) {
    gsub(/KERNEL/,LINUX_VERSION,app)
    printf("%"LEVEL"s%s\n", "", app)
    LEVEL+=3
    if (getline depapp <(MIRROR_PATH app ".dep") > 0) {
        do {
            if (depapp ~ /\.tcz/) # because some .dep files have blank lines
                get_dependencies(depapp)
        } while (getline depapp <(MIRROR_PATH app ".dep") > 0)
        close(MIRROR_PATH app ".dep")
    }
    LEVEL-=3
}

Execution appears to be a tiny bit faster:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 2.33s
user    0m 1.47s
sys     0m 0.84s

I think that may be about all I can squeeze out of treegen.awk.
I'll leave updating the version number up to you if you decide
to incorporate any of these changes.

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1841
Re: script that generates .tree files
« Reply #24 on: May 01, 2026, 10:44:34 PM »
Hi Rich. Sorry, I did not notice these updates to the thread until now. The forum does not always send an email notification when there are new posts.

Thank you very much for finding this bug and the optimizations. I will post an updated version of treegen.awk on Sunday or Monday when I have some time to play with this.

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1841
Re: script that generates .tree files
« Reply #25 on: May 02, 2026, 07:17:23 AM »
Hi, Rich. I made a few changes:

1. Renamed some variables to (hopefully) make things more clear (e.g., LEVEL renamed to N_SPACES--by LEVEL I had meant recursion level, which is no longer what this variable's value represents).

2. I added a call to gsub to get rid of any trailing whitespace.

We will lose some speed with the call to gsub, but it will be small because gsub is internal to awk.

Version bump to 3.0 :)

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v3.0 (May 2, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    N_SPACES = 0
    LINUX_VERSION = "6.18.2-tinycore64"
#    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    MIRROR_PATH = "/home/tc/TreeGen/DepFiles/"
    get_dependencies(ARGV[1])
}

function get_dependencies(tczname,    line) {
    gsub(/KERNEL/, LINUX_VERSION, tczname)
    printf("%"N_SPACES"s%s\n", "", tczname)
    N_SPACES+=3
    if (getline line <(MIRROR_PATH tczname ".dep") > 0) {
        do {
            if (line ~ /\.tcz/) # because some .dep files have blank lines
                gsub(/\.tcz[ \t]+$/, ".tcz", line) # because some lines in .dep files have trailing whitespace
                get_dependencies(line)
        } while (getline line <(MIRROR_PATH tczname ".dep") > 0)
        close(MIRROR_PATH tczname ".dep")
    }
    N_SPACES-=3
}



« Last Edit: May 02, 2026, 07:43:06 AM by GNUser »

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12731
Re: script that generates .tree files
« Reply #26 on: May 02, 2026, 08:37:10 AM »
Hi GNUser
Well, you're almost there. :)
For testing, I saved your new version as treegen3.awk.

... We will lose some speed with the call to gsub, but it will be small because gsub is internal to awk. ...
No big speed penalty:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen2.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 2.35s
user    0m 1.45s
sys     0m 0.89s
tc@E310:~/TreeGen$ time ./treegen3.awk vlc-dev.tcz > vlc-dev.tcz.tree3
real    0m 2.50s
user    0m 1.56s
sys     0m 0.92s

However, your tree file is now about 8000 bytes bigger than mine:
Code: [Select]
tc@E310:~/TreeGen$ ls -l vlc*
-rw-r--r-- 1 tc staff 1879760 Apr 25 14:08 vlc-dev.tcz.tree <------ ash version
-rw-r--r-- 1 tc staff 1839796 Mar 30 13:48 vlc-dev.tcz.tree.bak <-- repo version
-rw-r--r-- 1 tc staff 1839795 May  2 07:35 vlc-dev.tcz.tree2
-rw-r--r-- 1 tc staff 1887792 May  2 07:35 vlc-dev.tcz.tree3 <----- awk version

And 258 lines longer:
Code: [Select]
tc@E310:~/TreeGen$ wc -l vlc-dev.tcz.tree*
  41437 vlc-dev.tcz.tree
  40528 vlc-dev.tcz.tree.bak
  40527 vlc-dev.tcz.tree2
  41695 vlc-dev.tcz.tree3
 164187 total
tc@E310:~/TreeGen$ calc 41695-41437
258

Searching for empty lines:
Code: [Select]
tc@E310:~/TreeGen$ busybox grep -c '^$' vlc-dev.tcz.tree3
0

Searching for lines containing only blanks (spaces or tabs):
Code: [Select]
tc@E310:~/TreeGen$ busybox grep -c '^[[:blank:]]\+$' vlc-dev.tcz.tree3
258

Searching for lines containing only whitespace of any kind:
Code: [Select]
tc@E310:~/TreeGen$ busybox grep -c '^[[:space:]]\+$' vlc-dev.tcz.tree3
258

So some of those "blank" lines contain a bunch of space characters.
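
Whitespace-only lines like these are what one would expect if the recursive call also ran for blank input lines. A minimal reproduction, assuming the cause is an `if` without braces that guards only the first of two statements; the three-line input and the fixed six-space indent are made up for the demonstration.

```shell
#!/bin/sh
# Three input lines; the middle one is blank, as in some .dep files.
out=$(printf 'flac.tcz \n\nlibogg.tcz\n' | awk '
{
    line = $0
    if (line ~ /\.tcz/)
        gsub(/\.tcz[ \t]+$/, ".tcz", line)   # only this statement belongs to the if
        printf("%6s%s\n", "", line)          # runs for every line, blank or not
}')

blank=$(printf '%s\n' "$out" | grep -c '^ *$')
echo "$blank whitespace-only line(s) in the output"
```

awk, like C, ignores indentation: without braces only the statement immediately after the condition is conditional.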

My awk skills are weak at best, so I will offer a generic suggestion.
This is a command I use when I want to reduce a tree file to a
sorted list of unique file names:
Code: [Select]
awk '$1=$1' FileIn | sort -u > FileOut
The interesting part is that  awk '$1=$1'  rebuilds each line from its fields, which strips the leading and trailing whitespace.

If you can implement that on every line read, all you would
need to test for are lines with a length of zero characters.

File names can't contain whitespace, so they would not be touched.
Leading and trailing whitespace would be removed.
Lines containing only whitespace would become empty (zero length).

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1841
Re: script that generates .tree files
« Reply #27 on: May 02, 2026, 04:12:46 PM »
Hi Rich. Yes, I can reproduce the blank lines in the output of treegen.awk v3.0. I should have tested it more ::)
I'm working on a new version now.

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1841
Re: script that generates .tree files
« Reply #28 on: May 02, 2026, 04:43:48 PM »
Hi Rich. My idea was correct, but the syntax was wrong (a set of braces was missing). Sorry about that. I think this is what we want:

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v3.1 (May 2, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    N_SPACES = 0
    LINUX_VERSION = "6.18.2-tinycore64"
#    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    MIRROR_PATH = "/home/tc/TreeGen/DepFiles/"
    get_dependencies(ARGV[1])
}

function get_dependencies(tczname,    line) {
    gsub(/KERNEL/, LINUX_VERSION, tczname)
    printf("%"N_SPACES"s%s\n", "", tczname)
    N_SPACES+=3
    if (getline line <(MIRROR_PATH tczname ".dep") > 0) {
        do {
            if (line ~ /\.tcz/) { # because some .dep files have blank lines
                gsub(/\.tcz[ \t]+$/, ".tcz", line) # because some lines in .dep files have trailing whitespace
                get_dependencies(line)
            }
        } while (getline line <(MIRROR_PATH tczname ".dep") > 0)
        close(MIRROR_PATH tczname ".dep")
    }
    N_SPACES-=3
}
Can you please test this?

P.S. The '$1=$1' trick is quite handy but I don't see an obvious way to integrate it with treegen.awk's existing logic.

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12731
Re: script that generates .tree files
« Reply #29 on: May 02, 2026, 10:19:30 PM »
Hi GNUser
I think we have a winner. ;D
I saved this version as treegen4.awk.

Sizes now match:
Code: [Select]
tc@E310:~/TreeGen$ ls -l vlc*
-rw-r--r-- 1 tc staff 1879760 Apr 25 14:08 vlc-dev.tcz.tree <------ ash version
-rw-r--r-- 1 tc staff 1839796 Mar 30 13:48 vlc-dev.tcz.tree.bak <-- repo version
-rw-r--r-- 1 tc staff 1839795 May  2 21:59 vlc-dev.tcz.tree2
-rw-r--r-- 1 tc staff 1887792 May  2 07:35 vlc-dev.tcz.tree3
-rw-r--r-- 1 tc staff 1879760 May  2 21:59 vlc-dev.tcz.tree4 <----- awk version

Number of lines now match:
Code: [Select]
tc@E310:~/TreeGen$ wc -l vlc-dev.tcz.tree*
  41437 vlc-dev.tcz.tree
  40528 vlc-dev.tcz.tree.bak
  40527 vlc-dev.tcz.tree2
  41695 vlc-dev.tcz.tree3
  41437 vlc-dev.tcz.tree4

Execution time still looks good:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen2.awk vlc-dev.tcz > vlc-dev.tcz.tree2
real    0m 2.35s
user    0m 1.54s
sys     0m 0.79s
tc@E310:~/TreeGen$ time ./treegen4.awk vlc-dev.tcz > vlc-dev.tcz.tree4
real    0m 2.47s
user    0m 1.59s
sys     0m 0.86s

Your original treegen.awk took 1.86 sec to create vlc-dev.tcz.tree vs. my machine's 8.49 sec.
So you are obviously running much faster hardware than I am.
Out of curiosity, how long does this latest version take on your hardware?

... P.S. The '$1=$1' trick is quite handy but I don't see an obvious way to integrate it with treegen.awk's existing logic.
Yeah, I tried also but got nowhere.
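
For what it is worth, the effect of '$1=$1' can be approximated on a plain variable with two sub() calls, since '$1=$1' itself only operates on the current record. The sketch below has not been run against a real mirror. The `trim` helper, the `indent` parameter (used here instead of N_SPACES), and the cut-down demo .dep files are all hypothetical.

```shell
#!/bin/sh
# Hypothetical cut-down mirror with trailing spaces and a whitespace-only line.
demo=$(mktemp -d)
printf 'flac-dev.tcz \n   \nlibvorbis-dev.tcz   \n' > "$demo/libsndfile-dev.tcz.dep"
printf 'flac.tcz\nlibogg-dev.tcz\n'                  > "$demo/flac-dev.tcz.dep"
printf 'libvorbis.tcz\nlibogg-dev.tcz\n'             > "$demo/libvorbis-dev.tcz.dep"

tree=$(awk -v MIRROR_PATH="$demo/" -v LINUX_VERSION="6.18.2-tinycore64" '
BEGIN { get_dependencies(ARGV[1], "") }

# Strip leading and trailing spaces/tabs; s is passed by value.
function trim(s) {
    sub(/^[ \t]+/, "", s)
    sub(/[ \t]+$/, "", s)
    return s
}

function get_dependencies(tczname, indent,    line, file) {
    gsub(/KERNEL/, LINUX_VERSION, tczname)
    print indent tczname
    file = MIRROR_PATH tczname ".dep"
    if ((getline line < file) > 0) {
        do {
            line = trim(line)
            if (length(line) > 0)        # blank and whitespace-only lines are skipped
                get_dependencies(line, indent "   ")
        } while ((getline line < file) > 0)
        close(file)
    }
}' libsndfile-dev.tcz)
rm -rf "$demo"

printf '%s\n' "$tree"
```

With every line trimmed first, the only test left is for a length of zero, which is the simplification suggested earlier in the thread. Whether it is faster or slower than the single gsub in v3.1 would need measuring.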