
Author Topic: [Solved] script that generates .tree files  (Read 2343 times)

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1848
Re: script that generates .tree files
« Reply #30 on: May 03, 2026, 06:55:33 AM »
Quote
I think we have a winner. ;D
Hi Rich. Good to hear. I'll be using this version myself :)

Quote
Out of curiosity, how long does this latest version take on your hardware?
Here you go:

Code: [Select]
$ time treegen vlc-dev.tcz >/tmp/vlc-dev.tcz.tree
real 0m 0.67s
user 0m 0.40s
sys 0m 0.22s
So 1.86 sec before, 0.67 sec now. Your optimizations help a whole lot: The new version gets the job done in less than half the time, even with the call to gsub.

Thank you for your collaboration with this. Always a pleasure.

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12744
Re: script that generates .tree files
« Reply #31 on: May 03, 2026, 08:39:16 AM »
Hi GNUser
Beautiful.

Overall, I'd say your script stacks up quite well against the C++ version:
Code: [Select]
                                      awk            C++
Size                                  ~900 bytes     ~25K bytes
Time to generate vlc-dev.tcz.tree     0.67 Secs.     0.31 Secs.

Offline Paul_123

  • Administrator
  • Hero Member
  • *****
  • Posts: 1552
Re: script that generates .tree files
« Reply #32 on: May 03, 2026, 09:55:18 AM »
For processing hundreds of tree files, I’ll take the speed of C++, but was there a robustness change needed for the repo, related to spaces and/or blank lines?

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1848
Re: script that generates .tree files
« Reply #33 on: May 03, 2026, 08:01:39 PM »
Quote
was there a robustness change needed for the repo, related to spaces and/or blank lines?
Hi Paul_123. Yes, trailing whitespace in .dep files is causing problems for the C++ version of treegen. See Rich's analysis in Reply #20.
« Last Edit: May 03, 2026, 08:05:56 PM by GNUser »

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12744
Re: script that generates .tree files
« Reply #34 on: May 03, 2026, 08:24:40 PM »
Hi Paul_123
The C++ version does not recurse on entries that contain spaces.
From the TC17 x86_64 repo.
This is libsndfile-dev.tcz.dep with spaces converted to underscores:
Code: [Select]
rich@tcbox:~/libsndfile$ cat libsndfile-dev.tcz.dep | tr " " "_"
libsndfile.tcz
flac-dev.tcz_
libvorbis-dev.tcz___
The two -dev entries have trailing spaces.
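
Such lines can also be flagged programmatically. As a minimal sketch, the function below counts the whitespace characters at the end of a line, ignoring the terminating newline. The name `trailing_ws` is made up for this illustration and is not part of treegen.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of treegen): returns the number of
 * whitespace characters at the end of a .dep line, not counting one
 * terminating newline. */
size_t trailing_ws(const char *line) {
        size_t len = strlen(line);
        size_t count = 0;

        /* Ignore one terminating newline, if present. */
        if (len > 0 && line[len - 1] == '\n')
                len--;

        /* Walk backwards over spaces, tabs, carriage returns, etc. */
        while (len > 0 && isspace((unsigned char)line[len - 1])) {
                len--;
                count++;
        }
        return count;
}
```

Applied to the three lines of libsndfile-dev.tcz.dep shown above, this would report 0, 1 and 3, matching the underscores in the `tr` output.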

This is the output of treegen:
Code: [Select]
rich@tcbox:~/libsndfile$ treegen libsndfile-dev.tcz 6.18.2-tinycore64
libsndfile-dev.tcz
   libsndfile.tcz
      flac.tcz
         libogg.tcz
      libvorbis.tcz
         libogg.tcz
      opus.tcz
      libmpg123.tcz
      lame.tcz
   flac-dev.tcz
   libvorbis-dev.tcz
The two dev entries get listed but not their dependencies.

Running treegen on those two entries shows they have dependencies:
Code: [Select]
rich@tcbox:~/libsndfile$ treegen flac-dev.tcz 6.18.2-tinycore64
flac-dev.tcz
   flac.tcz
      libogg.tcz
   libogg-dev.tcz
      libogg.tcz

rich@tcbox:~/libsndfile$ treegen libvorbis-dev.tcz 6.18.2-tinycore64
libvorbis-dev.tcz
   libvorbis.tcz
      libogg.tcz
   libogg-dev.tcz
      libogg.tcz

Offline Paul_123

  • Administrator
  • Hero Member
  • *****
  • Posts: 1552
Re: script that generates .tree files
« Reply #35 on: May 03, 2026, 10:51:30 PM »
That's simple enough to fix. We should just ignore whitespace: leading, trailing, or just a blank line (stray tabs or \r as well).

Change the "nukenewline" function to:

Code: [Select]
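#include <ctype.h>      /* for isspace() */

/* Compact buf in place, dropping all whitespace; stop at the first newline. */
static void nukenewline_space(char buf[]) {
        unsigned src, dst;

        for (src = 0, dst = 0; buf[src] != '\0'; src++) {
                if (buf[src] == '\n')   /* end of line: nothing more to process */
                        break;
                if (isspace((unsigned char)buf[src]))   /* skip spaces, tabs, \r */
                        continue;
                buf[dst++] = buf[src];
        }
        buf[dst] = '\0';
}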

Need to add ctype.h as an include too.

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12744
Re: script that generates .tree files
« Reply #36 on: May 04, 2026, 11:42:46 AM »
Hi Paul_123
I think that looks OK.

If I'm reading it right:
1. You first test and break for a newline so that isspace can't remove it.
2. Then isspace is used to skip past any whitespace and increment the src index.
3. Otherwise the character at the src index is copied to the dest index, then the dest index is incremented.
4. Repeat those steps until the character at the src index is the string terminator.
5. Write the string terminator to the dest index.

At that point I guess the calling function discards any strings containing only
a newline character.

Offline Paul_123

  • Administrator
  • Hero Member
  • *****
  • Posts: 1552
Re: script that generates .tree files
« Reply #37 on: May 04, 2026, 12:22:47 PM »
Quote
If I'm reading it right:

Yes. When the newline is encountered, that is the end of processing; no need to waste time with anything else. Only non whitespace characters are left in the string. If the current character is not whitespace, it is copied from the source index to the destination index.


Quote
At that point I guess the calling function discards any strings containing only
a newline character.

Yes. Right after the calling function processes the string for newlines (and now whitespace), there is a check to make sure the string length is not less than 4 (.tcz).
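
Putting the two steps together, the caller side might look roughly like the sketch below. It reuses the stripping function from Reply #35 and assumes the minimum length of 4 described above. `usable_dep_line` is a hypothetical name, and the real treegen source may structure this differently.

```c
#include <ctype.h>
#include <string.h>

/* Stripping function as posted in Reply #35 (non-static here so the
 * sketch can be compiled on its own). */
void nukenewline_space(char buf[]) {
        unsigned src, dst;

        for (src = 0, dst = 0; buf[src] != '\0'; src++) {
                if (buf[src] == '\n')
                        break;
                if (isspace((unsigned char)buf[src]))
                        continue;
                buf[dst++] = buf[src];
        }
        buf[dst] = '\0';
}

/* Hypothetical caller-side check: strip the line in place, then
 * report whether it is long enough to hold at least ".tcz". */
int usable_dep_line(char buf[]) {
        nukenewline_space(buf);
        return strlen(buf) >= 4;
}
```

With this, a line such as "flac-dev.tcz " followed by a newline is cleaned and accepted, while a blank line or one holding only spaces, tabs or \r ends up empty and is rejected.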

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12744
Re: script that generates .tree files
« Reply #38 on: May 04, 2026, 03:10:19 PM »
Hi Paul_123
Sounds perfect. :)

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1848
Re: script that generates .tree files
« Reply #39 on: May 05, 2026, 06:55:29 AM »
Quote
Only non whitespace characters are left in the string.

Both Rich (with the '$1=$1' idea) and Paul_123 hint at a more general solution for whitespace. Noted.

I made a small change to treegen.awk: Rather than eliminating trailing whitespace, it now eliminates all whitespace.

The result: We only process lines that contain ".tcz", and only after we've stripped any whitespace.

Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v3.2 (May 5, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    N_SPACES = 0
    LINUX_VERSION = "6.18.2-tinycore64"
    MIRROR_PATH = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    get_dependencies(ARGV[1])
}

function get_dependencies(tczname,    line) {
    gsub(/KERNEL/, LINUX_VERSION, tczname)
    printf("%"N_SPACES"s%s\n", "", tczname)
    N_SPACES+=3
    if (getline line <(MIRROR_PATH tczname ".dep") > 0) {
        do {
            if (line ~ /\.tcz/) { # because some .dep files have blank lines
                gsub(/[ \t\r]+/, "", line) # because some lines in .dep files have whitespace
                get_dependencies(line)
            }
        } while (getline line <(MIRROR_PATH tczname ".dep") > 0)
        close(MIRROR_PATH tczname ".dep")
    }
    N_SPACES-=3
}

I did some tests and this version is actually slightly faster than treegen.awk 3.1. I'm on C++'s heels now ;D

Code: [Select]
                                      awk            C++
Size                                  ~900 bytes     ~25K bytes
Time to generate vlc-dev.tcz.tree     0.45 Secs.     0.31 Secs.
« Last Edit: May 05, 2026, 07:00:09 AM by GNUser »

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12744
Re: script that generates .tree files
« Reply #40 on: May 05, 2026, 12:00:19 PM »
Hi Paul_123
I just ran your updated treegen in my test directory and
it produced the tree file correctly.

Offline Paul_123

  • Administrator
  • Hero Member
  • *****
  • Posts: 1552
Re: script that generates .tree files
« Reply #41 on: May 05, 2026, 01:22:05 PM »
Thanks for confirming. I did some testing too. It is live in the repo scripts.

Offline Paul_123

  • Administrator
  • Hero Member
  • *****
  • Posts: 1552
Re: script that generates .tree files
« Reply #42 on: May 05, 2026, 08:12:27 PM »
Okay, I had time to play.  Here is what AI thought of the code. There were a couple of iterations, but I just pasted it into one list.  Based on my test, it cut the time in half on your vlc-dev.tcz test case

Quote
1) Eliminates the N_SPACES += 3 / N_SPACES -= 3 mutations around every recursive call. Depth is now a pure value passed down the call stack — simpler and less error-prone.

2) %*s takes the width from the argument, avoiding the "%"N_SPACES"s" string concatenation on every call. Minor, but cleaner.

3) Use while (getline ...) (avoid the do/while pattern)
Your v2 reads the first line with if (getline ...) and then uses do { ... } while (getline ...), which does an extra getline control-flow dance. A single while loop is simpler and a bit faster.

4) Trim first, then test; avoid regex when possible
Right now you test if (line ~ /\.tcz/) then trim whitespace with a regex gsub. If most lines are blank/whitespace, trimming first lets you do a cheap suffix test.

Also: line ~ /\.tcz/ is “contains .tcz anywhere”. If dependency lines are supposed to end with .tcz, checking the suffix is both stricter and cheaper.

In awk you can do a suffix check without regex:

length(line) >= 4 && substr(line, length(line)-3) == ".tcz"

5) Replace dynamic-width format string building
printf("%"N_SPACES"s%s\n", "", tczname) rebuilds the format string each call. Prefer %*s:

printf("%*s%s\n", N_SPACES, "", tczname)

6) Minor: precompute the dep filename once per call
Don’t concatenate MIRROR_PATH tczname ".dep" multiple times.

7) Add caching, don't keep reading the same depfile over and over.



Code: [Select]
#!/usr/bin/awk -f

# treegen.awk v3.4 (May 5, 2026)
# usage example: $ treegen.awk labwc.tcz

BEGIN {
    LINUX_VERSION = "6.18.2-tinycore64"
    MIRROR_PATH   = "/mnt/usb/http/tinycorelinux/17.x/x86_64/tcz/"
    get_dependencies(ARGV[1], 0)
}

# Cache:
#   loaded[pkg]    = 1 once pkg.dep has been read (even if empty/missing)
#   ndeps[pkg]     = number of deps found
#   deps[pkg, i]   = ith dep (1..ndeps[pkg])

function load_depfile(pkg,    depfile, line, n, L) {
    if (loaded[pkg]) return
    loaded[pkg] = 1

    depfile = MIRROR_PATH pkg ".dep"
    n = 0

    while (getline line < depfile > 0) {
        gsub(/[ \t\r]+/, "", line)
        if (line == "") continue

        # endswith ".tcz" is faster/stricter than /\.tcz/
        L = length(line)
        if (L >= 4 && substr(line, L-3) == ".tcz")
            deps[pkg, ++n] = line
    }

    close(depfile)
    ndeps[pkg] = n
}

function get_dependencies(tczname, depth,    i, n) {
    gsub(/KERNEL/, LINUX_VERSION, tczname)
    printf("%*s%s\n", depth, "", tczname)

    load_depfile(tczname)

    n = ndeps[tczname]
    for (i = 1; i <= n; i++) {
        get_dependencies(deps[tczname, i], depth + 3)
    }
}

The awk version does not have a circular dependency check.
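
One common way to guard against that is to remember the names on the current path and refuse to recurse into a name that is already on it. The sketch below shows the idea in C. The in-memory `struct pkg` table stands in for the .dep files, and all names (`find_pkg`, `walk`, `MAX_DEPTH`) are assumptions made for this illustration, not taken from either treegen.

```c
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 64

/* Hypothetical in-memory stand-in for .dep files: each package lists
 * up to three dependencies, terminated by NULL. */
struct pkg {
        const char *name;
        const char *deps[4];
};

/* Returns the entry for name, or NULL if it has no known .dep data. */
const struct pkg *find_pkg(const struct pkg *table, size_t n,
                           const char *name) {
        size_t i;

        for (i = 0; i < n; i++)
                if (strcmp(table[i].name, name) == 0)
                        return &table[i];
        return NULL;
}

/* Prints the tree below name and returns the number of circular
 * references that were cut. path[] must hold MAX_DEPTH entries; it
 * stores the names between the root and the current node. */
int walk(const struct pkg *table, size_t n, const char *name,
         const char *path[], int depth) {
        const struct pkg *p;
        int i, cycles = 0;

        printf("%*s%s\n", depth * 3, "", name);

        /* Name already on the current path: list it, but do not recurse. */
        for (i = 0; i < depth; i++)
                if (strcmp(path[i], name) == 0)
                        return 1;
        if (depth >= MAX_DEPTH)         /* path[] is full: stop descending */
                return 0;

        path[depth] = name;
        p = find_pkg(table, n, name);
        if (p != NULL)
                for (i = 0; p->deps[i] != NULL; i++)
                        cycles += walk(table, n, p->deps[i], path, depth + 1);
        return cycles;
}
```

Because only the current path is checked, a package that legitimately appears in several branches (libogg.tcz in the trees above, for example) is still printed each time. Only a true loop such as a -> b -> a is cut.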

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12744
Re: script that generates .tree files
« Reply #43 on: May 05, 2026, 10:11:50 PM »
Hi Paul_123
Quote
Okay, I had time to play.  Here is what AI thought of the code. There were a couple of iterations, but I just pasted it into one list.  Based on my test, it cut the time in half on your vlc-dev.tcz test case

Quote
----- Snip -----
2) %*s takes the width from the argument, avoiding the "%"N_SPACES"s" string concatenation on every call. Minor, but cleaner.

 ----- Snip -----

5) Replace dynamic-width format string building
printf("%"N_SPACES"s%s\n", "", tczname) rebuilds the format string each call. Prefer %*s:

printf("%*s%s\n", N_SPACES, "", tczname)

 ----- Snip -----
...
The AI may prefer that syntax, but I used the other syntax for a reason.
Not all versions of awk understand %*s. Busybox awk in TC10 and TC14, for example, does not:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen6.awk vlc-dev.tcz > vlc-dev.tcz.tree6
awk: ./treegen6.awk:40: %*x formats are not supported
Command exited with non-zero status 1
After changing this:
Code: [Select]
printf("%*s%s\n", depth, "", tczname)
To this:
Code: [Select]
printf("%"depth"s%s\n", "", tczname)
It ran, and there was a very noticeable speed improvement.

Previous version:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen5.awk vlc-dev.tcz > vlc-dev.tcz.tree5
real    0m 2.47s
user    0m 1.47s
sys     0m 0.96s

New version:
Code: [Select]
tc@E310:~/TreeGen$ time ./treegen6.awk vlc-dev.tcz > vlc-dev.tcz.tree6
real    0m 1.43s
user    0m 0.95s
sys     0m 0.47s
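
For contrast, `%*s` is part of the standard printf family in C, so the limitation above concerns particular awk implementations rather than the format itself. A minimal sketch of the same indentation in C follows. `format_entry` is a name invented for this example.

```c
#include <stdio.h>

/* Illustrative helper: writes name, indented by depth spaces and
 * followed by a newline, into out. Returns what snprintf returns,
 * i.e. the number of characters the full line needs. */
int format_entry(char *out, size_t size, int depth, const char *name) {
        return snprintf(out, size, "%*s%s\n", depth, "", name);
}
```

The `*` takes the field width from the `depth` argument, and the empty string padded to that width produces the indentation.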

Offline Paul_123

  • Administrator
  • Hero Member
  • *****
  • Posts: 1552
Re: script that generates .tree files
« Reply #44 on: May 05, 2026, 10:40:13 PM »
I saw the dynamic format, but I had missed that you purposely removed it. If I had more time, I would have tested each change one at a time. The caching route was probably the biggest improvement.

Man, your computer is slow :)
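
The caching idea is independent of awk. As a rough sketch in C, the lookup below calls a slow loader only on the first request for each package and answers repeats from memory. Everything here (`struct cache`, `cached_ndeps`, `demo_load`) is hypothetical and only illustrates the technique. It is not how either treegen is written.

```c
#include <string.h>

#define MAX_PKGS 128

/* Hypothetical cache entry: a package name and how many dependencies
 * its .dep file held. A real tool would store the names as well. */
struct cache_entry {
        const char *name;       /* caller must keep this string alive */
        int ndeps;
};

struct cache {
        struct cache_entry entries[MAX_PKGS];
        int used;
        int loads;              /* how often the slow loader really ran */
};

/* Stand-in for reading name.dep from disk. The values follow the
 * trees shown earlier: flac.tcz lists one dependency, libogg.tcz none. */
int demo_load(const char *name) {
        return strcmp(name, "flac.tcz") == 0 ? 1 : 0;
}

/* Returns the dependency count for name. On a cache miss, load() is
 * called once and the result is remembered for later calls. */
int cached_ndeps(struct cache *c, const char *name,
                 int (*load)(const char *)) {
        int i, n;

        for (i = 0; i < c->used; i++)
                if (strcmp(c->entries[i].name, name) == 0)
                        return c->entries[i].ndeps;

        n = load(name);
        c->loads++;
        if (c->used < MAX_PKGS) {       /* cache full: still correct, just uncached */
                c->entries[c->used].name = name;
                c->entries[c->used].ndeps = n;
                c->used++;
        }
        return n;
}
```

In a tree where the same package shows up under many parents, each .dep file is then read once rather than once per occurrence. The linear search keeps the sketch short, and a hash table would be the usual choice for a large repo.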