Tiny Core Linux
Tiny Core Base => TCB Talk => Topic started by: nick65go on February 27, 2023, 04:46:32 AM
-
On any tc server, or its mirrors, we have the "core-index" files: tags.db.gz + provides.db.gz + sizelist.gz
These provide functionality to tce-ab and its app-GUI, for tcz management.
I propose adding one more file to the tc server, named something like dep.db.gz, consisting of the concatenation of all *.tcz.dep files.
It is a simple and very fast task, and it will not need to run often on the server, because a new tcz version very seldom adds a new dep. The size of this dep.db.gz file will also be insignificant, so it is not a burden for server maintenance.
A further proposal, IF the dep.db.gz proposal is approved/implemented, would be to enhance the app-GUI to answer the same question (which tcz ask for a given Y.tcz?), similar to TAG selection.
Motivation:
In most Linux distributions (Debian, Arch Linux, Alpine Linux, just to name some of them) the user can browse the repository for any package/sub-package (these are like tcz extensions) and find out who is using a specific package.
Example: who needs poppler.tcz, i.e. which tcz ask for it (and will not run without it)?
according to this link
http://forum.tinycorelinux.net/index.php/topic,26121.msg167760.html#msg167760
this task can NOT be done by a common user, because the user needs to download all *.tcz.dep from the server/mirrors, and this action is rude/impolite to server resources.
-
Just an example of why this will not be a burden on the server, using the venerable/stable TC13:
Index of /tinycorelinux/13.x/x86/tcz/: 15,668 files
Search ".tcz.tree" (1773 hits) size=22,411,412 Bytes
Search ".tcz.dep" (1773 hits) size= 668,732 Bytes
gzip compression of the already existing provides.db (14,611,683 bytes) into provides.db.gz (1,487,225 bytes) gives a ratio of 10.18%. The bigger the sum of the text files (e.g. all *.tcz.tree), the better the compression, but let's be conservative and assume the same ratio. This means that
- a new tcz.tree.db.gz will be =~ 2,281,107 bytes [aka 2,227 KB, or 2.17 MB]
- a new tcz.dep.db.gz will be =~ 68,066 bytes [aka 66.47 KB, or 0.065 MB]
and so on for *.tcz.info.db.gz
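The estimate can be re-derived with a few lines of awk. This is only a sketch: the byte counts are the ones quoted in this post, the function name estimate_sizes is made up for illustration, and the compression ratio of the new files may well differ from that of provides.db.

```shell
#!/bin/sh
# Estimate compressed sizes from the ratio observed on provides.db.
# All byte counts are the ones quoted in the post; nothing is measured here.
estimate_sizes() {
    awk 'BEGIN {
        ratio = 1487225 / 14611683   # provides.db.gz / provides.db
        tree  = 22411412             # sum of all *.tcz.tree, in bytes
        dep   = 668732               # sum of all *.tcz.dep, in bytes
        printf "ratio    %.2f%%\n", ratio * 100
        printf "tree.gz ~%.0f bytes (%.0f KB)\n", tree * ratio, tree * ratio / 1024
        printf "dep.gz  ~%.0f bytes (%.1f KB)\n", dep * ratio, dep * ratio / 1024
    }'
}

estimate_sizes
```

At the observed 10.18% ratio this comes out at roughly 2.2 MB for the tree database and under 70 KB for the dep database.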
-
OK. 14.x x86* now have this file. It uses the same format as provides.db, with the filename first and then deps if any.
For a "depends-on.sh" and corresponding Apps/tce-ab support, patches welcome.
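The script that generates the file on the server is not shown in this thread. As a rough sketch of what "filename first and then deps if any" could look like, assuming a directory that holds the *.tcz files with their optional *.tcz.dep files, and assuming records are separated by a blank line as in provides.db (the function name build_dep_db is hypothetical):

```shell
#!/bin/sh
# build_dep_db DIR: print one record per extension found in DIR.
# Hypothetical helper; not the script actually used on the server.
build_dep_db() {
    for tcz in "$1"/*.tcz; do
        [ -e "$tcz" ] || continue        # DIR contains no extensions
        basename "$tcz"                  # line 1: the extension itself
        if [ -f "$tcz.dep" ]; then
            # following lines: its deps, minus blank lines and stray whitespace
            awk '{ gsub(/[ \t\r]+/, ""); if ($0 != "") print }' "$tcz.dep"
        fi
        echo                             # a blank line ends the record
    done
}

# Usage sketch: build_dep_db /path/to/tcz > dep.db && gzip -9 dep.db
```

Extensions without a .dep file still get a one-line record, so every filename appears in the database.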
-
@curaga: Thank you!
Good to see the inception of tcz.###.db.gz as accepted files on the server, for metadata queries.
I will start to think about/build/patch some scripts with my rusty skills. All other expert script writers are welcome to compete :) ; I am not afraid of ending up at the bottom of the fame ranking.
-
Hi curaga
... For a "depends-on.sh" and corresponding Apps/tce-ab support, patches welcome.
I've attached a first cut of depends-on.sh.
-
Rich, that looks good. Please commit it to core-scripts.
-
Hi curaga
OK, I'm done fighting with github. The procedure I used in July of 2020 no
longer worked. When I went to push, I received a message that passwords
were no longer being accepted. So I created a "personal access token" and
tried again:
tc@E310:~/TC-Git$ git clone https://github.com/rarost/Core-scripts.git
Cloning into 'Core-scripts'...
remote: Enumerating objects: 1033, done.
remote: Counting objects: 100% (169/169), done.
remote: Compressing objects: 100% (93/93), done.
remote: Total 1033 (delta 81), reused 135 (delta 74), pack-reused 864
Receiving objects: 100% (1033/1033), 175.10 KiB | 0 bytes/s, done.
Resolving deltas: 100% (517/517), done.
Checking connectivity... done.
tc@E310:~/TC-Git$ cp depends-on.sh Core-scripts/usr/bin/
tc@E310:~/TC-Git$ cd Core-scripts/
tc@E310:~/TC-Git/Core-scripts$ git add -A
tc@E310:~/TC-Git/Core-scripts$ git commit -a
[master 02a01c0] Added depends-on.sh
1 file changed, 54 insertions(+)
create mode 100755 usr/bin/depends-on.sh
tc@E310:~/TC-Git/Core-scripts$ git push
Username for 'https://github.com': rarost
Password for 'https://rarost@github.com':
Counting objects: 5, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (5/5), done.
Writing objects: 100% (5/5), 1.08 KiB | 0 bytes/s, done.
Total 5 (delta 2), reused 0 (delta 0)
remote: Resolving deltas: 100% (2/2), completed with 2 local objects.
To https://github.com/rarost/Core-scripts.git
3384bfc..02a01c0 master -> master
tc@E310:~/TC-Git/Core-scripts$
I then went to https://github.com/rarost/Core-scripts
Clicked on Contribute -> Open pull request
Filled in a comment.
Clicked on Create pull request
-
Hi Rich. That's really nice, thank you for putting it together. I look forward to seeing the script in the base system!
I played around with the script and think I found three small bugs:
1. Running depends-on.sh gtk3
- Expected matches: Only extensions that contain gtk3.tcz in their .dep file
- Actual matches: Some unexpected hits such as blueman.tcz which contain gtk3-gir.tcz (but not gtk3.tcz) in their .dep file
2. Running depends-on.sh iwd or depends-on.sh iwd.tcz
- Expected matches: None, because no packages depend on iwd.tcz
- Actual matches: Extensions that depend on eiwd.tcz
3. Running any depends-on.sh foo or depends-on.sh foo.tcz
- Expected matches: Should not include foo.tcz because it seems silly to say that an extension depends on itself
- Actual matches always include foo.tcz
I attached a little patch that fixes these issues.
-
P.S. The patch assumes that dependencies listed in .dep files always include the trailing .tcz. Hopefully that's a safe assumption. If not, then the logic in the patch needs to be reworked.
-
Will this do what you are trying to do?
EXTN="${EXTN%.tcz}.tcz"
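For anyone unfamiliar with the idiom, here is a small illustration of what that parameter expansion does. The wrapper function normalize_ext is made up for the demo; only the EXTN line comes from the post.

```shell
#!/bin/sh
# normalize_ext NAME: print NAME with exactly one trailing .tcz
normalize_ext() {
    EXTN="$1"
    EXTN="${EXTN%.tcz}.tcz"   # strip a trailing .tcz if present, then append it
    printf '%s\n' "$EXTN"
}

normalize_ext gtk3        # prints gtk3.tcz
normalize_ext gtk3.tcz    # prints gtk3.tcz
```

Either spelling of the argument ends up in the same form, so the rest of a script only has to handle one case.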
-
Rich, your pull request succeeded and I merged it.
About the fuzzy match: I kind of expect it to be fuzzy, if I can't be bothered to type the full name. This could be handled with an option, so that both a fuzzy and exact match are available. (which is the default and which is the option doesn't matter to me)
You can check for an option and then use "shift" if it was there, so the search param is $1 in both cases.
edit: dep files always include the .tcz.
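A minimal sketch of the "check for an option, then shift" approach. The flag name -f, the MODE variable and the function parse_args are assumptions for illustration only, and which mode is the default is an arbitrary choice here.

```shell
#!/bin/sh
# parse_args [-f] NAME: print the selected mode and the search term.
# The -f flag and MODE variable are illustrative, not taken from the real script.
parse_args() {
    MODE="exact"
    if [ "$1" = "-f" ]; then
        MODE="fuzzy"
        shift                 # drop the option so the search term is now $1
    fi
    [ -n "$1" ] || { echo "usage: parse_args [-f] name" >&2; return 1; }
    printf '%s %s\n' "$MODE" "$1"
}

parse_args gtk3       # prints: exact gtk3
parse_args -f gtk3    # prints: fuzzy gtk3
```

After the shift, the code that follows can use $1 without caring whether an option was given.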
-
Yes it does, Greg. Thank you. I like your more succinct way.
Hi, curaga. Good to know the dep files always include the .tcz. It simplifies the logic for an exact search.
I'm submitting a pull request now. Fuzzy search is the default; -e turns on exact search.
Edit: Rich, if you prefer exact search to be default, let me know and I can use -f for fuzzy and easily flip the logic.
-
Hi GNUser
EXTN="${EXTN%.tcz}.tcz"
That's exactly how I handled it in my FetchExt.sh script. :)
I'm guessing if someone is searching for which extensions depend on something, they
would probably want an exact match (noise free result) most of the time.
Once again, I'm guessing a search like this would usually be done to assess the impact
that doing something to/with an extension will have on other extensions or on how an
installation behaves.
I can see a fuzzy search if you want to include legacy extensions like libpng12 vs libpng, or
poppler vs poppler07 for example. A fuzzy search would also include -bin, -dev, -gir, etc.
versions of these extensions.
For a fuzzy search you should probably strip the .tcz from the search term:
EXTN="${EXTN%.tcz}"
... Edit: Rich, if you prefer exact search to be default, let me know and I can use -f for fuzzy and easily flip the logic.
OK, short answer, I prefer exact for a default.
-
That makes sense, Rich. I'm on it.
-
Hi Rich. I created a pull request on GitHub to change depends-on.sh as you suggested.
I'm glad it's your preference to default to an exact search. I imagine that I'll be doing an exact search most of the time.
-
Hi GNUser
...
3. Running any depends-on.sh foo or depends-on.sh foo.tcz
- Expected matches: Should not include foo.tcz because it seems silly to say that an extension depends on itself
- Actual matches always include foo.tcz ...
The .db files contain Records like this:
aalib-dev.tcz
aalib.tcz
libX11-dev.tcz
A Record consists of Fields, 3 in this case. Field1 is the extension, Field2 and
Field3 are the dependencies. The blank line that follows marks the end of
a Record and is not itself a Field.
In awk, $0 is the entire Record. $1 is Field1, $2 is Field2, etc. The search is
performed on $0. For extensions which have no dependencies, $0 and $1 are
the same.
... I attached a little patch that fixes these issues.
awk 'BEGIN {FS="\n";RS=""} /\n'${TARGET_WITH_EXTENSION}'/{print $1}' "$TCEDIR"/"$DB" | grep -v "^${TARGET_WITH_EXTENSION}"
This will filter out avahi-ui-gtk3, gtk3-dev, gtk3-gir, vte-gtk3-dev, wxwidgetsgtk3-dev, etc
from the results when you do a fuzzy search for gtk3.
This should handle that correctly:
awk 'BEGIN {FS="\n";RS=""} /\n'${TARGET_WITH_EXTENSION}'/{if($1 != $0) print $1}' "$TCEDIR"/"$DB"
Nice trick placing \n in front of TARGET. I'll have to make a note of that. If I recall
correctly, the internet is littered with statements that line breaks are ignored when
doing a search of $0.
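To make the Record/Field layout and the leading \n trick concrete, here is a small self-contained demo. The sample records and the function names are illustrative only. Note that awk itself reports three fields for the aalib-dev record, because the blank line is consumed as the record separator.

```shell
#!/bin/sh
# Sample records in the dep.db layout (contents are illustrative).
make_sample_db() {
    printf '%s\n' \
        'aalib-dev.tcz' 'aalib.tcz' 'libX11-dev.tcz' '' \
        'aalib.tcz' '' \
        'acpid.tcz' ''
}

# RS="" selects paragraph mode: blank lines separate records.
# FS="\n" turns each line of a record into one field.
count_fields() {
    awk 'BEGIN {FS="\n"; RS=""} {print $1, NF}'
}

# The leading \n can only match at the start of line 2 or later,
# so a record never matches on its own name in $1.
who_needs_aalib() {
    awk 'BEGIN {FS="\n"; RS=""} /\naalib\.tcz/ {print $1}'
}

make_sample_db | count_fields      # aalib-dev.tcz 3, aalib.tcz 1, acpid.tcz 1
make_sample_db | who_needs_aalib   # aalib-dev.tcz
```

The second command does not print aalib.tcz itself, because in that record the name is not preceded by a newline.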
-
awk 'BEGIN {FS="\n";RS=""} /\n'${TARGET_WITH_EXTENSION}'/{print $1}' "$TCEDIR"/"$DB" | grep -v "^${TARGET_WITH_EXTENSION}"
This will filter out avahi-ui-gtk3, gtk3-dev, gtk3-gir, vte-gtk3-dev, wxwidgetsgtk3-dev, etc
from the results when you do a fuzzy search for gtk3.
Hi Rich. No worries, that line of code is only for exact searches.
I tested the current GitHub version of the script quite thoroughly. I think you'll find that it handles fuzzy and exact searches exactly as expected. Give it a try:
https://github.com/tinycorelinux/Core-scripts/blob/master/usr/bin/depends-on.sh
If you get any unexpected results with the current version at the link, I will buy you a beer.
-
Hi GNUser
Do a fuzzy search for acpid or acpid.tcz and one of the responses is acpid.tcz.
acpid.tcz has no dependencies, case 3 in your reply #7, so it should be filtered out.
Fixing it in the awk command lets it skip the print command and avoids calling grep.
Since you seem to have better access, could you fix my comment (line #35) in the script:
# This downloads a fresh copy of dep.dbgz if any of the following are true:
There's a period missing ( dep.db.gz ).
-
I thought case 3 in my reply #7 was permissible for fuzzy searches.
But I do find it annoying even for fuzzy searches.
I will try to fix case 3 for both exact and fuzzy searches.
I will fix line #35 for you at the same time.
-
Hi GNUser and Rich. May I make an additional proposal? It is to have a parameter for how far UP the hierarchy we want to go/see.
Ex: who asks for libxcb.tcz? There are a few answers, one of which is libX11.tcz. OK, but what about libXrender.tcz (up one level)?
libXrender.tcz
   libX11.tcz
      libxcb.tcz
         libXau.tcz
         libXdmcp.tcz
The main idea was to see who is affected if I replace/destroy a tcz (because I am missing its deps, it cannot load, etc.).
Maybe these observations will crystallize the awk logic.
-
Hi nick65go. The script is not that fancy. It only searches the .dep files, not the dependency tree.
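For what it's worth, a multi-level lookup could be layered on top of the same dep.db data. The following is only a sketch of the idea, not part of depends-on.sh: the function name dependents_upto and its arguments are hypothetical, and the sample records are illustrative.

```shell
#!/bin/sh
# dependents_upto DB NAME LEVELS
# Print "level name" for every extension that needs NAME directly (level 1)
# or through a chain of at most LEVELS dependencies.
# Hypothetical helper, not part of depends-on.sh.
dependents_upto() {
    awk -v target="${2%.tcz}.tcz" -v levels="$3" '
    BEGIN { FS = "\n"; RS = "" }
    {
        # needs[dep] collects the extensions whose record lists dep
        for (i = 2; i <= NF; i++) needs[$i] = needs[$i] " " $1
    }
    END {
        seen[target] = 1
        frontier = target
        for (lvl = 1; lvl <= levels && frontier != ""; lvl++) {
            n = split(frontier, cur, " ")
            frontier = ""
            for (j = 1; j <= n; j++) {
                m = split(needs[cur[j]], up, " ")
                for (k = 1; k <= m; k++) {
                    if (up[k] in seen) continue
                    seen[up[k]] = 1
                    print lvl, up[k]
                    frontier = frontier " " up[k]
                }
            }
        }
    }' "$1"
}

# Demo with an illustrative libxcb chain.
DB=$(mktemp)
printf '%s\n' \
    'libXrender.tcz' 'libX11.tcz' '' \
    'libX11.tcz' 'libxcb.tcz' '' \
    'libxcb.tcz' 'libXau.tcz' 'libXdmcp.tcz' '' > "$DB"
dependents_upto "$DB" libxcb 2
rm -f "$DB"
```

With LEVELS set to 1 this behaves like a plain exact reverse lookup; each extra level walks one step further up the chain, and the seen array keeps an extension from being reported twice.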
-
In awk, $0 is the entire Record. $1 is Field1, $2 is Field2, etc. The search is
performed on $0. For extensions which have no dependencies, $0 and $1 are
the same.
I see. In that case, I think what we really want is to limit the search to fields $2 through last. I need to chew on this a bit.
-
One scenario: for me, it all started with the choice (for example) between gtk2 and gtk3.
I wanted to know which apps need gtk2, and also which apps need gtk3.
Then I may choose the gtk2 build of an app (from a previous TC version) if it is the same version as the gtk3 build.
So I need to evaluate which one drags more tcz deps into RAM just because of gtk3, without bringing any improvement/correction.
Ex: if you look in the Apps GUI, it lists a lot of deps for a tiny tcz and you think the RAM need is big, but it is not, because other tcz (Xorg, gtk3) were already in RAM.
So it is wrong to compare only the extension sizes:
xpopple-xpdf.tcz [164 KB] needs Total size (bytes) 17895424, 17.07 MB
flaxpdf.tcz [216 KB] needs Total size (bytes) 17158144, 16.36 MB
The real comparison is not 164 KB vs. 216 KB [a 52 KB difference, gzip-ed]; it is more like 17.07 MB vs. 16.36 MB [a difference of about 0.70 MB, or 720 KB]. The values listed above are just for demo.
And here comes the option for a "level deep": further UP, all deps are common.
-
Hi GNUser
I've been down this road. $0 is the fastest way to search for something. To the
best of my knowledge, once you try breaking out individual fields, you need loop
counters and things slow way down.
If you want to do a search for extensions with no dependencies:
awk 'BEGIN {FS="\n";RS=""} NF==1 {print $1}' "$TCEDIR"/"$DB"
-
Do a fuzzy search for acpid or acpid.tcz and one of the responses is acpid.tcz.
acpid.tcz has no dependencies.
I think what you dislike about this response is not that acpid.tcz has no dependencies. It's that acpid.tcz is not one of acpid.tcz's dependencies, right?
Put it another way, if acpid.tcz did have some other random dependency, you would still not want it to be a response.
I'm not sure how to solve this other than limiting the search to the fields that actually represent dependencies (i.e., fields $2 through last). I think it may be best to just leave it alone--i.e., live with the fact that fuzzy search looks at entire record and may match on $1.
-
Hi GNUser
Changing the print command to this will filter out matches on extensions
with no dependencies:
{if($1 != $0) print $1}
-
Hi GNUser
This should work too and might be faster:
{if(NF > 1) print $1}
-
Hi Rich. Yes, either one of those would filter out extensions without dependencies. But it would still not solve many cases of the third bug I mentioned in Reply #7.
With either change you're proposing, if acpid.tcz depended on any random extension unrelated to your search (e.g., yad.tcz) and you did a fuzzy search for acpid or acpid.tcz , one of the responses would still be acpid.tcz.
IMHO either change would add overhead and only seem to work sometimes. I don't think there is an easy fix for that little bug in fuzzy searches (short of ensuring that the matched field is >= $2, which would probably involve looping through fields).
Edit: My vote is to leave the fuzzy search alone. But it's your child, so feel free to change it if you like.
-
Hi GNUser
... if acpid.tcz depended on any random extension unrelated to your search (e.g., yad.tcz) and you did a fuzzy search for acpid or acpid.tcz , one of the responses would still be acpid.tcz. ...
How could "acpid.tcz depended on any random extension" if it has no dependencies? ???
-
I made your change to the attached script. Try it.
$ ./depends-on.sh -f acpid
No responses, so fuzzy search seems to be matching only on dependencies.
$ ./depends-on.sh -f yad
wifi-manager.tcz
yad.tcz
Oops. yad.tcz does not actually depend on any extension containing yad in the name.
This is what I mean. Your change only fixes the unexpected results sometimes, so it's not really getting to the actual problem.
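The effect can be reproduced without the real database. In this sketch the records are illustrative only (libfoo.tcz stands in for whatever yad.tcz really depends on) and the function names are made up.

```shell
#!/bin/sh
# Illustrative records only; the real dependency lists may differ.
sample_db() {
    printf '%s\n' \
        'acpid.tcz' '' \
        'wifi-manager.tcz' 'yad.tcz' '' \
        'yad.tcz' 'libfoo.tcz' ''
}

# Fuzzy search over the whole record, skipping records without dependencies.
fuzzy_nf_filter() {
    awk -v pat="$1" 'BEGIN {FS="\n"; RS=""} $0 ~ pat {if (NF > 1) print $1}'
}

sample_db | fuzzy_nf_filter acpid   # prints nothing
sample_db | fuzzy_nf_filter yad     # prints wifi-manager.tcz and yad.tcz
```

The NF test only hides extensions that have no dependencies at all. As soon as a record has at least one dependency, a match on its own name in $1 still gets through.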
-
Hi GNUser
EXTN="${EXTN%.tcz}.tcz"
That's exactly how I handled it in my FetchExt.sh script. :)
That's probably where I got it from. ;D
When I was doing a web GUI interface for extensions in piCorePlayer, I eventually realised that ALL the heavy lifting had already been done in the TC core scripts! That's why I get a little nervous when I see lots of suggested changes to these scripts.
-
That's why I get a little nervous when I see lots of suggested changes to these scripts.
Haha. Me, too. Fortunately curaga has a discriminating eye.
I respect the TCL scripts and only dare to suggest minor edits.
-
On the attached script the indentation is using both spaces and tabs.
Is the capitalisation of the variable $Age OK?
-
Hi GNUser
... This is what I mean. Your change only fixes the unexpected results sometimes, so it's not really getting to the actual problem.
Yeah, OK, I see now what you mean. My fix filters out extensions without dependencies. Fuzzy searches
can match on any field and can be a substring of a field.
I dealt with a similar problem in my LddCheck.sh script:
https://forum.tinycorelinux.net/index.php/topic,26040.0.html
There I searched Records using $0 which is fast. Anytime there is a match, it searches that Record a
second time from $2 to NF.
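A rough sketch of that two-step idea applied to dep.db style records. This is not the code from LddCheck.sh or depends-on.sh; the function name and sample data are illustrative, and libfoo.tcz is a placeholder dependency.

```shell
#!/bin/sh
# Illustrative records only.
sample_db() {
    printf '%s\n' \
        'acpid.tcz' '' \
        'wifi-manager.tcz' 'yad.tcz' '' \
        'yad.tcz' 'libfoo.tcz' ''
}

# fuzzy_confirm PATTERN < db
# Step 1: cheap regex test on the whole record ($0).
# Step 2: only for records that passed step 1, check fields 2..NF,
#         i.e. the dependencies, and stop at the first hit.
fuzzy_confirm() {
    awk -v pat="$1" 'BEGIN {FS="\n"; RS=""}
    $0 ~ pat {
        for (i = 2; i <= NF; i++)
            if ($i ~ pat) { print $1; break }
    }'
}

sample_db | fuzzy_confirm yad     # prints wifi-manager.tcz only
sample_db | fuzzy_confirm acpid   # prints nothing
```

Most records fail the first test and never enter the loop, which is why the extra per-field check costs very little.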
-
There I searched Records using $0 which is fast. Anytime there is a match, it searches that Record a
second time from $2 to NF.
That's brilliant. Yes, this would solve the actual problem without an undue decrease in speed.
It will be a pleasure to think about this further and implement a solution based on this idea.
-
Hi Greg. Thanks for catching the clashing indentation styles. I fixed that. Age seems a bit strange but is a valid variable name in shell.
Hi Rich. I changed the search logic using your wonderful idea and example script. I did many test queries and am getting only expected responses now. Please give it a whirl:
https://github.com/tinycorelinux/Core-scripts/blob/master/usr/bin/depends-on.sh
-
Hi GNUser
... Age seems a bit strange but is a valid variable name in shell. ...
I'm checking how old the file is. Feel free to pick a more appropriate name. ;D
The link you provided appears to point to an older version without your changes.
-
Oops, link will only update if/when curaga merges the pull request. New version of the script is attached. I think the variable name is just fine ;D
-
Hi GNUser
Looks like it's working correctly. The speed held up well too. Timing
fuzzy searches for gtk2 or gtk3 shows identical times to the old search.
Well done GNUser, thank you. :)
-
Hi Rich. You're welcome. The awk technique you shared is very interesting. Thank you for teaching me something new.
-
Hi GNUser
Glad you like it. I call it "Guess and confirm". It's kind of like speculative execution
or predictive branching but more accurate.
-
I see a possible issue with the fuzzy search: say acpid.tcz depended on libfoo and libfoo2 both. A fuzzy search for "libfoo" would print acpid twice, no? That is, there needs to be a break after the print so it's only printed once.
-
Hi curaga. I think you're right. I'll do some tests later.
Hopefully it's a simple matter of telling awk to continue to next record after it prints. I'll try to fix this.
-
You're right, curaga. For example, depends-on.sh -f gst-plugins causes totem.tcz to be printed three times.
A next statement in the right place fixes this. The GitHub pull request has been updated.
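The duplicate output and the effect of next can be shown with a single made-up record. This is a sketch with hypothetical function names, not the code that was merged.

```shell
#!/bin/sh
# Illustrative record: one extension with two dependencies matching "libfoo".
sample_db() {
    printf '%s\n' 'acpid.tcz' 'libfoo.tcz' 'libfoo2.tcz' ''
}

# Without next: one line of output per matching dependency.
fuzzy_dupes() {
    awk -v pat="$1" 'BEGIN {FS="\n"; RS=""}
    $0 ~ pat { for (i = 2; i <= NF; i++) if ($i ~ pat) print $1 }'
}

# With next: awk moves on to the following record after the first hit.
fuzzy_once() {
    awk -v pat="$1" 'BEGIN {FS="\n"; RS=""}
    $0 ~ pat { for (i = 2; i <= NF; i++) if ($i ~ pat) { print $1; next } }'
}

sample_db | fuzzy_dupes libfoo   # acpid.tcz printed twice
sample_db | fuzzy_once libfoo    # acpid.tcz printed once
```

A break out of the loop would have the same visible effect here; next additionally skips any later rules for that record.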
-
Merged your patches, thanks.
-
Great! Thank you.