
Author Topic: tce/app-browser , sparing of storage or network  (Read 14330 times)

Offline nick65go

  • Hero Member
  • *****
  • Posts: 839
Re: tce/app-browser , sparing of storage or network
« Reply #60 on: March 03, 2023, 06:24:56 PM »
Yes, Rich. But the main point is that I prefer, if possible, not to go online just to search or read almost-static info. Plus, no Wi-Fi means more battery life, etc.

It is like an HTML book whose chapters are separate HTML pages online: I prefer to download the full HTML documentation all at once and read it in my own time, even when internet access is unavailable or expensive in that location. In the end I still need to download all the pieces to read the full book anyway, sometimes going back and forth many times.

The point is that a foreign entity (even a non-malicious one) should not profile me: when I read, what I read, from which IP/location, how many times I read, how fast, etc.
If someone (or some A.I.) is determined, then it could in the end aggregate this info about me, but why should I make this easy for them?
« Last Edit: March 03, 2023, 06:38:08 PM by nick65go »

Offline nick65go

  • Hero Member
  • *****
  • Posts: 839
Re: tce/app-browser , sparing of storage or network
« Reply #61 on: March 07, 2023, 08:08:15 PM »
GNUser built an excellent script, contributor.sh, in record time by my measure.

Unfortunately, in win10, using Qemu without acceleration, the original script took nearly 2m 48s (168 seconds) for the "-t" option. I built my own script, which runs in about 8 seconds. All measurements were taken after the tbz was already downloaded, to compare apples to apples. How I tested / debugged:

Code: [Select]
time zcat "$TCE"/infofiles.tbz | grep -i ^extension > /tmp/a1.txt           # real 0m 8.21s
time cat /tmp/a1.txt | sed s/^Extension.*:\\W*// > /tmp/a2.txt              # real 0m 0.03s
#echo "remains only: 14[,] 13[aus9 at gmx dot com] 5[/] 1[(]"
cat /tmp/a2.txt | sed s/,.*// | sed s/aus9.*/aus9/ | sed sX\\/XX > /tmp/a3.txt     # real 0m 0.02s
time awk `{x[$1]++} END{ for (i in X) {print X[i], i}}` a3.txt | sort -r     # real 0m 0.06s

the final script is this:
Code: [Select]
zcat "$TCE"/infofiles.tbz | grep -i ^extension | sed s/^Extension.*:\\W*// | sed s/,.*// | sed s/aus9.*/aus9/ | awk `{x[$1]++} END{ for (i in X) {print X[i], i}}` | sort -r
One small issue: the final counts are compared as text, not as numbers, so the sort is "alphabetical". At least I tried my hand at sed, awk, and regex.
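
One possible way around the alphabetical ordering mentioned above is `sort -n`, which compares the leading count as a number. A minimal sketch on made-up "count name" lines (the names and counts are invented for illustration, not taken from the real data):

```shell
# Made-up "count name" lines, as the counting step might print them.
sample='9 alice
10 bob
2 carol'

# Plain -r compares the lines as text, so "9" lands ahead of "2" and "10".
alphabetical=$(printf '%s\n' "$sample" | sort -r)

# -n compares the leading number; -r puts the largest count first.
numeric=$(printf '%s\n' "$sample" | sort -rn)

printf 'sort -r:\n%s\n\nsort -rn:\n%s\n' "$alphabetical" "$numeric"
```

With `-rn` the line with the largest count comes first, which is probably what a "top contributors" list wants.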

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1530
Re: tce/app-browser , sparing of storage or network
« Reply #62 on: March 07, 2023, 10:34:37 PM »
Hi nick65go. I had not encountered zcat before. Working with the .tbz archive would be a much more elegant solution than extracting the archive and working with 2000+ files. I'll try to refactor my script using zcat when I get a chance.

Your script is not working for me, unfortunately:
Code: [Select]
$ ./tally
./tally: line 3: {x[]++}: not found
BusyBox v1.36.0 (2023-01-17 09:43:30 UTC) multi-call binary.

Usage: awk [OPTIONS] [AWK_PROGRAM] [FILE]...

-v VAR=VAL Set variable
-F SEP Use SEP as field separator
-f FILE Read program from FILE

Does it require GNU awk? If so, I'll try to come up with a strategy that uses zcat and only what's available in base TCL.
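
For what it's worth, the `{x[]++}: not found` message is consistent with the awk program being wrapped in backticks rather than single quotes: backticks start a command substitution, so the shell tries to run the program text itself as a command (with `$1` expanded to nothing) and awk is left without a program, hence the usage message. A sketch of the same counting step with single quotes and one consistently named array, which should not need GNU awk; the input names and the array name `count` are invented for illustration:

```shell
# Invented sample input: one maintainer name per line.
names='aus9
gnuser
aus9
rich
aus9'

# Single quotes keep $1 away from the shell, so awk sees the program verbatim.
# The array is called "count" in both places (awk names are case-sensitive).
tally=$(printf '%s\n' "$names" |
    awk '{count[$1]++} END {for (name in count) print count[name], name}' |
    sort -rn)

printf '%s\n' "$tally"
```

Only plain associative arrays and a `for (... in ...)` loop are used here, both of which are standard awk features.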

« Last Edit: March 07, 2023, 10:37:02 PM by GNUser »

Offline nick65go

  • Hero Member
  • *****
  • Posts: 839
Re: tce/app-browser , sparing of storage or network
« Reply #63 on: March 08, 2023, 02:24:44 AM »
@GNUser: short answer, I think we can replace awk with:
Code: [Select]
cat /tmp/a3.txt | uniq -c | sort -r

The main gain came from working only in RAM. The second gain came from pipes, and the third from cleaning up the strings: first the "Extension* + non-word characters" prefix at the front, then the unused characters ( , / ) at the back end. I used awk just to count the unique strings, so it can be replaced.
FYI: regarding your error, I think it may be a typo: your X[ ]++ should be X[$1]++, because in awk $1 is the first field. The array X[] is filled with string indexes (not numbers), like X[aus9], X[gnuser], and the counts are printed in the END block.
« Last Edit: March 08, 2023, 02:29:35 AM by nick65go »

Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1530
Re: tce/app-browser , sparing of storage or network
« Reply #64 on: March 08, 2023, 09:54:07 AM »
Hi nick65go. Your zcat and uniq -c ideas translate into a roughly 91% improvement in contributor.sh -t speed on my machine: 0.58 seconds now vs. 6.68 seconds before. This is the elegant solution I was looking for but couldn't find. Well done.

I revised the script to include your ideas and uploaded it to the usual location.

I'd like to achieve similar improvement in speed with contributor.sh -ts but it would be much more tricky. See what you can come up with :)




Offline GNUser

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 1530
Re: tce/app-browser , sparing of storage or network
« Reply #65 on: March 08, 2023, 10:16:08 AM »
Quote
I'd like to achieve similar improvement in speed with contributor.sh -ts but it would be much more tricky. See what you can come up with :)
If we recreate the .tbz file with only the .info files we're interested in, then it's simple. I implemented this. Version bump to 5.1. Many thanks for your ideas. The gain in efficiency is dramatic.
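
The repacking idea above could be sketched roughly as follows. This is not the code in contributor.sh: the scratch directory, the file names, and the "Extension_by:" field layout are all assumptions made up for the example, and a gzip-compressed tar archive stands in for the real infofiles archive.

```shell
# Hypothetical layout: a scratch directory with a few made-up .info files
# packed into one gzip-compressed tar archive.
work=$(mktemp -d)
mkdir "$work/all" "$work/subset"
for name in foo bar baz; do
    printf 'Title:          %s.tcz\nExtension_by:   someone\n' "$name" \
        > "$work/all/$name.tcz.info"
done
tar -czf "$work/full.tgz" -C "$work/all" foo.tcz.info bar.tcz.info baz.tcz.info

# Repack only the files of interest into a smaller archive.
tar -xzf "$work/full.tgz" -C "$work/subset" foo.tcz.info baz.tcz.info
tar -czf "$work/subset.tgz" -C "$work/subset" foo.tcz.info baz.tcz.info

# The same zcat | grep style of pipeline now only sees the selected files.
matches=$(zcat "$work/subset.tgz" | grep -i -c '^extension')
printf 'matching lines in subset: %s\n' "$matches"
rm -rf "$work"
```

The appeal of this design is that the fast single-stream pipeline stays unchanged; only the archive it reads gets smaller.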

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 11702
Re: tce/app-browser , sparing of storage or network
« Reply #66 on: March 08, 2023, 10:36:15 AM »
Hi nick65go
Quote
@GNUser: short answer, I think we can replace awk with:
Code: [Select]
cat /tmp/a3.txt | uniq -c | sort -r ...

You may want to take the advice from the GNU uniq --help message:
Quote
... Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use 'sort -u' without 'uniq'.

Also, comparisons honor the rules specified by 'LC_COLLATE'. ...

If you are looking for a count, piping a list through  wc -l  will give that to you:
Code: [Select]
tc@E310:~$ cat /etc/init.d/tc-config | wc -l
631
tc@E310:~$
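
To illustrate the uniq note quoted above, here is a small sketch with invented names in which the repeated entries are deliberately not adjacent. Sorting before `uniq -c` groups them, and a final `sort -rn` orders the result by count:

```shell
# Invented sample: the three "aus9" lines are not next to each other.
names='aus9
gnuser
aus9
rich
aus9'

# Without sorting first, uniq -c only merges adjacent repeats,
# so every line here ends up as its own group.
unsorted_groups=$(printf '%s\n' "$names" | uniq -c | wc -l)

# Sorting first makes identical lines adjacent, so each name is counted once.
tally=$(printf '%s\n' "$names" | sort | uniq -c | sort -rn)

printf 'groups without sort: %s\n' "$unsorted_groups"
printf '%s\n' "$tally"
```

On this sample the unsorted pipeline reports five groups, while the sorted one reports three names with "aus9" counted three times.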

Offline nick65go

  • Hero Member
  • *****
  • Posts: 839
Re: tce/app-browser , sparing of storage or network
« Reply #67 on: March 08, 2023, 11:40:50 AM »
GNUser and Rich, thank you for your feedback and lessons. I just propose ideas, and I am happy to let proficient people implement them for the TC community, if that is suitable and not a burden for them  :)

The stakes are low in this game; it is mostly a brain challenge, merely for elegance, in the tiny(core) spirit.
 
« Last Edit: March 08, 2023, 11:46:44 AM by nick65go »

Offline mocore

  • Hero Member
  • *****
  • Posts: 663
  • ~.~
Re: tce/app-browser , sparing of storage or network
« Reply #68 on: March 21, 2023, 08:07:22 PM »
i wonder if anyone else has given this any thought (don't remember finding any post with a similar gist in the past)

correction ftr
 @ Topic: tgrex.pl - tcz/scm full text info search and download tool
https://forum.tinycorelinux.net/index.php/topic,14237.msg80232.html#msg80232 -

Quote
tgre update - downloads all the .info files
Yikes, that's hitting the mirror hard :P
all .info files  compressed
  gz or even xz
seems to me at least useful to have locally
...
+ it could make app browser and ab faster to browse .info for each app
   if tcz-info.xz was downloaded once
   and the files viewed from the archive locally
   rather than downloaded individually while browsing through the apps
...
 .. tho i guess that having a local mirror of the repo is the alternative option

...
Info and tree combined files would have less of use, IMHO.

@curaga
 i wonder for what reason info & tree combined would be considered of less use ?

at least one compressed file would mean fewer connections,
although perhaps *now* (for the server) that's less of a concern?

Now on general subject in the title of post: "tce/app-browser , sparing of storage or network ".
With the addition of dep.db.gz (thank you curaga)
+1
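
The idea quoted earlier in this post, downloading one compressed archive once and viewing individual .info files from it locally, could look roughly like this. The archive name, file names, and file contents are made up for the example, and gzip is used as a stand-in for whichever compression the real archive would use:

```shell
# Hypothetical local archive holding two made-up .info files.
work=$(mktemp -d)
printf 'Title:          foo.tcz\nDescription:    made-up example\n' \
    > "$work/foo.tcz.info"
printf 'Title:          bar.tcz\nDescription:    another made-up example\n' \
    > "$work/bar.tcz.info"
tar -czf "$work/info.tgz" -C "$work" foo.tcz.info bar.tcz.info

# -O writes the named member to stdout instead of unpacking it to disk,
# so a single .info file can be read without touching the network.
info=$(tar -xzOf "$work/info.tgz" foo.tcz.info)
printf '%s\n' "$info"
rm -rf "$work"
```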

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11049
Re: tce/app-browser , sparing of storage or network
« Reply #69 on: March 22, 2023, 02:56:51 AM »
The typical session may access just a couple infos and trees. People wanting the full set (for various analyses) are going to be quite rare.

Offline mocore

  • Hero Member
  • *****
  • Posts: 663
  • ~.~
Re: tce/app-browser , sparing of storage or network
« Reply #70 on: August 17, 2024, 02:34:00 PM »
Quote
The typical session may access just a couple infos and trees. People wanting the full set (for various analyses) are going to be quite rare.

considering the above and also the conflicting aspect mentioned @ "Proposed improvement of repo and related tools" https://forum.tinycorelinux.net/index.php/topic,4786.msg25203.html#msg25203
 (although this thread and others seem to have moved forward the core idea ;P of indexing this and that metadata)

my thinking is that a compromise would be a script to create an extension containing "the full set" of repo metadata
for the rare ones!
out there ... leeching the repos

I guess one of the main reasons for creating my own private TCZ directory mirror was that I wanted to know the content of all the extensions (and not only via the appbrowser on a per session basis).

... i guess the option of rsync
[1] https://unix.stackexchange.com/questions/366583/can-rsync-update-a-large-file-that-has-only-changed-partially-without-full-retra/366607#366607
appears to reduce bandwidth by default when changes to existing local copies of files are downloaded