Author Topic: tce/app-browser , sparing of storage or network (Read 25607 times)

nick65go · « **Reply #60 on:** March 03, 2023, 06:24:56 PM »

Yes Rich. But the main point is that I prefer, if possible, not to go online just to search or read almost-static info. Plus, no Wi-Fi means more battery life, etc.

It is like an HTML book whose chapters are separate HTML pages online: I prefer to download the full HTML documentation all at once and read it in my own time, even when internet is unavailable or expensive in that location. In the end I would still need to download all the pieces to read the full book anyway, sometimes going back and forth many times.

The point is that a foreign entity (even a non-malicious one) should not profile me: when I read, what I read, from which IP/location, how many times I read, how fast, etc.
If someone (or an A.I.) is determined, they could eventually aggregate this info about me, but why should I make it easy for them?

nick65go · « **Reply #61 on:** March 07, 2023, 08:08:15 PM »

GNUser built an excellent script, contributor.sh, in record time by my measure.

Unfortunately, in Win10, using QEMU without acceleration, the original script took about 2m 48s (168 seconds) for the "-t" option. I built my own script, which runs in about 8 seconds. All measurements were taken after the tbz had already been downloaded, to compare apples to apples. How I tested / debugged:

Code: [Select]

time zcat "$TCE"/infofiles.tbz | grep -i ^extension > /tmp/a1.txt           # real 0m 8.21s
time cat /tmp/a1.txt | sed s/^Extension.*:\\W*// > /tmp/a2.txt              # real 0m 0.03s
#echo "remains only: 14[,] 13[aus9 at gmx dot com] 5[/] 1[(]"
cat /tmp/a2.txt | sed s/,.*// | sed s/aus9.*/aus9/ | sed sX\\/XX > /tmp/a3.txt     # real 0m 0.02s
time awk `{x[$1]++} END{ for (i in X) {print X[i], i}}` a3.txt | sort -r     # real 0m 0.06s

the final script is this:

Code: [Select]

zcat "$TCE"/infofiles.tbz | grep -i ^extension | sed s/^Extension.*:\\W*// | sed s/,.*// | sed s/aus9.*/aus9/ | awk `{x[$1]++} END{ for (i in X) {print X[i], i}}` | sort -r

One small issue: the counts are compared as text rather than as numbers, so the sort is "alphabetical". At least I tried my hand at sed, awk, and regex.
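The "alphabetical" ordering mentioned above comes from sort -r comparing the counts character by character. A minimal sketch of the difference, using made-up counts and names (the sample data is an assumption for illustration only); -n is the standard sort option for numeric comparison:

```shell
# Hypothetical "count name" lines, shaped like the awk step's output.
sample='9 alice
10 bob
2 carol'

# Text comparison: "10" sorts below "2" and "9" because "1" < "2" < "9".
text_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -r)

# Numeric comparison: -n compares the leading number by value.
num_sorted=$(printf '%s\n' "$sample" | LC_ALL=C sort -rn)

printf 'sort -r:\n%s\n\nsort -rn:\n%s\n' "$text_sorted" "$num_sorted"
```

With -rn the line for the largest count comes first, which is what a ranking needs.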

GNUser · « **Reply #62 on:** March 07, 2023, 10:34:37 PM »

Hi nick65go. I had not encountered zcat before. Working with the .tbz archive would be a much more elegant solution than extracting the archive and working with 2000+ files. I'll try to refactor my script using zcat when I get a chance.
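To illustrate the difference between the two approaches, here is a minimal sketch that builds a tiny throwaway archive and reads matching lines straight from it without unpacking anything to disk. The file names and contents are invented, and a gzip-compressed tar is assumed; the real infofiles.tbz may be laid out differently.

```shell
# Build a throwaway archive with two made-up .info files.
work=$(mktemp -d)
printf 'Title: foo.tcz\nExtension_by: alice\n' > "$work/foo.tcz.info"
printf 'Title: bar.tcz\nExtension_by: bob\n'   > "$work/bar.tcz.info"
tar -C "$work" -czf "$work/info.tar.gz" foo.tcz.info bar.tcz.info

# Stream every archive member to stdout (-O) and filter in the pipe;
# no member is written to disk.
authors=$(tar -xzOf "$work/info.tar.gz" | grep -i '^extension' | sort)
echo "$authors"

# Clean up the temporary directory.
rm -rf "$work"
```

The filtering happens entirely in the pipe, so the cost of creating and reading thousands of small files is avoided.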

Your script is not working for me, unfortunately:

Code: [Select]

$ ./tally
./tally: line 3: {x[]++}: not found
BusyBox v1.36.0 (2023-01-17 09:43:30 UTC) multi-call binary.

Usage: awk [OPTIONS] [AWK_PROGRAM] [FILE]...

	-v VAR=VAL	Set variable
	-F SEP		Use SEP as field separator
	-f FILE		Read program from FILE

Does it require GNU awk? If so, I'll try to come up with a strategy that uses zcat and only what's available in base TCL.

nick65go · « **Reply #63 on:** March 08, 2023, 02:24:44 AM »

@GNUser: short answer, I think we can replace awk with:

Code: [Select]

cat /tmp/a3.txt | uniq -c | sort -r

The main gain was working only in RAM. The second gain came from pipes, and the third from cleaning up the strings: first the "Extension* + non-word characters" prefix at the front, then the unused characters ( , / ) at the back. I used awk just to count the unique strings, so it can be replaced.
FYI: regarding your error, I think it may be a typo: your X[ ]++ should be X[$1]++, because in awk $1 is the first field. It fills the array X[] with string indexes (not scalars), like X[aus9], X[gnuser], and the END block then prints the counts.
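The error GNUser reported, "{x[]++}: not found", is consistent with the awk program being wrapped in backticks rather than single quotes: the shell then treats the text as a command substitution, expands $1 to an empty string, and tries to run the result as a command, leaving awk with no program at all. The posted one-liner also switches between x and X, and awk variable names are case-sensitive. A sketch of the same tally with single quotes and one array name, fed with made-up names instead of the real archive (the tally function name is only for illustration):

```shell
# Count how often each first field occurs and print "count name" lines.
# Single quotes stop the shell from expanding $1; the array is "x" throughout.
tally() {
    awk '{ x[$1]++ } END { for (i in x) print x[i], i }'
}

# Made-up input standing in for the cleaned contributor names.
printf '%s\n' aus9 gnuser aus9 rich aus9 gnuser | tally | sort -rn
```

This uses only basic awk features, so it should not need GNU awk.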

GNUser · « **Reply #64 on:** March 08, 2023, 09:54:07 AM »

Hi nick65go. Your zcat and uniq -c ideas translate into a roughly 91% improvement in contributor.sh -t speed on my machine: 0.58 seconds now vs. 6.68 seconds before. This is the elegant solution I was looking for but couldn't find. Well done.

I revised the script to include your ideas and uploaded it to the usual location.

I'd like to achieve similar improvement in speed with contributor.sh -ts but it would be much more tricky. See what you can come up with

GNUser · « **Reply #65 on:** March 08, 2023, 10:16:08 AM »

Quote from: GNUser on March 08, 2023, 09:54:07 AM

I'd like to achieve similar improvement in speed with contributor.sh -ts but it would be much more tricky. See what you can come up with

If we recreate the .tbz file with only the .info files we're interested in, then it's simple. I implemented this. Version bump to 5.1. Many thanks for your ideas. The gain in efficiency is dramatic.

Rich · « **Reply #66 on:** March 08, 2023, 10:36:15 AM »

Hi nick65go

Quote from: nick65go on March 08, 2023, 02:24:44 AM

@GNUser: short answer, I think we can replace awk with:
Code: [Select]
cat /tmp/a3.txt | uniq -c | sort -r ...

You may want to take the advice from the GNU uniq --help message:

Quote

... Note: 'uniq' does not detect repeated lines unless they are adjacent.
You may want to sort the input first, or use 'sort -u' without 'uniq'.
Also, comparisons honor the rules specified by 'LC_COLLATE'. ...

If you are looking for a count, piping a list through wc -l will give that to you:

Code: [Select]

tc@E310:~$ cat /etc/init.d/tc-config | wc -l
631
tc@E310:~$
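Applied to the tally discussed above, the note from uniq --help means the names have to be sorted before they are counted; otherwise a name that appears on non-adjacent lines is reported more than once. A small sketch with made-up names (not taken from the real .info files):

```shell
# Made-up, unsorted list of names.
names='aus9
gnuser
aus9
rich
aus9'

# Without sorting, uniq -c only merges adjacent duplicates,
# so every line here stays separate with a count of 1.
unsorted_lines=$(printf '%s\n' "$names" | uniq -c | wc -l)

# Sort first so duplicates become adjacent, then count, then rank by number.
ranked=$(printf '%s\n' "$names" | sort | uniq -c | sort -rn)

echo "lines without sort: $unsorted_lines"
echo "$ranked"
```

The final sort uses -n because uniq -c puts the count at the start of each line.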

nick65go · « **Reply #67 on:** March 08, 2023, 11:40:50 AM »

GNUser and Rich, thank you for your feedback and lessons. I just propose ideas, and I am happy to let more proficient people implement them for the TC community, if that is suitable and not a burden for them.

The stakes are low in this game; it is mostly a brain challenge, purely for elegance, in the tiny(core) spirit.

mocore · « **Reply #68 on:** March 21, 2023, 08:07:22 PM »

Quote from: mocore on October 21, 2022, 04:26:40 PM

i wonder if anyone else has given this any thought ( don't remember finding any post with similar gist in the past )

correction ftr
@ Topic: tgrex.pl - tcz/scm full text info search and download tool
https://forum.tinycorelinux.net/index.php/topic,14237.msg80232.html#msg80232 -

Quote from: mocore on October 21, 2012, 07:59:50 PM

Quote from: curaga on October 21, 2012, 11:24:57 AM
Quote
tgre update - downloads all the .info files
Yikes, that's hitting the mirror hard
all .info files compressed
gz or even xz
seems to me at least useful to have locally
...
+ it could make app browser and ab faster to browse .info for each app
if tcz-info.xz was downloaded once
and the files viewed from the archive locally
rather than downloaded individually while browsing through the apps
...
.. though i guess that having a local mirror of the repo is the alternative option

...

Quote from: curaga on February 28, 2023, 02:17:45 AM

Info and tree combined files would have less of use, IMHO.

@curaga
i wonder for what reason info & tree combined would be considered less useful ?

at least one compressed file would mean fewer connections than many individual downloads
although perhaps *now* (for the server) that's less of a concern?

Quote from: nick65go on March 03, 2023, 10:52:25 AM

Now on general subject in the title of post: "tce/app-browser , sparing of storage or network ".
With the addition of dep.db.gz (thank you curaga)

+1

curaga · « **Reply #69 on:** March 22, 2023, 02:56:51 AM »

The typical session may access just a couple infos and trees. People wanting the full set (for various analyses) are going to be quite rare.

mocore · « **Reply #70 on:** August 17, 2024, 02:34:00 PM »

Quote from: curaga on March 22, 2023, 02:56:51 AM

The typical session may access just a couple infos and trees. People wanting the full set (for various analyses) are going to be quite rare.

considering the above and also the conflicting aspect mentioned @ "Proposed improvement of repo and related tools" https://forum.tinycorelinux.net/index.php/topic,4786.msg25203.html#msg25203
( although this thread and others seem to have moved forward the core idea ;P of indexing this and that metadata )

my thinking is that a compromise would be a script to create an extension containing "the full set" of repo metadata
for the rare ones!
out there ... leeching the repos

Quote from: maro on January 25, 2010, 09:32:36 PM

I guess one of the main reasons for creating my own private TCZ directory mirror was that I wanted to know the content of all the extensions (and not only via the appbrowser on a per session basis).

... ~~i guess~~ the option of rsync
[1] https://unix.stackexchange.com/questions/366583/can-rsync-update-a-large-file-that-has-only-changed-partially-without-full-retra/366607#366607
appears to reduce bandwidth by default when changes to existing local copies of files are downloaded
