
Topic: (tc) url to local path awk/sh

mocore

  • Hero Member
  • Posts: 645
(tc) url to local path awk/sh
« on: March 14, 2023, 04:42:22 PM »

URL to local path,
with awk called from shell.

I don't know if it's useful for much, to be honest. :-\

Code:



# tc_uri
# $1=$url
# $2=$local_prefix_path
#
# eg tc_uri  "$url"  "$local_path_prefix"
#
# eg tc_uri http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
#
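tc_uri() {
 # default URL when none is given; single quotes inside "${1-...}"
 # would end up in the value, so the default is left unquoted here
 url="${1-http://tinycorelinux.net/13.x/x86/tcz/provides.db.gz}"
 # only shift when an argument was given
 [ "$#" -gt 0 ] && shift
 echo "$url" | busybox awk -v path_prefix="$1" -F"/" '

# get_path: join the fields after field colafter (the arch field),
# up to but not including the file name
function get_path(u,    path, n, i, p, colafter) {
  colafter = 5

  n = split(u, path, "/")
  p = ""
  # walk the fields in order: "for (i in path)" has no defined order
  for (i = colafter + 1; i < n; i++) {
    if (i == colafter + 1) p = path[i]
    else p = p "/" path[i]
  }
  return p
}

function mkpathstr(u) {
               protocall = $1 "//"
               domain = $3       # $2 is the empty field between the two slashes
               path = get_path($0)
               remotepath = $0
               file = $NF
               ver = $4
               arch = $5
               localpath = path_prefix "/" ver "/" arch "/" path "/" file
             }

{
  mkpathstr($0)

  # return vars from awk
  print protocall, remotepath, path, file, ver, arch, localpath
}
'
}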

 url="$1"
 pathprefix="$2"

# read vars into script
read protocall remotepath path file ver arch localpath  < <( tc_uri "${@}" )

echo -e "vars:\nprotocall : $protocall \nremotepath : $remotepath \npath : $path \nfile : $file  \nver  : $ver \narch : $arch \nlocalpath : $localpath "

# eg ?..
# mkdir -p "$(dirname "$localpath")"
# wget -O "$localpath" "$remotepath"
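The commented "eg" lines hint at how the result is meant to be used. A minimal sketch that wraps them in a function, assuming wget accepts `-O FILE` (GNU and busybox wget both do). The name `fetch_to_local` and the `FETCH` override variable are made up for this example; `FETCH` just makes a dry run possible.

```shell
#!/bin/sh
# fetch_to_local REMOTE LOCAL   (hypothetical helper, not part of tc)
# Creates the parent directory of LOCAL, then downloads REMOTE to LOCAL.
# The downloader is wget unless the (hypothetical) FETCH variable names
# another command, which is handy for a dry run.
fetch_to_local() {
    ftl_remote=$1
    ftl_local=$2
    # Quoting keeps paths with spaces in one piece.
    mkdir -p "$(dirname "$ftl_local")" || return 1
    "${FETCH:-wget}" -O "$ftl_local" "$ftl_remote"
}

# Usage with the variables read from tc_uri above:
# fetch_to_local "$remotepath" "$localpath"
```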





mocore
Re: (tc) url to local path awk/sh
« Reply #1 on: March 18, 2023, 08:50:00 PM »
Modified script:
 - use set to pass the values back from awk
 - remove the (bash-only) process substitution ::)
https://en.wikipedia.org/wiki/Process_substitution
https://stackoverflow.com/questions/30781969/difference-on-bash-and-ash-parentheses

...and I found an ash alternative to "process substitution" using file descriptors ???
https://stackoverflow.com/questions/30781969/difference-on-bash-and-ash-parentheses/69739256#69739256
https://unix.stackexchange.com/questions/309547/what-is-the-portable-posix-way-to-achieve-process-substitution/639752#639752


Code:

# test
# busybox ash -c '. ./tc-uri.sh ; echo "$# $@ >$path"'


protocall="";
remotepath="";
path="";
filename="";
ver="";
arch="";
localpath="";

# tc_uri
# $1=$url
# $2=$local_prefix_path
#
# eg tc_uri  "$url"  "$local_path_prefix"
#
# eg tc_uri http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
#
tc_uri() {
 url="${1-"http://tinycorelinux.net/13.x/x86/tcz/provides.db.gz"}";
 # only shift when an argument was given
 [ "$#" -gt 0 ] && shift
 echo "$url"  | busybox awk -v path_prefix="$1" -F"/" '

# get_path: join the fields after field colafter (the arch field),
# up to but not including the file name
function get_path(u,    path, n, i, p, colafter) {
  colafter = 5

  n = split(u, path, "/")
  p = ""
  # walk the fields in order: "for (i in path)" has no defined order
  for (i = colafter + 1; i < n; i++) {
    if (i == colafter + 1) p = path[i]
    else p = p "/" path[i]
  }
  return p
}

function mkpathstr(u) {
               protocall = $1 "//"
               domain = $3       # $2 is the empty field between the two slashes
               path = get_path($0)
               remotepath = $0
               file = $NF
               ver = $4
               arch = $5
               localpath = path_prefix "/" ver "/" arch "/" path "/" file
             }

{
  mkpathstr($0)

  # return vars from awk
  print protocall, remotepath, path, file, ver, arch, localpath
}
'
}

# echo $@

# "--" stops set from listing every variable if tc_uri prints nothing
set -- $(tc_uri "$@")

# echo $# $@
 

protocall="$1";
remotepath="$2";
path="$3";
filename="$4";
ver="$5";
arch="$6";
localpath="$7";

shift $#
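The list above swaps bash's `< <(cmd)` for `set`. A small standalone sketch comparing `set --` with one other POSIX option, a here-document fed by command substitution (a different technique from the file-descriptor trick in the linked answers). `emit_fields` is a hypothetical stand-in for `tc_uri`.

```shell
#!/bin/sh
# emit_fields is a hypothetical stand-in for tc_uri: one line, space separated.
emit_fields() {
    printf '%s %s %s %s\n' "http://" "$1" "13.x" "x86"
}

u=http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz

# Option 1: set -- puts the words of the output into $1 $2 ...
# "--" stops a leading "-" word being read as an option, and makes an empty
# output clear the parameters instead of listing every shell variable.
# $(...) is left unquoted on purpose (we want word splitting); set -f turns
# off globbing so a "?" or "*" in a URL is not expanded as a file pattern.
set -f
set -- $(emit_fields "$u")
set +f
proto1=$1 url1=$2 ver1=$3 arch1=$4

# Option 2: a here-document instead of bash's  read ... < <(cmd).
# read runs in the current shell (not in a pipeline subshell, as it would
# with  cmd | read  in ash or bash), so the variables are still set afterwards.
read -r proto2 url2 ver2 arch2 <<EOF
$(emit_fields "$u")
EOF

printf '%s %s | %s %s\n' "$ver1" "$arch1" "$ver2" "$arch2"
```

Both leave the values in ordinary shell variables; the here-document form keeps `$@` untouched, which matters if the script is sourced.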



« Last Edit: March 18, 2023, 08:53:39 PM by mocore »

mocore
Re: (tc) url to local path awk/sh
« Reply #2 on: July 18, 2023, 02:12:44 AM »

Updated the awk function, adding a way to find the column where the tc repo path starts,
e.g. [repo path]: http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz
so the function *should* now work with any mirror URL 8) *yay*

Code:
# test
# busybox ash -c '. ./tc-uri.sh ; echo "$# $@ >$path"'


protocall="";
remotepath="";
path="";
filename="";
ver="";
arch="";
localpath="";

# tc_uri
# $1=$url
# $2=$local_prefix_path
#
# eg tc_uri  "$url"  "$local_path_prefix"
#
# eg tc_uri http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
#
tc_uri() {
 url="${1-"http://tinycorelinux.net/13.x/x86/tcz/provides.db.gz"}";
 # only shift when an argument was given
 [ "$#" -gt 0 ] && shift
 echo "$url"  | busybox awk -v path_prefix="$1" -F"/" '

# vercol: index of the first field that looks like a release version (13.x)
function vercol(urlpath, upl,    numb) {
  for (numb = 1; numb <= upl; numb++)
    if (urlpath[numb] ~ /^[0-9]+\.x$/) return numb
  return 0
}

function get_path(u,    path, n, i, p, v) {
  n = split(u, path, "/")
  v = vercol(path, n)   # find position of version column in url
  # start one field before the version so it stays in the path;
  # colafter is global on purpose: mkpathstr reuses it
  if (v > 0) colafter = v - 1
  else colafter = 3     # no N.x field found: keep everything after the host
  p = ""
  for (i = colafter + 1; i < n; i++) {
    if (i == colafter + 1) p = path[i]
    else p = p "/" path[i]
  }
  return p
}

function mkpathstr(u) {
               protocall = $1 "//"
               domain = $3
               path = get_path($0)
               remotepath = $0
               file = $NF
               ver = $(colafter + 1)
               arch = $(colafter + 2)
               localpath = path_prefix "/" path "/" file
             }

{
  mkpathstr($0)

  # return vars from awk
  print protocall, remotepath, path, file, ver, arch, localpath
}
'
}

# echo $@

# "--" stops set from listing every variable if tc_uri prints nothing
set -- $(tc_uri "$@")

# echo $# $@


protocall="$1";
remotepath="$2";
path="$3";
filename="$4";
ver="$5";
arch="$6";
localpath="$7";

shift $#



---

Why? ... well, because I wasn't really happy with the simple script I had used for
scraping the .info etc. data to build a table of 'what tcz is in arch for each release version',
which required downloading a copy of each repo's .info etc. files from *some mirror* to *some local path*,
then running the script on the local files to build the table.

Now that dep.db.gz etc. has been added (https://forum.tinycorelinux.net/index.php/topic,25982.msg167935.html#msg167935),
it should be a little less faffing to create 'what tcz is in arch for each release version' ... at least for newer repos.

 ;D


At some point :P
I hope to finish/update the scripts in the linked post:
Quote
get-repo-info.sh -  small script to download/collect each $arch/info.lst file
mk-repo-table.js - read $arch/info.lst data into json and create html table
to 'get/view what tcz is in *each* arch for each release version'
 both have been rewritten in awk!
http://forum.tinycorelinux.net/index.php/topic,18767.msg115000.html#msg115000

... In the meantime, a few other methods to create a 'nice' extension list can be found in "Programming & Scripting - Unofficial",
e.g.:
http://forum.tinycorelinux.net/index.php/topic,22016.0.html (tabulate.sh) Enhancing the "Browse TCZs" Webpage with an Automated Script

http://forum.tinycorelinux.net/index.php?topic=20688.0 'UserScript' for  Tampermonkey aka "Extentions Repository Browser Userscript"

....

https://forum.tinycorelinux.net/index.php/topic,22843.msg143165.html#msg143165 - Re: could we have a web-site index.html of extensions for tc10 64 and 32 bits?
https://forum.tinycorelinux.net/index.php/topic,18767.0.html - get/view what tcz is in arch for each release version

blah blah  :P

Finito la musica (the music is over, i.e. that's all)

Rich

  • Administrator
  • Hero Member
  • Posts: 11641
Re: (tc) url to local path awk/sh
« Reply #3 on: July 18, 2023, 12:15:20 PM »
Hi mocore
It sounds like the primary objective here is to break the URL up
into its basic parts.

A proper URL for retrieving Tinycore files consists of these parts:
Protocol://Domain/Version/Arch/Dir/Filename

Here is what we know about the URL:
The protocol is optional; wget can function without it.
Filename is the last field.
Dir is the second to last field and always equal to tcz.
Arch is the third to last field.
Version is the fourth to last field.

Once you strip those 5 fields from the URL the Domain remains.
That's the wild card. The Domain can include multiple directories
containing slashes, numbers, dots, and x characters, for example:
Code:
URL="http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz"
The following script does just that. It strips the protocol if it
exists, and then strips the 4 trailing fields, leaving the Domain.
I decided to try this using just the ash built-in commands. Aside
from calling the  tr  command once, I succeeded:
Code:
#!/bin/sh

. /etc/init.d/tc-functions
useBusybox

# A valid URL will look as follows:
# (optional)Protocol://Domain (which can contain slashes and numbers)/Version/Arch/Dir (always tcz)/Filename
URL="http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz"

Remainder="$URL"

# Extract the protocol if it exists.
case $Remainder in
*://*) # Test if $Remainder contains ://

# Extract everything preceding ://
Protocall="${Remainder%%://*}"
# Remove leading  $Protocall://  from Remainder.
Remainder="${Remainder#$Protocall://}"
;;
esac

# Convert any double, triple, etc slashes to single slashes.
Remainder=$(echo "$Remainder" | tr -s /)

# Really fast version of basename.
Filename="${Remainder##*/}"
# Remove trailing  /$Filename  from Remainder.
Remainder="${Remainder%/$Filename}"

# $Dir should always contain the string "tcz".
Dir="${Remainder##*/}"
Remainder="${Remainder%/$Dir}"

Arch="${Remainder##*/}"
Remainder="${Remainder%/$Arch}"

Version="${Remainder##*/}"
Domain="${Remainder%/$Version}"

# Print the original URL and the separated pieces.
echo "URL=$URL"
echo "Protocall=$Protocall"
echo "Domain=$Domain"
echo "Version=$Version"
echo "Arch=$Arch"
echo "Dir=$Dir"
echo "Filename=$Filename"

Result of running the file:
Code:
tc@E310:~/split$ ./SplitURL
URL=http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz
Protocall=http
Domain=SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore
Version=13.x
Arch=x86
Dir=tcz
Filename=info.lst.gz
tc@E310:~/split$
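The only external command in the script above is `tr -s /`. If you want to drop that too, a sketch of the same squeeze using only `case` and parameter expansion; `squeeze_slashes` is a hypothetical name. Calling it through `$(...)` still forks a subshell in most shells, but no external program runs.

```shell
#!/bin/sh
# squeeze_slashes STRING   (hypothetical helper)
# Prints STRING with every run of "/" collapsed to one "/", like  tr -s /,
# using only shell built-ins.
squeeze_slashes() {
    s=$1
    while :; do
        case $s in
            # keep what is before the first "//", a single "/", and what
            # follows that "//"; repeat until no "//" is left
            *//*) s="${s%%//*}/${s#*//}" ;;
            *) break ;;
        esac
    done
    printf '%s\n' "$s"
}

# In the script above this would replace the tr line:
# Remainder=$(squeeze_slashes "$Remainder")
```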

mocore
Re: (tc) url to local path awk/sh
« Reply #4 on: July 21, 2023, 08:40:17 AM »
Quote from: Rich
Hi mocore
It sounds like the primary objective here is to break the URL up
into its basic parts.

Hi Rich

The goal I had in mind was to write a function that creates some sort of consistent mapping (not sure this is the best term, tbh)
between a(ny) tc mirror URL and *some local dir*.

As far as I recall, I needed this because busybox wget does not support "-x, --force-directories", "-nH, --no-host-directories" or "--cut-dirs",
which just manipulate simple strings of text!

Code:
# eg ?..
# mkdir -p "$(dirname "$localpath")"
# wget -O "$localpath" "$remotepath"

inputs: mirror-url, local-dir-path

output: local-dir-path/*relevant bits of* mirror-url

e.g.  in: http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
     out: /tmp/13.x/x86/tcz/info.lst.gz
      in: http://tinycorelinux.net/14.x/x86/tcz/src/openssh/compile_openssh /test
     out: /test/14.x/x86/tcz/src/openssh/compile_openssh

which could be used to replace a more ugly/unclear/case-specific section of the mk-repo-table sh/awk function
... AND potentially other scripts.
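The input/output pairs above can be met with the same ash built-ins Rich used, if the version folder is found by shape instead of by counting fields from the end. A sketch under one stated assumption: the release folder is the *last* path component that looks like `N.x` (1 or 2 digits), so nothing after it may look like one. That choice also copes with Rich's `TestVersion/3.x` URL and with `release/distribution_files` paths. `tc_local_path` is a hypothetical name; like Rich's script it calls `tr` once.

```shell
#!/bin/sh
# tc_local_path URL PREFIX   (hypothetical helper)
# Prints PREFIX/<URL path from the last N.x component onwards>.
# Assumption: no component after the release folder looks like "N.x".
tc_local_path() {
    tlp_prefix=${2%/}                 # avoid "//" if PREFIX ends in "/"
    tlp_rest=${1#*://}                # drop "scheme://" if present
    tlp_rest=$(printf '%s\n' "$tlp_rest" | tr -s /)   # collapse "//"

    # Split on "/" into $1 $2 ... (a function has its own positional params).
    tlp_ifs=$IFS
    IFS=/
    set -f                            # no globbing while splitting
    set -- $tlp_rest
    set +f
    IFS=$tlp_ifs

    # Remember the index of the last component shaped like "13.x".
    tlp_i=0 tlp_ver=0
    for tlp_part in "$@"; do
        tlp_i=$((tlp_i + 1))
        case $tlp_part in
            [0-9].x|[0-9][0-9].x) tlp_ver=$tlp_i ;;
        esac
    done
    [ "$tlp_ver" -gt 0 ] || return 1  # no version folder: give up

    shift $((tlp_ver - 1))            # drop the mirror host and path
    tlp_out=$tlp_prefix
    for tlp_part in "$@"; do
        tlp_out=$tlp_out/$tlp_part
    done
    printf '%s\n' "$tlp_out"
}
```

POSIX sh has no `local`, so the working variables carry a `tlp_` prefix to avoid clobbering the caller's names.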

You picked a good example!!
Code:
URL="http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz"
The "TestVersion/3.x" would cause a problem for the vercol function ::)

Using
Code:
tr -s "/"
removes any potential double-slash ambiguity from URLs, and avoids offsetting the count because the protocol's "://" adds an empty column.
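A quick way to see what `tr -s "/"` changes: count the "/"-separated fields awk sees before and after squeezing. The URL is the one used in this thread with an extra "//" added for illustration.

```shell
#!/bin/sh
url='http://tinycorelinux.net/13.x/x86/tcz//info.lst.gz'

# Without squeezing, "://" and "//" each add an empty field.
raw=$(printf '%s\n' "$url" | awk -F/ '{ print NF }')

# With tr -s /, every run of "/" becomes one "/", so the fields line up.
squeezed=$(printf '%s\n' "$url" | tr -s / | awk -F/ '{ print NF }')
version=$(printf '%s\n' "$url" | tr -s / | awk -F/ '{ print $3 }')

printf 'raw=%s squeezed=%s version=%s\n' "$raw" "$squeezed" "$version"
# prints: raw=8 squeezed=6 version=13.x
```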

Quote from: Rich
Here is what we know about the URL:

This got me thinking that I initially approached this from the wrong angle,

Code:
tr -s "/" < ./mirrors | awk '{ ul = split($0, U, "/"); u = ""; for (i = 3; i < ul; i++) u = u "/" U[i]; print ul, u }'
and I started mulling over the above,
starting with what is known, the mirror URLs,
to get the number of columns/slashes in the server/path section to trim from the local path
... then build up the URL from the other known components: version / arch / etc. (as far as I recall, my mk-repo-table creates URLs this way).
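Building the URL from what is already known, rather than parsing it afterwards, could look roughly like this. Assumptions: the mirror base URL points at the directory holding the `N.x` folders (as http://tinycorelinux.net/ does), extension files live under `tcz/` as Rich noted, and `build_pair` is a hypothetical name.

```shell
#!/bin/sh
# build_pair MIRROR VERSION ARCH FILE PREFIX   (hypothetical helper)
# Prints "remote-url local-path" built from known components.
# Assumes MIRROR is the directory that holds the N.x release folders and
# that the file lives in the tcz/ directory.
build_pair() {
    bp_base=${1%/}      # drop a trailing "/" so the result has no "//"
    bp_prefix=${5%/}
    bp_rel="$2/$3/tcz/$4"
    printf '%s %s\n' "$bp_base/$bp_rel" "$bp_prefix/$bp_rel"
}

# Example: info.lst.gz for a few version/arch combinations.
for v in 13.x 14.x; do
    for a in x86 x86_64; do
        build_pair http://tinycorelinux.net/ "$v" "$a" info.lst.gz /tmp
    done
done
```

Because the local path is built from the same relative part as the URL, the mirror's own path depth never has to be counted.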


Quote from: Rich
Dir is the second to last field and always equal to tcz.

When I initially started this topic, my intention was for as much generality as possible (like the wget options),
so that the function could make as few assumptions as possible about the URL :-\
e.g.: http://tinycorelinux.net/13.x/x86_64/release/distribution_files/modules64.gz

The choice of awk over sh was because awk is, imho, easier to read again later and grok the intent.
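For comparison with Rich's ash version, a sketch of the same end-relative rules (protocol optional, the last four fields are Version/Arch/Dir/Filename, whatever is left is the Domain) written as one awk program. `split_url` is a hypothetical name, and `gsub` stands in for `tr -s /`.

```shell
#!/bin/sh
# split_url URL   (hypothetical helper)
# Prints: protocol domain version arch dir file
split_url() {
    printf '%s\n' "$1" | awk '
    {
        proto = ""
        if (index($0, "://") > 0) {
            proto = substr($0, 1, index($0, "://") - 1)
            $0 = substr($0, index($0, "://") + 3)
        }
        gsub(/\/+/, "/")            # same effect as  tr -s /
        n = split($0, f, "/")
        # everything before the last four fields is the domain
        domain = f[1]
        for (i = 2; i <= n - 4; i++) domain = domain "/" f[i]
        print proto, domain, f[n-3], f[n-2], f[n-1], f[n]
    }'
}
```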

 :-\
« Last Edit: July 21, 2023, 08:47:04 AM by mocore »