
(tc) url to local path awk/sh


mocore:

 Map a url to a local path
with awk, called from shell

idk if it's useful for much tbh  :-\


--- Code: ---


# tc_uri
# $1=$url
# $2=$local_prefix_path
#
# eg tc_uri  "$url"  "$local_path_prefix"
#
# eg tc_uri http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
#
function tc_uri(){
 # default only when $1 is unset; the default is double-quoted because
 # single quotes nested in a double-quoted "${1-...}" end up in $url as-is
 url="${1-"http://tinycorelinux.net/13.x/x86/tcz/provides.db.gz"}";
 shift
 echo "$url" | busybox awk -v path_prefix="$1" -F"/" '

function get_path(u    ,path ,n ,i ,p ,pc ,colafter){
  colafter=5

  n=split(u,path,"/")   # split() returns the number of elements
  p=""
  pc=0
  # numeric loop: "for (i in path)" has no defined order and gives
  # string indices, which ">" would then compare as strings
  for(i=colafter+1; i<=n-1; i++) {
    pc=(pc+1);
    if(pc>1){ p=p"/"path[i] }else{ p=path[i] }
  }
  return p ;
 }

function mkpathstr(u){
               protocall=$1"//"
               domain=$3          # $2 is the empty field inside "://"
               path=get_path($0)
               remotepath=$0
               file=$NF
               ver=$4
               arch=$5
               localpath=path_prefix"/"ver"/"arch"/"path"/"file
             }

{
  mkpathstr($0)

  # return vars from awk
  print protocall,  remotepath , path , file , ver , arch ,localpath

}

' ;  }

 url="$1"
 pathprefix="$2"

# read vars into script
read protocall remotepath path file ver arch localpath  < <( tc_uri "${@}" )

echo -e "vars:\nprotocall : $protocall \nremotepath : $remotepath \npath : $path \nfile : $file  \nver  : $ver \narch : $arch \nlocalpath : $localpath "

# eg ?..
# mkdir -p "$(dirname "$localpath")"
# wget -O "$localpath" "$remotepath"



--- End code ---


mocore:
Modified script:
 - to use set to pass the values back from awk
 - to remove the (bash-only) process substitution ::)
https://en.wikipedia.org/wiki/Process_substitution
https://stackoverflow.com/questions/30781969/difference-on-bash-and-ash-parentheses

...& found an ash alternative to "process substitution" using "file descriptors"  ???
https://stackoverflow.com/questions/30781969/difference-on-bash-and-ash-parentheses/69739256#69739256
https://unix.stackexchange.com/questions/309547/what-is-the-portable-posix-way-to-achieve-process-substitution/639752#639752
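For reference, a minimal POSIX sh sketch of another way around "read ... < <(cmd)": a here-document wrapped around a command substitution. This is a different technique from the file-descriptor trick in the linked answers. fake_tc_uri is a hypothetical stand-in for tc_uri so the block runs on its own, and the sketch assumes the producer prints one line of space-separated fields with no spaces inside a field.

```shell
#!/bin/sh
# Hypothetical stand-in for tc_uri: prints the same seven fields on one line.
fake_tc_uri() {
    echo "http:// http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz tcz info.lst.gz 13.x x86 /tmp/13.x/x86/tcz/info.lst.gz"
}

# The command substitution runs first and its output becomes the body of
# the here-document. read itself runs in the current shell, so the variables
# it sets survive (with "cmd | read ..." they may be lost, because each part
# of a pipeline can run in a subshell).
read -r protocall remotepath path file ver arch localpath <<EOF
$(fake_tc_uri)
EOF

echo "ver=$ver arch=$arch localpath=$localpath"
```

Unlike set -- $(...), this leaves the script's own positional parameters ($1, $2, ...) untouched.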



--- Code: ---
# test
# busybox ash -c '. ./tc-uri.sh ; echo "$# $@ >$path"'


protocall="";
remotepath="";
path="";
filename="";
ver="";
arch="";
localpath="";

# tc_uri
# $1=$url
# $2=$local_prefix_path
#
# eg tc_uri  "$url"  "$local_path_prefix"
#
# eg tc_uri http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
#
function tc_uri(){
 url="${1-"http://tinycorelinux.net/13.x/x86/tcz/provides.db.gz"}";
 shift
 echo "$url"  | busybox awk -v path_prefix="$1" -F"/" '

function get_path(u    ,path ,n ,i ,p ,pc ,colafter){
  colafter=5

  n=split(u,path,"/")   # split() returns the number of elements
  p=""
  pc=0
  # numeric loop: "for (i in path)" has no defined order and gives
  # string indices, which ">" would then compare as strings
  for(i=colafter+1; i<=n-1; i++) {
    pc=(pc+1);
    if(pc>1){ p=p"/"path[i] }else{ p=path[i] }
  }
  return p ;
 }

function mkpathstr(u){
               protocall=$1"//"
               domain=$3          # $2 is the empty field inside "://"
               path=get_path($0)
               remotepath=$0
               file=$NF
               ver=$4
               arch=$5
               localpath=path_prefix"/"ver"/"arch"/"path"/"file
             }

{
  mkpathstr($0)

  # return vars from awk
  print protocall,  remotepath , path , file , ver , arch ,localpath

}

' ;  }

# echo $@

# "--" stops set from treating output that begins with "-" as options
set -- $(tc_uri "${@}" ) ;

# echo $# $@
 

protocall="$1";
remotepath="$2";
path="$3";
filename="$4";
ver="$5";
arch="$6";
localpath="$7";

shift $#


--- End code ---


mocore:

Updated the awk function, adding a method to find the column where the tc repo path starts,
e.g. [repo path]: http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz
so the function *should* now work with any mirror URL  8) *yay*


--- Code: ---# test
# busybox ash -c '. ./tc-uri.sh ; echo "$# $@ >$path"'


protocall="";
remotepath="";
path="";
filename="";
ver="";
arch="";
localpath="";

# tc_uri
# $1=$url
# $2=$local_prefix_path
#
# eg tc_uri  "$url"  "$local_path_prefix"
#
# eg tc_uri http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
#
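function tc_uri(){
 url="${1-"http://tinycorelinux.net/13.x/x86/tcz/provides.db.gz"}";
 shift
 echo "$url"  | busybox awk -v path_prefix="$1" -F"/" '

# position of the first column that looks like a tc version, e.g. "13.x"
# (split() arrays start at 1; the regex is anchored with a literal dot,
# whereas the old string ".*\.x" can lose its backslash and become ".*.x",
# which also matches "linux")
function vercol(urlpath ,upl    ,numb) {
  for(numb=1;numb < upl;numb++){ if( urlpath[numb] ~ /^[0-9]+\.x$/ ) return numb }
  return 0
}

function get_path(u    ,path ,path_length ,vc ,i ,p ,pc){
  path_length=split(u,path,"/")
  vc=vercol( path,path_length)  # find position of version column in url
  #print "col",vc
  p=""
  pc=0
  # keep the version column and everything after it, up to the filename,
  # so localpath becomes path_prefix/13.x/x86/tcz/info.lst.gz
  for(i=vc; vc > 0 && i <= path_length-1; i++) {
    pc=(pc+1);
    if(pc>1){ p=p"/"path[i] }else{ p=path[i] }
  }
  return p ;
 }

function mkpathstr(u    ,parts ,n ,vc){
               n=split(u,parts,"/")
               vc=vercol(parts,n)
               protocall=$1"//"
               domain=$3          # $2 is the empty field inside "://"
               path=get_path($0)
               remotepath=$0
               file=$NF
               ver=(vc ? parts[vc] : "")       # no longer fixed at $4 / $5,
               arch=(vc ? parts[vc+1] : "")    # so other mirrors work too
               localpath=path_prefix"/"path"/"file
             }

{
  mkpathstr($0)

  # return vars from awk
  print protocall,  remotepath , path , file , ver , arch ,localpath

}

' ;  }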

# echo $@

# "--" stops set from treating output that begins with "-" as options
set -- $(tc_uri "${@}" ) ;

# echo $# $@


protocall="$1";
remotepath="$2";
path="$3";
filename="$4";
ver="$5";
arch="$6";
localpath="$7";

shift $#



--- End code ---

---

why? ... well, because I wasn't really happy with the simple script I had used for
 scraping the .info etc. data to build a table of 'what tcz is in arch for each release version',
which required downloading a copy of each repo's .info etc. from *some mirror* to *some local path*
and then running the script on the local files to build the table

now that dep.db.gz etc. has been added (https://forum.tinycorelinux.net/index.php/topic,25982.msg167935.html#msg167935)
it should be a little less faffing to create 'what tcz is in arch for each release version' ... at least for newer repos

 ;D


--- Quote from: mocore on June 04, 2019, 09:23:05 AM ---
at some point  :P
 i hope to finish/update  the scripts in the linked post

--- Quote ---get-repo-info.sh -  small script to download/collect each $arch/info.lst file
mk-repo-table.js - read $arch/info.lst data into json and create html table
--- End quote ---
to 'get/view what tcz is in *each* arch for each release version'
 both have been rewritten in awk!
http://forum.tinycorelinux.net/index.php/topic,18767.msg115000.html#msg115000

... in the meantime
a few other methods to create a 'nice' extension list can be found in "Programming & Scripting - Unofficial"

eg
http://forum.tinycorelinux.net/index.php/topic,22016.0.html (tabulate.sh) Enhancing the "Browse TCZs" Webpage with an Automated Script

http://forum.tinycorelinux.net/index.php?topic=20688.0 'UserScript' for  Tampermonkey aka "Extentions Repository Browser Userscript"

....

--- End quote ---

https://forum.tinycorelinux.net/index.php/topic,22843.msg143165.html#msg143165 - Re: could we have a web-site index.html of extensions for tc10 64 and 32 bits?
https://forum.tinycorelinux.net/index.php/topic,18767.0.html - get/view what tcz is in arch for each release version

blah blah  :P

finito la musica

Rich:
Hi mocore
It sounds like the primary objective here is to break the URL up
into its basic parts.

A proper URL for retrieving Tinycore files consists of these parts:
Protocol://Domain/Version/Arch/Dir/Filename

Here is what we know about the URL:
The protocol is optional; wget can function without it.
Filename is the last field.
Dir is the second to last field and always equal to tcz.
Arch is the third to last field.
Version is the fourth to last field.

Once you strip those 5 fields from the URL, the Domain remains.
That's the wild card. The Domain can include multiple directories
containing slashes, numbers, dots, and x characters, for example:

--- Code: ---URL="http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz"
--- End code ---

The following script does just that. It strips the protocol if it
exists, and then strips the 4 trailing fields, leaving the Domain.
I decided to try this using just the ash built-in commands. Aside
from calling the  tr  command once, I succeeded:

--- Code: ---#!/bin/sh

. /etc/init.d/tc-functions
useBusybox

# A valid URL will look as follows:
# (optional)Protocall://Domain which can contain slashes and numbers/Version/Arch/Dir (always tcz)/Filename
URL="http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz"

Remainder="$URL"

# Extract protocall if it exists.
case $Remainder in
*://*) # Test if $Remainder contains ://

# Extract everything preceding ://
Protocall="${Remainder%%://*}"
# Remove leading  $Protocall://  from Remainder.
Remainder="${Remainder#$Protocall://}"
;;
esac

# Convert any double, triple, etc slashes to single slashes.
Remainder=$(echo "$Remainder" | tr -s /)

# Really fast version of basename.
Filename="${Remainder##*/}"
# Remove trailing  /$Filename  from Remainder.
Remainder="${Remainder%/$Filename}"

# $Dir should always contain the string "tcz".
Dir="${Remainder##*/}"
Remainder="${Remainder%/$Dir}"

Arch="${Remainder##*/}"
Remainder="${Remainder%/$Arch}"

Versiom="${Remainder##*/}"
Domain="${Remainder%/$Versiom}"

# Print the original URL and the separated pieces.
echo "URL=$URL"
echo "Protocall=$Protocall"
echo "Domain=$Domain"
echo "Versiom=$Versiom"
echo "Arch=$Arch"
echo "Dir=$Dir"
echo "Filename=$Filename"
--- End code ---

Result of running the file:

--- Code: ---tc@E310:~/split$ ./SplitURL
URL=http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz
Protocall=http
Domain=SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore
Versiom=13.x
Arch=x86
Dir=tcz
Filename=info.lst.gz
tc@E310:~/split$
--- End code ---
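The same right-to-left idea can also be sketched in awk: once any protocol is cut off and runs of slashes are squeezed, Filename, Dir, Arch and Version are simply the last four fields, and whatever is left in front of them is the Domain. split_url is a hypothetical name, and treating "letters followed by ://" as the protocol is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: split a tc URL from the right, like SplitURL.
# Assumes at least Domain/Version/Arch/Dir/Filename are present.
split_url() {
    printf '%s\n' "$1" | awk '{
        proto = ""
        if (match($0, /^[A-Za-z][A-Za-z0-9+.-]*:\/\//)) {
            proto = substr($0, 1, RLENGTH - 3)   # text before "://"
            $0 = substr($0, RLENGTH + 1)         # rest of the URL
        }
        gsub(/\/+/, "/")                         # same job as tr -s /
        n = split($0, f, "/")
        domain = f[1]
        for (i = 2; i <= n - 4; i++) domain = domain "/" f[i]
        printf "Protocol=%s\nDomain=%s\nVersion=%s\nArch=%s\nDir=%s\nFilename=%s\n",
            proto, domain, f[n-3], f[n-2], f[n-1], f[n]
    }'
}

split_url "http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz"
```

On the test URL it should print the same values as the SplitURL run above, with the labels spelled Protocol and Version.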

mocore:

--- Quote from: Rich on July 18, 2023, 12:15:20 PM ---Hi mocore
It sounds like the primary objective here is to break the URL up
into its basic parts.

--- End quote ---

Hi Rich

The goal I had in mind was to produce a function that creates some sort of consistent mapping (not sure this is the best term tbh)
between a(ny) tc mirror URL and *some local dir*.

afair this was needed because busybox wget does not support "-x, --force-directories", "-nH, --no-host-directories" or "--cut-dirs",
which is just manipulating simple strings of text!
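As a rough illustration of the "simple strings" point, the naming that GNU wget's -x -nH --cut-dirs=N -P PREFIX would give can be emulated with shell parameter expansion and tr. cut_dirs_path is a hypothetical helper that only prints the local path and downloads nothing; mirror.example.org is a placeholder host.

```shell
#!/bin/sh
# Hypothetical helper: print roughly where "wget -x -nH --cut-dirs=N -P PREFIX URL"
# would save the file. Nothing is downloaded.
# Plain sh has no "local", so url/n/prefix are global variables.
#   $1 = URL   $2 = N (leading directories to cut)   $3 = local prefix
cut_dirs_path() {
    url=$1 n=$2 prefix=$3
    case $url in *://*) url=${url#*://} ;; esac    # drop "proto://"
    url=$(printf '%s\n' "$url" | tr -s /)            # squeeze // runs
    url=${url#*/}                                    # -nH: drop the host
    while [ "$n" -gt 0 ]; do                         # --cut-dirs=N
        case $url in */*) url=${url#*/} ;; esac    # never cut the filename
        n=$((n - 1))
    done
    printf '%s/%s\n' "$prefix" "$url"
}

cut_dirs_path http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz 0 /tmp
cut_dirs_path http://mirror.example.org/pub/tinycorelinux/13.x/x86/tcz/info.lst.gz 2 /tmp
```

Both calls should print /tmp/13.x/x86/tcz/info.lst.gz; the catch is that N has to be known for each mirror.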


--- Code: ---# eg ?..
# mkdir -p "$(dirname "$localpath")"
# wget -O "$localpath" "$remotepath"
--- End code ---

inputs: mirror-url, local-dir-path

output: local-dir-path/*relevant bits of* mirror-url

e.g.
 in:  http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
 out: /tmp/13.x/x86/tcz/info.lst.gz
 in:  http://tinycorelinux.net/14.x/x86/tcz/src/openssh/compile_openssh /test
 out: /test/14.x/x86/tcz/src/openssh/compile_openssh
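A minimal sketch of that mapping, assuming the version component always looks like digits followed by ".x" (13.x, 14.x, ...) and that everything from the version component onwards is the "relevant bits". tc_localpath is a hypothetical name. It keeps the last version-like column rather than the first, so a Domain that itself contains something like "3.x" does not win, and it makes no assumption that the directory is tcz.

```shell
#!/bin/sh
# Hypothetical helper.
#   $1 = URL   $2 = local prefix
# Prints PREFIX/<version>/...; returns 1 if no "N.x" column is found.
tc_localpath() {
    printf '%s\n' "$1" | tr -s / | awk -F/ -v prefix="$2" '{
        v = 0
        for (i = 1; i < NF; i++)             # never treat the filename as version
            if ($i ~ /^[0-9]+\.x$/) v = i    # remember the LAST match
        if (v == 0) exit 1                   # no version column found
        p = prefix
        for (i = v; i <= NF; i++) p = p "/" $i
        print p
    }'
}

tc_localpath http://tinycorelinux.net/14.x/x86/tcz/src/openssh/compile_openssh /test
```

Returning a non-zero status when no version column is found lets a caller skip such URLs instead of writing to a bogus path.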

which could be used to replace a more ugly/unclear/case-specific section of the mk-repo-table sh/awk function
... AND potentially other scripts

you picked a good example !!

--- Code: ---URL="http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz"
--- End code ---
the "TestVersion/3.x" part would cause a problem for the vercol function, since it stops at the first column that looks like a version  ::)

using

--- Code: ---tr -s "/"
--- End code ---
removes any potential double-forward-slash ambiguity from URLs, and avoids offsetting the column count, since the protocol's "://" would otherwise add an empty column
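A quick way to see that offset, using a test URL with a doubled slash before the filename; show_fields is only a throwaway helper for this demo:

```shell
#!/bin/sh
# Throwaway helper: print NF, then $2 and $(NF-1) in brackets.
show_fields() {
    awk -F/ '{ print NF, "[" $2 "]", "[" $(NF-1) "]" }'
}

url='http://tinycorelinux.net/13.x/x86/tcz//info.lst.gz'

# Raw: "://" leaves $2 empty, and "//" before the filename leaves
# $(NF-1) empty instead of "tcz".
printf '%s\n' "$url" | show_fields              # 8 [] []

# Squeezed: one field per component, host is $2, Dir is $(NF-1).
printf '%s\n' "$url" | tr -s / | show_fields    # 6 [tinycorelinux.net] [tcz]
```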


--- Quote from: Rich on July 18, 2023, 12:15:20 PM ---Here is what we know about the URL:

--- End quote ---

This got me thinking that I initially approached this from the wrong angle


--- Code: ---tr -s "/" < ./mirrors | awk '{ ul=split($0,U,"/"); u=""; for(i=3;i<ul;i++){ u=u"/"U[i] } ; print ul,u }'

--- End code ---
and started mulling over the above,
starting from what is known (the mirror URLs)
to acquire the number of columns/slashes in the server/path section to trim from the localpath
... then build up the URL from the other known components (version / arch / etc.) (afair my mk-repo-table creates URLs this way)
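A minimal sketch of that idea, assuming the mirror base URL is known up front: count the path components it has after the host, then cut exactly that many from a file URL on that mirror. mirror_localpath and mirror.example.org are hypothetical, and both URLs are assumed to include the protocol so that, after tr -s, the host is always column 2.

```shell
#!/bin/sh
# Hypothetical helper.
#   $1 = mirror base URL   $2 = file URL on that mirror   $3 = local prefix
mirror_localpath() {
    # Number of path components after the host in the mirror URL
    # (a trailing "/" yields an empty last field, which is not counted).
    depth=$(printf '%s\n' "$1" | tr -s / | awk -F/ '{
        n = 0
        for (i = 3; i <= NF; i++) if ($i != "") n++
        print n
    }')
    # Skip proto (field 1), host (field 2) and the mirror path; keep the rest.
    printf '%s\n' "$2" | tr -s / | awk -F/ -v d="$depth" -v prefix="$3" '{
        p = prefix
        for (i = 3 + d; i <= NF; i++) p = p "/" $i
        print p
    }'
}

mirror_localpath http://mirror.example.org/pub/tinycorelinux/ \
    http://mirror.example.org/pub/tinycorelinux/13.x/x86/tcz/info.lst.gz /tmp
```

This should print /tmp/13.x/x86/tcz/info.lst.gz. Compared with guessing the version column, it needs the mirror list, but makes no assumption about what the version directory looks like.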



--- Quote from: Rich on July 18, 2023, 12:15:20 PM ---Dir is the second to last field and always equal to tcz.

--- End quote ---

When I initially started this topic, my intention was as much generality as possible (like the wget options),
so that the function would make as few assumptions as possible about the URL  :-\
e.g.: http://tinycorelinux.net/13.x/x86_64/release/distribution_files/modules64.gz (no tcz dir here)

The choice of awk over sh was because awk is easier (imho) to read again later and grok the intent

 :-\
