(tc) url to local path awk/sh
mocore:
URL to local path, with awk called from shell.
Not sure if it's useful for much, tbh :-\
--- Code: ---
# tc_uri
# $1=$url
# $2=$local_prefix_path
#
# eg tc_uri "$url" "$local_path_prefix"
#
# eg tc_uri http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
#
# note: this version needs bash (process substitution further down)
function tc_uri(){
# plain double quotes: single quotes here would end up inside the value
url="${1:-http://tinycorelinux.net/13.x/x86/tcz/provides.db.gz}"
[ $# -gt 0 ] && shift
echo "$url" | busybox awk -v path_prefix="$1" -F"/" '
function get_path(u,    path, n, i, p){
  colafter=5
  n=split(u,path,"/")   # split() returns the element count
  p=""
  # numeric loop keeps the components in order ("for (i in path)" does not)
  for(i=colafter+1; i<n; i++)
    if(i==colafter+1){ p=path[i] } else { p=p"/"path[i] }
  return p
}
function mkpathstr(u){
  protocall=$1"//"
  domain=$3          # $2 is the empty field inside "://"
  path=get_path(u)
  remotepath=u
  file=$NF
  ver=$4
  arch=$5
  localpath=path_prefix"/"ver"/"arch"/"path"/"file
}
{
  mkpathstr($0)
  # return vars from awk
  print protocall, remotepath, path, file, ver, arch, localpath
}
' ; }
# usage: script URL LOCAL_PREFIX
# read vars into script
read -r protocall remotepath path file ver arch localpath < <( tc_uri "$@" )
echo -e "vars:\nprotocall : $protocall \nremotepath : $remotepath \npath : $path \nfile : $file \nver : $ver \narch : $arch \nlocalpath : $localpath "
# eg ?..
# mkdir -p "$(dirname "$localpath")"
# wget -O "$localpath" "$remotepath"
--- End code ---
mocore:
Modified script:
- uses set to pass the values back from awk
- removes the (bash-only) process substitution ::)
https://en.wikipedia.org/wiki/Process_substitution
https://stackoverflow.com/questions/30781969/difference-on-bash-and-ash-parentheses
... and I found an ash alternative to "process substitution" using "file descriptors" ???
https://stackoverflow.com/questions/30781969/difference-on-bash-and-ash-parentheses/69739256#69739256
https://unix.stackexchange.com/questions/309547/what-is-the-portable-posix-way-to-achieve-process-substitution/639752#639752
--- Code: ---
# test
# busybox ash -c '. ./tc-uri.sh ; echo "$# $@ >$path"'
protocall=""
remotepath=""
path=""
filename=""
ver=""
arch=""
localpath=""
# tc_uri
# $1=$url
# $2=$local_prefix_path
#
# eg tc_uri "$url" "$local_path_prefix"
#
# eg tc_uri http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
#
# POSIX function syntax (the "function" keyword is a bash/ksh extension)
tc_uri(){
url="${1:-http://tinycorelinux.net/13.x/x86/tcz/provides.db.gz}"
[ $# -gt 0 ] && shift
echo "$url" | busybox awk -v path_prefix="$1" -F"/" '
function get_path(u,    path, n, i, p){
  colafter=5
  n=split(u,path,"/")   # split() returns the element count
  p=""
  # numeric loop keeps the components in order ("for (i in path)" does not)
  for(i=colafter+1; i<n; i++)
    if(i==colafter+1){ p=path[i] } else { p=p"/"path[i] }
  return p
}
function mkpathstr(u){
  protocall=$1"//"
  domain=$3          # $2 is the empty field inside "://"
  path=get_path(u)
  remotepath=u
  file=$NF
  ver=$4
  arch=$5
  localpath=path_prefix"/"ver"/"arch"/"path"/"file
}
{
  mkpathstr($0)
  # return vars from awk
  print protocall, remotepath, path, file, ver, arch, localpath
}
' ; }
# "set --" so that empty output cannot make set dump all shell variables
set -- $(tc_uri "$@")
protocall="$1"
remotepath="$2"
path="$3"
filename="$4"
ver="$5"
arch="$6"
localpath="$7"
shift $#
--- End code ---
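Another common POSIX way to replace bash's read ... < <(cmd) (not the file-descriptor trick from the linked answer) is to feed read from a here-document that contains a command substitution. read then runs in the current shell, so the variables survive, unlike cmd | read ... where ash runs read in a subshell. A minimal sketch, assuming the awk side would print with OFS="|"; emit_fields is a hypothetical stand-in for tc_uri:

```shell
#!/bin/sh
# emit_fields: hypothetical stand-in for tc_uri. It prints one line of
# "|"-separated values (awk would do this with OFS="|"), 3rd field empty.
emit_fields() {
    printf '%s\n' 'http://|13.x||info.lst.gz'
}

# POSIX alternative to bash's  read ... < <(cmd):
# read takes its input from a here-document holding $(cmd) and runs in the
# current shell, so proto/ver/path/file are still set afterwards.
IFS='|' read -r proto ver path file <<EOF
$(emit_fields)
EOF

# With a non-whitespace IFS the empty field is kept, so "file" does not
# slide into "path" the way it would after  set -- $(cmd)  with default IFS.
printf 'proto=%s ver=%s path=%s file=%s\n' "$proto" "$ver" "$path" "$file"
```

The IFS='|' prefix only applies to that one read, so the rest of the script keeps its normal field splitting.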
mocore:
Updated the awk function, adding a method to find the column where the TC repo path starts,
eg [repo path]: http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz
so the function *should* now work with any mirror URL 8) *yay*
--- Code: ---
# test
# busybox ash -c '. ./tc-uri.sh ; echo "$# $@ >$path"'
protocall=""
remotepath=""
path=""
filename=""
ver=""
arch=""
localpath=""
# tc_uri
# $1=$url
# $2=$local_prefix_path
#
# eg tc_uri "$url" "$local_path_prefix"
#
# eg tc_uri http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
#
tc_uri(){
url="${1:-http://tinycorelinux.net/13.x/x86/tcz/provides.db.gz}"
[ $# -gt 0 ] && shift
echo "$url" | busybox awk -v path_prefix="$1" -F"/" '
# index of the first "<digits>.x" component (e.g. 13.x), 0 if none
function vercol(urlpath, upl,    numb) {
  for(numb=1; numb<upl; numb++)
    if(urlpath[numb] ~ /^[0-9]+\.x$/) return numb
  return 0
}
function get_path(u,    path, n, i, p){
  n=split(u,path,"/")
  colafter=vercol(path, n)   # find position of version column in url
  #print "col",colafter
  p=""
  # components after the version column, up to (not incl.) the filename
  for(i=colafter+1; i<n; i++)
    if(i==colafter+1){ p=path[i] } else { p=p"/"path[i] }
  return p
}
function mkpathstr(u){
  protocall=$1"//"
  domain=$3          # $2 is the empty field inside "://"
  path=get_path(u)   # also sets colafter
  remotepath=u
  file=$NF
  ver=$colafter      # version column found by vercol
  arch=$(colafter+1)
  # path starts after the version column, so add ver back in
  localpath=path_prefix"/"ver"/"path"/"file
}
{
  mkpathstr($0)
  # return vars from awk
  print protocall, remotepath, path, file, ver, arch, localpath
}
' ; }
# "set --" so that empty output cannot make set dump all shell variables
set -- $(tc_uri "$@")
protocall="$1"
remotepath="$2"
path="$3"
filename="$4"
ver="$5"
arch="$6"
localpath="$7"
shift $#
--- End code ---
---
Why? ... well, because I wasn't really happy with the simple script I had used for
scraping the .info etc. data to build a table of 'what tcz is in arch for each release version',
which required downloading a copy of each repo's .info etc. files from *some mirror* to *some local path*,
then running the script on the local files to build the table.
Now that dep.db.gz etc. have been added (https://forum.tinycorelinux.net/index.php/topic,25982.msg167935.html#msg167935),
it should be a little less faffing to create 'what tcz is in arch for each release version' ... at least for newer repos
;D
--- Quote from: mocore on June 04, 2019, 09:23:05 AM ---
at some point :P
I hope to finish/update the scripts in the linked post
--- Quote ---get-repo-info.sh - small script to download/collect each $arch/info.lst file
mk-repo-table.js - read $arch/info.lst data into json and create html table
--- End quote ---
to 'get/view what tcz is in *each* arch for each release version'
both have been rewritten in awk!
http://forum.tinycorelinux.net/index.php/topic,18767.msg115000.html#msg115000
... in the meantime
a few other methods to create a 'nice' extension list can be found in "Programming & Scripting - Unofficial"
eg
http://forum.tinycorelinux.net/index.php/topic,22016.0.html (tabulate.sh) Enhancing the "Browse TCZs" Webpage with an Automated Script
http://forum.tinycorelinux.net/index.php?topic=20688.0 'UserScript' for Tampermonkey aka "Extentions Repository Browser Userscript"
....
--- End quote ---
https://forum.tinycorelinux.net/index.php/topic,22843.msg143165.html#msg143165 - Re: could we have a web-site index.html of extensions for tc10 64 and 32 bits?
https://forum.tinycorelinux.net/index.php/topic,18767.0.html - get/view what tcz is in arch for each release version
blah blah :P
finito la musica
Rich:
Hi mocore
It sounds like the primary objective here is to break the URL up
into its basic parts.
A proper URL for retrieving Tinycore files consists of these parts:
Protocol://Domain/Version/Arch/Dir/Filename
Here is what we know about the URL:
The Protocol is optional; wget can function without it.
Filename is the last field.
Dir is the second-to-last field and always equal to tcz.
Arch is the third-to-last field.
Version is the fourth-to-last field.
Once you strip those 5 fields from the URL, the Domain remains.
That's the wild card. The Domain can include multiple directories
containing slashes, numbers, dots, and x characters, for example:
--- Code: ---URL="http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz"
--- End code ---
The following script does just that. It strips the Protocol if it
exists, and then strips the 4 trailing fields, leaving the Domain.
I decided to try this using just the ash built-in commands. Aside
from calling the tr command once, I succeeded:
--- Code: ---#!/bin/sh
. /etc/init.d/tc-functions
useBusybox
# A valid URL will look as follows:
# (optional)Protocall://Domain which can contain slashes and numbers/Version/Arch/Dir (always tcz)/Filename
URL="http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz"
Remainder="$URL"
# Extract protocall if it exists.
case $Remainder in
*://*) # Test if $Remainder contains ://
# Extract everything preceding ://
Protocall="${Remainder%%://*}"
# Remove leading $Protocall:// from Remainder.
Remainder="${Remainder#$Protocall://}"
;;
esac
# Convert any double, triple, etc slashes to single slashes.
Remainder=$(echo "$Remainder" | tr -s /)
# Really fast version of basename.
Filename="${Remainder##*/}"
# Remove trailing /$Filename from Remainder.
Remainder="${Remainder%/$Filename}"
# $Dir should always contain the string "tcz".
Dir="${Remainder##*/}"
Remainder="${Remainder%/$Dir}"
Arch="${Remainder##*/}"
Remainder="${Remainder%/$Arch}"
Versiom="${Remainder##*/}"
Domain="${Remainder%/$Versiom}"
# Print the original URL and the separated pieces.
echo "URL=$URL"
echo "Protocall=$Protocall"
echo "Domain=$Domain"
echo "Versiom=$Versiom"
echo "Arch=$Arch"
echo "Dir=$Dir"
echo "Filename=$Filename"
--- End code ---
Result of running the file:
--- Code: ---tc@E310:~/split$ ./SplitURL
URL=http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz
Protocall=http
Domain=SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore
Versiom=13.x
Arch=x86
Dir=tcz
Filename=info.lst.gz
tc@E310:~/split$
--- End code ---
mocore:
--- Quote from: Rich on July 18, 2023, 12:15:20 PM ---Hi mocore
It sounds like the primary objective here is to break the URL up
into its basic parts.
--- End quote ---
Hi Rich
The goal I had in mind was to write a function that creates some sort of consistent mapping (not sure this is the best term, tbh)
between a(ny) TC mirror URL and *some local dir*,
AFAIR because busybox wget does not support "-x, --force-directories", "-nH, --no-host-directories" or "--cut-dirs",
which is just manipulating simple strings of text!
--- Code: ---# eg ?..
# mkdir -p "$(dirname "$localpath")"
# wget -O "$localpath" "$remotepath"
--- End code ---
inputs: mirror-url, local-dir-path
output: local-dir-path/*relevant bits of* mirror-url
eg ~in: http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
~out: /tmp/13.x/x86/tcz/info.lst.gz
~in: http://tinycorelinux.net/14.x/x86/tcz/src/openssh/compile_openssh /test
~out: /test/14.x/x86/tcz/src/openssh/compile_openssh
which could be used to replace a more ugly/unclear/case-specific section of the mk-repo-table sh/awk function
... AND potentially other scripts
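A minimal sketch of that mapping (not mocore's or Rich's code): keep everything from the *last* "<digits>.x" path component onward and put it under the local dir. It assumes version directories always look like 13.x, as in every example in this thread; tc_local_path and is_tc_version are hypothetical names, and only ash/sh builtins are used:

```shell
#!/bin/sh
# is_tc_version STR: true when STR looks like "<digits>.x", e.g. 13.x
# (assumption: this is how every TC release directory is named)
is_tc_version() {
    case $1 in
        *.x) ;;
        *) return 1 ;;
    esac
    case ${1%.x} in
        ''|*[!0-9]*) return 1 ;;
    esac
}

# tc_local_path URL LOCAL_DIR   (hypothetical helper)
# Prints LOCAL_DIR/<everything from the last version component onward>.
# The body runs in a subshell ( ... ) so IFS and "set -f" stay local.
tc_local_path() (
    url=$1
    dir=${2%/}              # drop one trailing slash from the local dir
    rel=
    IFS=/
    set -f                  # no globbing while splitting the URL
    for part in $url; do
        if [ -z "$part" ]; then
            continue        # empty field left by "//"
        elif is_tc_version "$part"; then
            rel=$part       # (re)start at each version component
        elif [ -n "$rel" ]; then
            rel=$rel/$part
        fi
    done
    [ -n "$rel" ] || exit 1 # no version component found
    printf '%s/%s\n' "$dir" "$rel"
)

tc_local_path http://tinycorelinux.net/13.x/x86_64/release/distribution_files/modules64.gz /tmp
```

Restarting at each version component is what lets a URL like http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz map to /tmp/13.x/x86/tcz/info.lst.gz. It would still go wrong if a directory below the version were itself named like 5.x.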
You picked a good example!!
--- Code: ---URL="http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz"
--- End code ---
The "TestVersion/3.x" would cause a problem for the vercol function ::)
using
--- Code: ---tr -s "/"
--- End code ---
removes any potential double-forward-slash ambiguity from URLs, and avoids offsetting the column count because the protocol's "://" adds an empty column
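Both points can be seen with a small sketch that squeezes the slashes with tr -s / and then reports the first and the last "<digits>.x" column. A vercol-style search takes the first match, which for Rich's URL is the 3.x of TestVersion. version_cols is a hypothetical name, and the /^[0-9]+\.x$/ pattern is an assumption about how version directories are named:

```shell
#!/bin/sh
# version_cols URL: print the first and the last "<digits>.x" column of URL.
# tr -s / squeezes "//" first, so "http://" no longer adds an empty column.
version_cols() {
    printf '%s\n' "$1" | tr -s / | awk -F/ '
    {
        first = 0; last = 0
        for (i = 1; i < NF; i++)            # stop before the filename
            if ($i ~ /^[0-9]+\.x$/) { if (!first) first = i; last = i }
        print "first:", first, (first ? $first : "-")
        print "last:", last, (last ? $last : "-")
    }'
}

version_cols 'http://SomeDomain.org/TestVersion/3.x/LinuxDistros/Tinycore/13.x/x86/tcz//info.lst.gz'
```

For http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz both searches land on column 3 (13.x), so the difference only shows up on mirrors whose own path contains something shaped like a version.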
--- Quote from: Rich on July 18, 2023, 12:15:20 PM ---Here is what we know about the URL:
--- End quote ---
This got me thinking that I initially approached this from the wrong angle
--- Code: ---tr -s "/" < ./mirrors | awk '{ n=split($0,U,"/"); u=""; for(i=3;i<n;i++){ u=u"/"U[i] }; print n,u }'
--- End code ---
and started mulling over the above:
start with what is known, the mirror URLs,
to acquire the number of columns/slashes of the server/path section to trim from the local path
... then build up the URL from the other known components (version, arch, etc.) (AFAIR my mk-repo-table creates the URL this way)
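The "count the mirror's columns, then trim them" idea might look like the sketch below. It assumes the mirror base URL is known and that the full URL really lies under it (this is not checked); tc_cut_mirror and the example.org mirror are hypothetical. The effect is similar in spirit to combining GNU wget's -nH and --cut-dirs, which busybox wget lacks:

```shell
#!/bin/sh
# tc_cut_mirror MIRROR URL LOCAL_DIR   (hypothetical helper)
# 1) count the columns of the squeezed mirror base (ignoring a trailing "/")
# 2) drop that many leading columns from the squeezed URL
# 3) print the remainder under LOCAL_DIR
tc_cut_mirror() (
    n=$(printf '%s\n' "$1" | tr -s / | awk -F/ '{ n = NF; if ($NF == "") n--; print n }')
    rel=$(printf '%s\n' "$2" | tr -s / | awk -F/ -v n="$n" '
        {
            r = ""
            for (i = n + 1; i <= NF; i++)
                if (r == "") r = $i; else r = r "/" $i
            print r
        }')
    printf '%s/%s\n' "${3%/}" "$rel"
)

tc_cut_mirror http://tinycorelinux.net/ http://tinycorelinux.net/13.x/x86/tcz/info.lst.gz /tmp
```

Unlike a search for the version column, this makes no assumption about how versions are named, but it needs the mirror base as an extra input.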
--- Quote from: Rich on July 18, 2023, 12:15:20 PM ---Dir is the second to last field and always equal to tcz.
--- End quote ---
When I initially started this topic, my intention was as much generality as possible (like the wget options)
so that the function would make as few assumptions as possible about the URL :-\
eg: http://tinycorelinux.net/13.x/x86_64/release/distribution_files/modules64.gz
The choice of awk over sh was because awk is, IMHO, easier to read again later and grok the intent
:-\