
Author Topic: Tracking timestamps of Original-site URLs with GNU WGET  (Read 238 times)

Offline Nathan_SR

  • Jr. Member
  • **
  • Posts: 82
    • Quick-Save-Live
Tracking timestamps of Original-site URLs with GNU WGET
« on: August 05, 2018, 09:20:38 AM »
I saw this option in the GNU wget manual and thought it might be useful for tracking changes on the parent sites of the many extensions:

wget also has an option to track timestamps: -N

When running Wget with -N, with or without -r or -p, the decision as to whether
or not to download a newer copy of a file depends on the local and remote
timestamp and size of the file.

-N, --timestamping               Turn on time-stamping.

So:

wget -N -i urlslist.txt 2>&1 | tee -a log.txt

could help us to track what has changed.
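
To pull only the changed files out of log.txt, something like the following might work. It is a rough sketch, assuming GNU wget marks each completed download with a line containing "saved [size/size]" and that skipped files get a "not retrieving" or "not modified" line instead; the function name list_updated is made up for illustration.

```shell
#!/bin/sh
# list_updated: hypothetical helper, not part of wget.
# Prints the log lines for files that wget actually downloaded,
# assuming those lines contain " saved [" as GNU wget usually prints.
list_updated() {
    # $1: path to the log written by "wget -N ... | tee -a log.txt"
    grep ' saved \[' "$1"
}
```

Usage would be `list_updated log.txt`. Because tee -a appends, the log accumulates every run, so rotating or truncating it between runs keeps the report limited to the latest check.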

Offline Greg Erskine

  • Full Member
  • ***
  • Posts: 231
Re: Tracking timestamps of Original-site URLs with GNU WGET
« Reply #1 on: August 05, 2018, 04:01:28 PM »
hi Nathan_SR,

Thanks for reporting your findings.

I am reluctant to install the GNU version of any of the commands unless there is no other solution. I like to keep to the Tiny Core (piCore) philosophy of using Busybox and keeping it very small. I don't even use bash.

For small files I just download anyway or keep track of the md5 and download if different.
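
The md5 approach can be sketched roughly as below. This is only an illustration, assuming the file (or a freshly fetched copy of it) is available locally and that the previous checksum is kept in a small record file; the function name and the record-file layout are hypothetical. It uses only md5sum and cut, which Busybox provides.

```shell
#!/bin/sh
# changed_since_last_check: hypothetical helper.
# Returns 0 (true) when the file's md5 differs from the one recorded
# on the previous call, then stores the new md5 in the record file.
changed_since_last_check() {
    file=$1
    record=$2                       # assumed location of the stored md5
    new=$(md5sum "$file" | cut -d' ' -f1)
    old=$(cat "$record" 2>/dev/null) # empty on the first run
    echo "$new" > "$record"
    [ "$new" != "$old" ]
}
```

A first call always reports a change, because there is no earlier checksum to compare against.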

For all the repository stuff I use the standard tce-* tools that download files using Busybox wget.

I suppose it depends on how much data you need to transfer?

When using the GNU versions of the commands you have to be careful, as they are not always compatible. Recently I had scripts using Busybox "wget -s"; when I installed GNU wget they stopped working, because GNU uses "wget --spider". It's easy to fix, but you need to keep track of which version you are using. Then Busybox changed over to "--spider", so the script broke again. lol  :)
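
One way to cope with the "-s" versus "--spider" difference might be to choose the flag at run time. The sketch below assumes that whichever wget is installed mentions --spider in its help text when it supports that spelling, and falls back to -s otherwise; spider_flag is a made-up name.

```shell
#!/bin/sh
# spider_flag: hypothetical helper.
# Prints the spider option to use, based on the help text passed in.
spider_flag() {
    # $1: help text, e.g. the output of: wget --help 2>&1
    case "$1" in
        *--spider*) echo "--spider" ;;
        *)          echo "-s" ;;   # assumed older Busybox spelling
    esac
}
```

A script could then call `wget $(spider_flag "$(wget --help 2>&1)") "$url"`, where $url is whatever address is being checked.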

regards
Greg

Offline Nathan_SR

Re: Tracking timestamps of Original-site URLs with GNU WGET
« Reply #2 on: August 05, 2018, 06:00:00 PM »

Thanks, Greg, for sharing your thoughts on this. My opinion is this: if an executable (for example, one of the GNU utilities) gives you features you really need, you can extract just the executable (and its libraries, if any) from its tcz file and keep them in a new folder under your home directory (rather than installing it in /usr/local/bin, where it takes precedence and causes clashes), then use it in your scripts with the new path prefixed. Add export LD_LIBRARY_PATH=newfolderpath;$LD_LIBRARY_PATH at the top of the script, after the shebang line, to help the executable pick up any libraries it needs. This way you can even port executables from other Linux systems of similar architecture (use the ldd executable_name command to see which libraries it references, copy just the special ones into a single new folder, and tar it).

The purpose of sharing all this is to help the tce maintainers keep track of changes to all the parent-site download URLs in an automated way, especially if they are doing it manually now.

Offline Nathan_SR

Re: Tracking timestamps of Original-site URLs with GNU WGET
« Reply #3 on: August 05, 2018, 08:04:07 PM »

There was a small typing mistake in my previous post (a semicolon instead of a colon). Corrected below:

Code:
export LD_LIBRARY_PATH=newfolderpath:$LD_LIBRARY_PATH
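
Put together, the top of such a script might look like the sketch below. The folder name and the gnu_wget wrapper are only placeholders for illustration. The ${VAR:+...} form is a small refinement: it avoids leaving a trailing colon when LD_LIBRARY_PATH was empty, since an empty entry makes the loader search the current directory.

```shell
#!/bin/sh
# Placeholder folder holding the extracted executable and its libraries.
newfolderpath="$HOME/gnu-tools"

# Prepend the folder to the library search path, adding the colon and
# the old value only when the variable already had a value.
export LD_LIBRARY_PATH="$newfolderpath${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Call the extracted binary by its full path, so the Busybox applet of
# the same name keeps serving everything else on the system.
gnu_wget() {
    "$newfolderpath/wget" "$@"
}
```

The rest of the script would then call gnu_wget wherever the GNU-only options are needed.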

Offline Nathan_SR

Re: Tracking timestamps of Original-site URLs with GNU WGET
« Reply #4 on: August 06, 2018, 07:52:38 AM »

I forgot to mention that I got this idea of automated tracking of remote websites from this discussion: https://stackoverflow.com/questions/6801704/checksum-remote-file

Applying this concept could give us the needed benefits.