
Author Topic: Speeding up update check  (Read 3777 times)

Offline bmarkus

  • Administrator
  • Hero Member
  • *****
  • Posts: 7183
    • My Community Forum
Speeding up update check
« on: November 21, 2011, 07:01:53 AM »
Update check with AppsAudit is slow due to the protocol overhead of downloading many small MD5 files, each only 40-50 bytes long. Even though I have a 30 Mbit/s connection, I frequently see long waits for the server to open a transfer, as well as connection errors (timeouts).

It would be much faster to download a single file containing all the MD5 sums and process it locally. While downloading every value may look wasteful, it would certainly give better performance. Let's look at the figures.

There are currently 3,461 MD5 files in the repo. A flat text file with all the MD5 sums is 174k long. Applying different compression formats gives:

bz2 - 75k
xz - 81k
gz - 92k
zip - 93k

bzip2 is part of the base (as a busybox applet), so no additional tool is required, and it is fast enough to decode. 75k is small enough to download even over a phone modem.

Of course, it would be possible to split the list into smaller files or, if the server side supports it, to send a list of installed extensions and have the server return the MD5 list dynamically for just those extensions, but this can be done later.

The server side needs an up-to-date list, but several lists are generated there already; this would be just one more, hopefully kept in sync with the repo content.
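
As a rough illustration of the server-side step, the sketch below concatenates per-extension MD5 files into one flat list and compresses it with bzip2. It assumes each file is named `<extension>.md5.txt` and holds a single md5sum-style line; the function names and the output path are hypothetical, not part of any existing repo script.

```python
import bz2
from pathlib import Path


def build_md5_list(repo_dir):
    """Concatenate every *.md5.txt file in repo_dir into one flat text blob.

    Assumption: each file holds one md5sum-style line, "<digest>  <name>".
    """
    lines = []
    for md5_file in sorted(Path(repo_dir).glob("*.md5.txt")):
        text = md5_file.read_text().strip()
        if text:  # skip empty files
            lines.append(text)
    return "".join(line + "\n" for line in lines).encode("ascii")


def write_compressed(repo_dir, out_path):
    """Write the bzip2-compressed list to out_path and return its size in bytes."""
    data = bz2.compress(build_md5_list(repo_dir))
    Path(out_path).write_bytes(data)
    return len(data)
```

Regenerating this one file whenever the repo changes would keep it in step with the other lists.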

The good point is that such a method can be tested outside the base and added later if it works as expected.
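
A minimal sketch of the client-side check, assuming the downloaded file is a bzip2-compressed list of md5sum-style lines and that the local MD5 files sit next to the extensions as `<extension>.md5.txt`. The function names are hypothetical and this is not how AppsAudit itself is implemented.

```python
import bz2
from pathlib import Path


def parse_md5_list(text):
    """Map file name -> MD5 digest from md5sum-style lines ("<digest>  <name>")."""
    expected = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:  # ignore blank or malformed lines
            digest, name = parts
            expected[name] = digest.lower()
    return expected


def find_outdated(list_bz2, local_dir):
    """Return installed extensions whose local MD5 differs from the repo list."""
    expected = parse_md5_list(bz2.decompress(list_bz2).decode("ascii"))
    outdated = []
    for md5_file in sorted(Path(local_dir).glob("*.md5.txt")):
        for name, digest in parse_md5_list(md5_file.read_text()).items():
            remote = expected.get(name)
            # Extensions missing from the repo list are skipped, not flagged.
            if remote is not None and remote != digest:
                outdated.append(name)
    return outdated
```

With this approach the whole comparison happens locally after one download, instead of one HTTP request per installed extension.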


« Last Edit: November 21, 2011, 08:47:57 AM by bmarkus »
Béla
Ham Radio callsign: HA5DI

"Amateur Radio: The First Technology-Based Social Network."

Offline TinyFan

  • Newbie
  • *
  • Posts: 14
Re: Speeding up update check
« Reply #1 on: November 21, 2011, 08:38:34 AM »
That's a nice improvement, hope it will work. :)

Offline robc

  • Sr. Member
  • ****
  • Posts: 447
Re: Speeding up update check
« Reply #2 on: November 21, 2011, 04:41:56 PM »
Quote from: bmarkus
"It would be much faster to download a single file containing all the MD5 sums and process it locally."
I have been thinking the same thing for a while now but assumed the file size of the md5sum list would be too big.
"Never give up! Never surrender!" - Commander Peter Quincy Taggart

"Make it so." - Captain Picard

Offline jur

  • Hero Member
  • *****
  • Posts: 863
    • cycling photo essays
Re: Speeding up update check
« Reply #3 on: November 23, 2011, 06:49:34 PM »
Likewise, I have been thinking about this for a long time.

I am a bit surprised by the compression results; I assume you actually tested those? MD5 sums are high-entropy strings and don't compress very well.

Nevertheless, even uncompressed, downloading that in one hit would be vastly faster than the current piecemeal method.

I was thinking that JavaScript embedded in the web page could assemble the MD5 file dynamically as updates are added. Perhaps differential updates such as zsync could be used to speed it up even more, which would reduce the web bandwidth as well.

Offline bmarkus

  • Administrator
  • Hero Member
  • *****
  • Posts: 7183
    • My Community Forum
Re: Speeding up update check
« Reply #4 on: November 24, 2011, 01:39:46 AM »
Quote from: jur
"I am a bit surprised by the compression results; I assume you actually tested those? MD5 sums are high-entropy strings and don't compress very well."

Yes, tested on real data from the repository. I expected xz to give the best result, so I was surprised.

While the MD5 itself can't be compressed, half of each line is text, which compresses very well because of the limited set of characters used in names.
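
One further factor: a 32-character hex digest encodes only 16 bytes of data, so the hex text can shrink by roughly half even though the digest itself is incompressible. For 3,461 entries that puts a floor of about 55k on the digest part, which fits with the 75k bz2 result. The sketch below measures this on synthetic digests (generated here purely for illustration, not taken from the repo).

```python
import bz2
import hashlib


def synthetic_digests(count):
    """Generate `count` pseudo-random MD5 hex digests (illustrative data only)."""
    return [hashlib.md5(str(i).encode("ascii")).hexdigest() for i in range(count)]


def compressed_ratio(text):
    """Return the bzip2-compressed size as a fraction of the original size."""
    raw = text.encode("ascii")
    return len(bz2.compress(raw)) / len(raw)


count = 3461                      # number of MD5 files quoted in the thread
digests_only = "\n".join(synthetic_digests(count)) + "\n"
ratio = compressed_ratio(digests_only)
floor_bytes = 16 * count          # raw digest data that cannot be compressed away

print(f"hex digests compress to {ratio:.0%} of their text size")
print(f"incompressible digest data: {floor_bytes} bytes")
```

The measured ratio stays above the 16/33 floor per line (32 hex characters plus a newline), and the remaining savings in the real file come from the extension names.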
Béla

Offline roberts

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 7361
  • Founder Emeritus
Re: Speeding up update check
« Reply #5 on: January 15, 2012, 12:16:33 AM »
Look for it in 4.3 release candidate one.
10+ Years Contributing to Linux Open Source Projects.

Offline yoshi314

  • Full Member
  • ***
  • Posts: 135
Re: Speeding up update check
« Reply #6 on: January 18, 2012, 07:25:30 AM »
Quote from: bmarkus
"Of course, it would be possible to split the list into smaller files or, if the server side supports it, to send a list of installed extensions and have the server return the MD5 list dynamically for just those extensions, but this can be done later."

i think zsync would work well, even if all md5's (of the whole repo) are kept in one file. i would not try splitting the md5 file to match only installed extensions - it's small enough as it is.

also zsync handles gzipped txt fairly well, from what they say on its page. it's not likely to be bandwidth expensive anyway.

Offline martin

  • Jr. Member
  • **
  • Posts: 87
Re: Speeding up update check
« Reply #7 on: January 18, 2012, 07:06:51 PM »
I noticed the slow update check too and wanted to help and do something about it. I made a program that generates a package database on the repository/mirror which contains the filename, md5, description and info file of every extension.

This package database is then downloaded (through the program) by the user and, depending on the command-line parameter, the program will:
* check all current md5s against the ones in the database to see if a package has been modified/updated
* re-create any missing md5s on the user's system
* allow a package search (similar to "apt-cache search" in Debian)
* install a package (as a wrapper around tce-load)
* allow easy mirror selection

All processing is done on the client side; nothing depends on the server or mirrors unless something has been updated. The program contains its own embedded database.
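
To give an idea of the search feature, here is a small sketch of a client-side package database. It uses SQLite purely as an assumption (the post does not say which embedded database the program uses), and the table layout and function names are hypothetical.

```python
import sqlite3


def build_db(packages):
    """Create an in-memory database from (filename, md5, description) tuples."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE packages ("
        "filename TEXT PRIMARY KEY, md5 TEXT, description TEXT)"
    )
    db.executemany("INSERT INTO packages VALUES (?, ?, ?)", packages)
    return db


def search(db, term):
    """Return (filename, description) rows whose name or description contains term."""
    pattern = f"%{term}%"
    rows = db.execute(
        "SELECT filename, description FROM packages "
        "WHERE filename LIKE ? OR description LIKE ? ORDER BY filename",
        (pattern, pattern),
    )
    return rows.fetchall()
```

Because the whole database lives on the client, a search like this needs no network access once the database has been downloaded.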

I told roberts about this and he will take a look at it when he gets back from the convention.