
Author Topic: Why does sce-update take so much time (even more than sce-import)?  (Read 17232 times)

Offline sm8ps

  • Sr. Member
  • ****
  • Posts: 338
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #15 on: January 19, 2019, 02:48:14 PM »
Now that I have remembered that sce-update is a separate extension, and thus that it does not make sense to wait for a new release candidate (I am either old-school or old and slow to adapt, or all of it), I shall test my case in the next few days and report back.

Offline sm8ps

  • Sr. Member
  • ****
  • Posts: 338
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #16 on: January 21, 2019, 01:42:49 PM »
Here are my new test results. The new sce-update does show a change in performance.

As before, case A) uses three (non-relevant) PPAs whereas case B) uses the same three PPAs plus 12 Ubuntu repositories (cf. previous posts for detail).

Old sce-update for reference:
Code: [Select]
sce-update -crn X-LIST    A) 3'30"    B) 7'00"
sce-update -rn X-LIST     A) 4'56"    B) 13'30"

New sce-update:
Code: [Select]
sce-update -crn X-LIST    A) 0'47"    B) 6'48"
sce-update -rn X-LIST     A) 2'38"    B) 13'18"

For comparison, here is the time for importing the extension. I have not tested case B) yet.
Code: [Select]
sce-import -rpln    A) 1'55"

The situation was set up so that changing the repos created a definite need for an update. The timing was taken upon switching from case A to B or vice versa.

My conclusion is that the new update routines do have a considerable effect. This is great! Case A) shows that the checking time is now much lower than the time for importing, as was hoped for. The results look rather solid to me and they add up as expected.

I cannot explain why the effect in case B) is so small. Does anybody have a clue?

Offline Jason W

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 9730
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #17 on: January 21, 2019, 08:58:42 PM »
When sce-update or sce-import is dealing with extra repos, it uses the slower awk routine to fetch package data rather than the faster grep routine that is used with the standard and security update repos. The Packages files of the standard and security repos that are enabled by default have been formatted so that grep can be used. Packages files from extra repos have not been formatted, so the awk routine must be used, which is not as fast but deals with them accurately. So there is a performance penalty for using extra repos in both import and update. More extra repos mean slower performance, depending on the size of their Packages files.
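
A minimal sketch of the difference between the two approaches. This is not the actual sce-import/sce-update code: the function names, the sample data and the fixed three-line stanza layout are assumptions made purely for illustration.

```shell
#!/bin/sh
# Illustrative stand-in for a Debian-style Packages file (made-up data).
make_sample() {
    cat <<'EOF'
Package: foo
Version: 1.0
Depends: libc6

Package: foobar
Version: 2.1
Depends: foo
EOF
}

# awk routine: paragraph mode (RS = "") treats every blank-line-separated
# stanza as one record, so it works whatever the stanza length is.
lookup_awk() {
    awk -v pkg="$1" 'BEGIN { RS = ""; FS = "\n" }
        $1 == "Package: " pkg { print; exit }' "$2"
}

# grep routine: only reliable once every stanza has a known, fixed layout
# (assumed here: exactly 3 lines), so "-A 2" returns the whole stanza.
lookup_grep() {
    grep -m 1 -A 2 -x -F "Package: $1" "$2"
}
```

With the sample written to a file, `lookup_awk foo FILE` and `lookup_grep foo FILE` print the same stanza. The grep variant simply stops at the first matching line, while awk has to split and inspect every record before it.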


Offline sm8ps

  • Sr. Member
  • ****
  • Posts: 338
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #18 on: January 22, 2019, 08:54:00 AM »
Okay, that makes perfect sense. Thank you, Jason, master of Awk (among many other martial arts), for sharing your insight!

Stretching the topic of this thread and combining it with the decision to only support LTS versions plus the current release (I cannot find the thread at the moment): would it make sense (also in terms of effort) to pre-format also the {main, backports, update}-{multiverse, restricted, universe} repo lists, at least for the LTS version?

Offline sm8ps

  • Sr. Member
  • ****
  • Posts: 338
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #19 on: January 22, 2019, 11:22:27 AM »
Also for limiting the expectations, I would like to add that the official repos are at the same time relevant for many users and huge in size. Both aspects make them stick out among the various other repos like PPAs etc.

Offline Jason W

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 9730
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #20 on: January 22, 2019, 09:16:33 PM »
One question: do you have GNU awk installed? GNU awk is almost twice as fast as Busybox awk in getting package info during sce-import and sce-update; I just did some tests on it. And awk is what is used in the extra repo functions. Awk is always used in sce-import/sce-update, but grep is used to quickly get a snippet of the Packages files in the main and security repos before awk is run on it. Grep gives about a 20% performance increase over using only the awk routine when GNU awk is installed.
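
A quick way to see which awk is picked up from the PATH. This is only a sketch: it assumes that GNU awk prints a banner containing "GNU Awk" for `awk --version` and that the Busybox applet prints a usage text containing "BusyBox"; the function names are made up.

```shell
#!/bin/sh
# Classify an awk version/usage banner passed in as text.
classify_awk_banner() {
    case "$1" in
        *"GNU Awk"*) echo gnu ;;
        *BusyBox*)   echo busybox ;;
        *)           echo other ;;
    esac
}

# Ask whatever "awk" resolves to and classify its answer.
# stderr is captured too, because an unrecognized option ends up there.
awk_flavor() {
    classify_awk_banner "$(awk --version 2>&1)"
}
```

Running `awk_flavor` should then print `gnu` once the gawk package is loaded and found first in the PATH.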

I will think of how I can truncate extra repo files during use like the main and security ones are done on the server to only include what is needed.  That will save some time. 

Thanks


« Last Edit: January 22, 2019, 10:15:45 PM by Jason W »

Offline jls

  • Hero Member
  • *****
  • Posts: 2135
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #21 on: January 23, 2019, 03:48:50 AM »
Hi
do you mean the gawk package?
dCore user

Offline Jason W

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 9730
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #22 on: January 23, 2019, 05:44:19 AM »
Hi.  Yes, gawk is the Debian/Ubuntu package name.   

Offline sm8ps

  • Sr. Member
  • ****
  • Posts: 338
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #23 on: January 23, 2019, 05:22:50 PM »
Right you are, Jason, GNU Awk does make a big difference compared to Busybox Awk which had been in use before. Here are my updated results. Sorry to repeat most of it but I believe it is easier to read the data when it is all in one place.

Case A) only three (non-relevant) PPAs; case B) the same three PPAs plus 12 Ubuntu repositories. The tests were always run at least twice and proved consistent once the DEBINX file had been updated. In retrospect it would have been better to exclude all extra repositories, but the influence of the PPAs does not seem very important.

Old sce-update with Busybox Awk for reference:
Code: [Select]
sce-update -rn X-LIST     A) 4'56"    B) 13'30"
sce-update -crn X-LIST    A) 3'30"    B) 7'00"

New sce-update with Busybox Awk:
Code: [Select]
sce-update -rn X-LIST     A) 2'38"    B) 13'18"
sce-update -crn X-LIST    A) 0'47"    B) 6'48"

New sce-update with GNU Awk:
Code: [Select]
sce-update -rn X-LIST     A) 2'40"    B) 9'05"
sce-update -crn X-LIST    A) 0'41"    B) 4'17"

Time for sce-import:
Code: [Select]
sce-import -rpln X-LIST     A) 1'51"    B) 4'49"

A) As observed before, the overall time is reduced to about 55% from the original state for standard repositories due to the new sce-update routine. The checking time (option -c) is reduced to about 20% which is pretty spectacular.
B) Using now also GNU Awk reduces the overall time for several big extra repositories to about 65% and the checking time to about 60% which is quite an improvement given the absolute values.

A) Compared to sce-import, which had been my original motivation, the checking time of sce-update has decreased to about 40% of the import time for standard repositories.
B) With several big extra repositories, it is reduced to about 90% of the import time. Compared to the almost 150% of the original state this is impressive and makes sce-update much more performant and thus more usable than before.
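
The percentages above can be rechecked from the raw timings. A minimal sketch, assuming the M'SS" notation used in the tables; the helper names `to_seconds` and `ratio_pct` are made up for this illustration.

```shell
#!/bin/sh
# Convert a timing written as M'SS" into seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F"['\"]" '{ print $1 * 60 + $2 }'
}

# Print the first timing as a rounded percentage of the second one.
ratio_pct() {
    a=$(to_seconds "$1")
    b=$(to_seconds "$2")
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.0f\n", 100 * a / b }'
}

ratio_pct "2'38\"" "4'56\""   # A) new vs. old update time:       53
ratio_pct "0'47\"" "3'30\""   # A) new vs. old checking time:     22
ratio_pct "9'05\"" "13'30\""  # B) GNU Awk vs. old update time:   67
ratio_pct "4'17\"" "7'00\""   # B) GNU Awk vs. old checking time: 61
ratio_pct "0'41\"" "1'51\""   # A) checking vs. sce-import:       37
ratio_pct "4'17\"" "4'49\""   # B) checking vs. sce-import:       89
```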

Many, many thanks for your efforts, Jason!

Offline sm8ps

  • Sr. Member
  • ****
  • Posts: 338
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #24 on: January 24, 2019, 05:01:20 PM »
I got to test the new debGet* routines that will be in the next release-candidate. They have a similar effect for extra repositories as the new sce-update routine had for the standard repositories. Since the latter case can be considered all solved, I shall only state the results for my former case B) with the extra Ubuntu repositories. This is still with GNU Awk installed but I think this does not matter anymore.

I compare the checking time with the total update time.
  1.) Old sce-update with Busybox Awk
  2.) New sce-update with Busybox Awk
  3.) New sce-update with GNU Awk
  4.) New sce-update with GNU Awk and new debGet* routines
Code: [Select]
    Check Update
1.) 7'00" 13'30"
2.) 6'48" 13'18"
3.) 4'17"  9'05"
4.) 2'20"  5'22"
Code: [Select]
sce-import:
1.-3.) 4'49"
4)     3'14"

The results were consistent across three runs. The difference between update and check time should agree with the time for sce-import, which does hold true indeed! Note, however, that the import time has decreased by about 35% as well! I believe this is due to the new debGet* routines. Otherwise this change would point to a flaw in my measurements.
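
The comparison of "update minus check" with the import time can be reproduced from the tables. A small sketch with made-up helper names, again assuming the M'SS" notation:

```shell
#!/bin/sh
# Convert a timing written as M'SS" into seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F"['\"]" '{ print $1 * 60 + $2 }'
}

# Print update time minus check time, in seconds.
update_minus_check() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

update_minus_check "4'17\"" "9'05\""   # row 3.): 288
to_seconds "4'49\""                    # sce-import, rows 1.-3.): 289
update_minus_check "2'20\"" "5'22\""   # row 4.): 182
to_seconds "3'14\""                    # sce-import, row 4.): 194
```

For rows 3.) and 4.) the difference lies within about 12 seconds of the matching import time.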

At first, I suspected that the pruning of the repo files only has to be performed once and the fact that I had to run the checking before the actual update is responsible for the miracle. However, this is not supported by the fact that the very first run by mistake was an update without prior checking.

So the new debGet* routines are tremendously effective. The checking time is down to about 75% of the import time which by itself has been reduced by about 35%! Naturally, these values depend on the actual changes in the package dependencies but the performance is lightning fast now.

It is absolutely stunning what JasonW has achieved. Congratulations and many thanks, Jason!
« Last Edit: January 24, 2019, 05:13:24 PM by sm8ps »

Offline Jason W

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 9730
Re: Why does sce-update take so much time (even more than sce-import)?
« Reply #25 on: January 24, 2019, 07:39:44 PM »
Thanks for testing, the new RC is now uploaded. 

Now grep is used in sce-import/sce-update with extra repo DEBINX files, just like with the main and security repos. The extra DEBINX files are formatted to reduce their size and standardize the contents.

The GNU grep package provides better performance than the Busybox version, though both function the same in this case.