
Author Topic: Tinkering with rebuildfstab  (Read 12383 times)

Offline CNK

  • Wiki Author
  • Sr. Member
  • *****
  • Posts: 390
Tinkering with rebuildfstab
« on: February 24, 2023, 09:30:19 PM »
Following the recent rejected rewrite of the device identification in Tiny Core's rebuildfstab script, intended for quicker execution time, I thought I'd play around with the suggestion in the GitHub discussion that:
Quote
Perhaps a large part of the time taken is by the "find" command, where adding "-maxdepth 2" may give some speedup.

Testing in Tiny Core Pure64 13, I found that this indeed sped it up to a small degree. But this suggestion also implies that the desired "dev" files are never deeper than one subdirectory below each /sys/block/<device> directory, in which case the find command can simply be replaced by shell file name globbing. Also, the DEVNAME and DEVMAJOR variables could be set using shell string manipulations instead of calling external programs.
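
A rough sketch of those two ideas, not the attached script itself: two glob patterns stand in for find, and parameter expansion stands in for the external programs. The helper names set_devname and set_devmajor are made up for illustration, and reading the "dev" file with the read builtin is an assumption about how the MAJOR:MINOR string is obtained.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not taken from rebuildfstab.

# /sys/block/sda/sda1/dev -> DEVNAME=sda1
set_devname() {
  DEVNAME=${1%/dev}        # drop the trailing "/dev"
  DEVNAME=${DEVNAME##*/}   # drop everything up to the last "/"
}

# A sysfs "dev" file holds "MAJOR:MINOR"; keep the part before the colon.
set_devmajor() {
  DEVMAJOR=${1%%:*}
}

# One glob pattern per directory level replaces the find call.
for i in /sys/block/*/dev /sys/block/*/*/dev; do
  [ -r "$i" ] || continue             # skips patterns that matched nothing
  read -r devno < "$i" || continue    # shell builtin, no external program
  set_devname "$i"
  set_devmajor "$devno"
  echo "DEVNAME=$DEVNAME DEVMAJOR=$DEVMAJOR"
done
```

Setting global variables rather than printing from the helpers avoids a command substitution, and therefore a fork, for every device.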

Comparison of old/new moved to attachment (rebuildfstab_changes.txt) to avoid forum error.

While not near the stats of the earlier proposed modification, the speed-up still looks significant, and the changes here are much more minor.

(running twice to reduce cache effects, which are less significant on this particular PC than they are on others)
Code: [Select]
# echo modified: ;time rebuildfstab-mod; echo original: ; time rebuildfstab; echo original: ; time rebuildfstab;echo modified: ;time rebuildfstab-mod;
modified:
real    0m 0.39s
user    0m 0.08s
sys     0m 0.18s
original:
real    0m 0.63s
user    0m 0.20s
sys     0m 0.40s
original:
real    0m 0.60s
user    0m 0.21s
sys     0m 0.38s
modified:
real    0m 0.37s
user    0m 0.07s
sys     0m 0.17s

The only potential behaviour change is that the order of the entries in /etc/fstab will be different if an unpartitioned block device (e.g. /dev/sda) is added there. I'm guessing that the exact order of auto-generated fstab entries doesn't matter much to anyone anyway.

Attached is the full file, where I also added ext4 at the mount options bit. I don't really think the execution time of rebuildfstab is much of an issue, it was just a fun thing to play around with, so whether TC wants to adopt this change or not, I don't really care. But if so, then go right ahead.

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11089
Re: Tinkering with rebuildfstab
« Reply #1 on: February 25, 2023, 03:36:55 AM »
Thanks for the changes. There are some issues:
- the changed DEVMAJOR line uses wrong var (i)
- it calls fstype potentially many more times
- it calls grep many times, which is what an earlier patch specifically changed to the combined awk
- if there is no glob match, the pattern is passed as-is (I guess the -b test was added for this)

It may not be simple to apply just the globbing; when I tried, it was slower.
« Last Edit: February 25, 2023, 03:40:51 AM by curaga »
The only barriers that can stop you are the ones you create yourself.

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11089
Re: Tinkering with rebuildfstab
« Reply #2 on: February 25, 2023, 03:55:51 AM »
It does seem the find line is the main source of slowness, but also one that is not easily rewritten in shell while keeping minimal changes (to minimize the chance of regressions). If anyone is interested in tackling this, replacing just the find command with a small C program could be a nice exercise (glob(), strstr()).

Offline hiro

  • Hero Member
  • *****
  • Posts: 1243
Re: Tinkering with rebuildfstab
« Reply #3 on: February 25, 2023, 04:33:01 AM »
did you measure how much time blkid takes when you run it once vs. many times?
i think if find is slow then it might be bugged. it shouldn't be supposed to do very much! :O

Offline hiro

  • Hero Member
  • *****
  • Posts: 1243
Re: Tinkering with rebuildfstab
« Reply #4 on: February 25, 2023, 04:37:36 AM »
on my machine, running blkid for the first time takes 6x as much time as find.
and after everything is cached blkid takes the same amount of time as find. and it seems like there's no cache-related variation with find.
grep is at least an order of magnitude faster.

i'm happy to help obsess some more about this at some point, as it's such a small contained neat thing. ;)
« Last Edit: February 25, 2023, 04:39:36 AM by hiro »

Offline CNK

  • Wiki Author
  • Sr. Member
  • *****
  • Posts: 390
Re: Tinkering with rebuildfstab
« Reply #5 on: February 25, 2023, 06:17:23 AM »
Quote
Thanks for the changes. There are some issues:
- the changed DEVMAJOR line uses wrong var (i)

Oops, yes I messed that one up completely, it should be "DEVMAJOR=${DEVMAJOR%%:*}" there of course.

Quote
- it calls fstype potentially many more times

How do you come to this conclusion (besides due to the DEVMAJOR error)?

For me this script shows that the same number of file system device lines are run through the parts of the loop where the fstype and grep calls are located. If the DEVNAME and DEVMAJOR variables are also the same in each version, then I don't see where there can be a difference in the behaviour.

Code: [Select]
#!/bin/busybox ash
rm -f /tmp/blkdev.txt

for i in /sys/block/*/dev /sys/block/*/*/dev; do
  case "$i" in
    *loop*|*ram*)
      continue
      ;;
  esac

  DEVNAME=${i%*/dev}
  DEVNAME=${DEVNAME##*/}
  [ -b "/dev/$DEVNAME" ] || continue

  echo "$i" >> /tmp/blkdev.txt
done

echo rebuildfstab-mod:
wc -l /tmp/blkdev.txt
rm /tmp/blkdev.txt

for i in `find /sys/block/*/ -name dev`; do
  case "$i" in
    *loop*|*ram*)
      continue
      ;;
  esac

  echo "$i" >> /tmp/blkdev.txt
done

echo rebuildfstab:
wc -l /tmp/blkdev.txt
rm /tmp/blkdev.txt

Quote
- it calls grep many times, which is what an earlier patch specifically changed to the combined awk

Oh right, I forgot to check the current version on GitHub and just based this on the version in TC 13. Now that I've seen that, I see what you're talking about (it took a while): you're using "read" to advance through the list of block devices now, and feeding it from find and awk.

Quote
- if there is no glob match, the pattern is passed as-is (I guess the -b test was added for this)

Yes the -b test catches that.
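
To illustrate why that works, here is a small sketch of what happens to an unmatched pattern. The literal pattern is assigned by hand to stand in for the loop variable, and the variable name result is only for illustration.

```shell
#!/bin/sh
# What the loop variable holds when /sys/block/*/*/dev matches nothing:
i='/sys/block/*/*/dev'

DEVNAME=${i%/dev}        # -> /sys/block/*/*
DEVNAME=${DEVNAME##*/}   # -> *

# The path is quoted, so the shell does not expand the "*" again.
# No device node is literally named "/dev/*", so the test fails
# and the bogus entry is skipped.
if [ -b "/dev/$DEVNAME" ]; then
  result=kept
else
  result=skipped
fi
echo "DEVNAME=$DEVNAME result=$result"
```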

Quote
It may not be simple to apply just the globbing; when I tried, it was slower.

OK. Testing against the 'new' rebuildfstab on GitHub ('rebuildfstab-new' below), the execution time is much closer to that of my version, and that's on a test machine with only four mountable file systems, so there are not many of the grep calls that slow down the older design I based mine on. I'll believe you that the new layout is no faster with globbing (fed to awk via "echo", presumably); I've spent enough time on this already now.

Code: [Select]
# echo original:; time rebuildfstab; echo github: ; time ./rebuildfstab-new;echo modified: ;time rebuildfstab-mod;echo modified: ;time rebuildfstab-mod; echo github: ; time ./rebuildfstab-new;echo original:; time rebuildfstab;
original:
real    0m 0.82s
user    0m 0.22s
sys     0m 0.50s
github:
real    0m 0.37s
user    0m 0.09s
sys     0m 0.26s
modified:
real    0m 0.33s
user    0m 0.12s
sys     0m 0.16s
modified:
real    0m 0.36s
user    0m 0.12s
sys     0m 0.19s
github:
real    0m 0.37s
user    0m 0.09s
sys     0m 0.26s
original:
real    0m 0.81s
user    0m 0.21s
sys     0m 0.51s
EDIT: Oops again, I had originally posted test results from before applying the DEVMAJOR fix to my version.
« Last Edit: February 25, 2023, 06:47:20 AM by CNK »

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11089
Re: Tinkering with rebuildfstab
« Reply #6 on: February 25, 2023, 11:09:43 AM »
Quote
did you measure how much time blkid takes when you run it once vs. many times?
i think if find is slow then it might be bugged. it shouldn't be supposed to do very much! :O

It's not that find is buggy, it's how it works (and bb find is slower than GNU find too). It runs a stat call for each file, and in the searched directories there are a lot of files. We're only interested in the file names in this invocation, file types/sizes/times don't matter. So this part is the lowest hanging fruit currently, I believe.

Quote
i'm happy to help obsess some more about this at some point, as it's such a small contained neat thing. ;)

Everyone is welcome to go over the TC scripts. rebuildfstab's speed is fast enough for me personally, but if you find it too slow, please do look into it.

Quote
- it calls fstype potentially many more times

How do you come to this conclusion (besides due to the DEVMAJOR error)?

The system version runs blkid once (it's in the END block in awk), and parses the output. Your version runs it once per device. The later calls should be cached and fast, but it's still extra calls.
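
A minimal sketch of the "run once, parse the output" idea, assuming blkid-style lines of the form /dev/NAME: KEY="value" ... TYPE="fs". This is not the awk code from the system script; the function name list_fstypes and the sample listing are made up for illustration, and on a real system the listing would come from a single blkid call.

```shell
#!/bin/sh
# Made-up sample standing in for the output of ONE blkid call.
sample='/dev/sda1: LABEL="boot" UUID="1111-2222" TYPE="vfat"
/dev/sda2: UUID="aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee" TYPE="ext4"'

# list_fstypes LISTING -> prints "device fstype" for every line, in one awk pass
list_fstypes() {
  printf '%s\n' "$1" | awk '
    {
      dev = $1
      sub(/:$/, "", dev)                 # strip the trailing colon
      for (n = 2; n <= NF; n++)
        if ($n ~ /^TYPE="/) {            # anchored, so SEC_TYPE= is ignored
          type = $n
          gsub(/^TYPE="|"$/, "", type)   # strip TYPE=" and the closing quote
          print dev, type
        }
    }'
}

list_fstypes "$sample"
```

The per-device lookups then become string matches against this one listing instead of separate process launches.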

Offline hiro

  • Hero Member
  • *****
  • Posts: 1243
Re: Tinkering with rebuildfstab
« Reply #7 on: February 25, 2023, 12:58:08 PM »
Quote
The system version runs blkid once (it's in the END block in awk), and parses the output. Your version runs it once per device. The later calls should be cached and fast, but it's still extra calls.

thanks for clearing that up. i actually missed the END and thought it was run multiple times.

if the GNU find is faster than bb find, then maybe it's not worth optimizing for this. as in the long-term it would be more useful if the busybox find was made faster instead (not our problem).

any remaining bottlenecks are now much less obvious to my intuition and i agree we're probably hitting diminishing returns here.

but i'm gonna still offer to try out more solutions, even on a bigger SAS array, if that would be of any help to the remaining benchmark fanatics :P

Offline CNK

  • Wiki Author
  • Sr. Member
  • *****
  • Posts: 390
Re: Tinkering with rebuildfstab
« Reply #8 on: February 25, 2023, 05:18:39 PM »
Quote
- it calls fstype potentially many more times
How do you come to this conclusion (besides due to the DEVMAJOR error)?
The system version runs blkid once (it's in the END block in awk), and parses the output. Your version runs it once per device. The later calls should be cached and fast, but it's still extra calls.

Yeah I was still comparing with the TC 13 version there. If before posting I'd remembered to check on GitHub to see if rebuildfstab had changed since TC 13 then I would have seen the new layout and never talked about my experiments on the old version in the first place. Sorry to waste time.

Offline hiro

  • Hero Member
  • *****
  • Posts: 1243
Re: Tinkering with rebuildfstab
« Reply #9 on: February 25, 2023, 06:44:57 PM »
nah, you didn't start this "time-waste".
it's at least entertaining and at best educational to the next reader.

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12276
Re: Tinkering with rebuildfstab
« Reply #10 on: February 26, 2023, 01:46:01 AM »
Hi curaga
Quote
... Everyone is welcome to go over the TC scripts. rebuildfstab's speed is fast enough for me personally, but if you find it too slow, please do look into it. ...

I think I may have an approach that doesn't need the  find  command.
The  /dev/block/*  directory seems to contain the information being searched for.
I used a cut down version of  rebuildfstab  containing the  find  loop that isolates
the value of  DEVNAME  and  DEVMAJOR as a baseline. I included an  echo
statement to list the found devices.
Here are the timing results:
Code: [Select]
tc@E310:~/rebuildfstab$ time ./findblockdevs > /dev/null
real    0m 0.71s
user    0m 0.22s
sys     0m 0.53s
tc@E310:~/rebuildfstab$ time ./newfindblockdevs  > /dev/null
real    0m 0.09s
user    0m 0.04s
sys     0m 0.05s
tc@E310:~/rebuildfstab$

It finds the same devices (order does not matter):
Code: [Select]
tc@E310:~/rebuildfstab$ time ./findblockdevs
DEVNAME=sda4    DEVMAJOR=8
DEVNAME=sda2    DEVMAJOR=8
DEVNAME=sda     DEVMAJOR=8
DEVNAME=sda7    DEVMAJOR=8
DEVNAME=sda5    DEVMAJOR=8
DEVNAME=sda3    DEVMAJOR=8
DEVNAME=sda1    DEVMAJOR=8
DEVNAME=sda6    DEVMAJOR=8
DEVNAME=sdb     DEVMAJOR=8
DEVNAME=sdb1    DEVMAJOR=8
DEVNAME=sdc     DEVMAJOR=8
DEVNAME=sdd     DEVMAJOR=8
DEVNAME=sde     DEVMAJOR=8
DEVNAME=sdf     DEVMAJOR=8
DEVNAME=sr0     DEVMAJOR=11
DEVNAME=sr1     DEVMAJOR=11
real    0m 0.70s
user    0m 0.22s
sys     0m 0.52s
tc@E310:~/rebuildfstab$ time ./newfindblockdevs
DEVNAME=sr0     DEVMAJOR=11
DEVNAME=sr1     DEVMAJOR=11
DEVNAME=sda     DEVMAJOR=8
DEVNAME=sda1    DEVMAJOR=8
DEVNAME=sdb     DEVMAJOR=8
DEVNAME=sdb1    DEVMAJOR=8
DEVNAME=sda2    DEVMAJOR=8
DEVNAME=sda3    DEVMAJOR=8
DEVNAME=sdc     DEVMAJOR=8
DEVNAME=sda4    DEVMAJOR=8
DEVNAME=sdd     DEVMAJOR=8
DEVNAME=sda5    DEVMAJOR=8
DEVNAME=sda6    DEVMAJOR=8
DEVNAME=sde     DEVMAJOR=8
DEVNAME=sda7    DEVMAJOR=8
DEVNAME=sdf     DEVMAJOR=8
real    0m 0.09s
user    0m 0.03s
sys     0m 0.06s
tc@E310:~/rebuildfstab$

I've attached the baseline file (findblockdevs) and the version with my changes (newfindblockdevs).
Lines beginning with  ###  were commented out by me.
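
The attached files are not reproduced here. As a rough sketch of the /dev/block idea only, and not the attached newfindblockdevs script, the following assumes each entry in that directory is a symlink named MAJOR:MINOR pointing at the device node. The function name list_blockdevs is made up, and the loop/ram filter is borrowed from the test script earlier in the thread.

```shell
#!/bin/sh
# list_blockdevs DIR -> prints "DEVNAME=<name> DEVMAJOR=<major>" per entry
list_blockdevs() {
  for link in "$1"/*:*; do
    [ -L "$link" ] || continue      # also skips a pattern that matched nothing
    target=$(readlink "$link")      # e.g. ../sda1
    devno=${link##*/}               # e.g. 8:1
    case "${target##*/}" in
      loop*|ram*) continue ;;
    esac
    echo "DEVNAME=${target##*/} DEVMAJOR=${devno%%:*}"
  done
}

if [ -d /dev/block ]; then
  list_blockdevs /dev/block
fi
```

Note that this sketch still launches readlink once per entry, so it says nothing about how the attached version achieves its timings.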

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11089
Re: Tinkering with rebuildfstab
« Reply #11 on: February 26, 2023, 02:07:01 AM »
I worry about a potential race there: is it certain that those devices (or at least the newest of them) exist when rebuildfstab is called? /sys gets populated early, but I don't remember exactly when udev creates devices relative to the called scripts.

No interest in the C approach :P

Offline Rich

  • Administrator
  • Hero Member
  • *****
  • Posts: 12276
Re: Tinkering with rebuildfstab
« Reply #12 on: February 26, 2023, 02:23:19 AM »
Hi curaga
How do you feel about using  /proc/partitions? I can cut the time
by another factor of 2 parsing that source.
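
The parsing itself is not shown in the thread. A minimal sketch, assuming the usual /proc/partitions layout (a "major minor #blocks name" header, a blank line, then one row per device), could use only shell builtins. The function name parse_partitions is made up, and the loop/ram filter is borrowed from the test script earlier in the thread.

```shell
#!/bin/sh
# parse_partitions FILE -> prints "DEVNAME=<name> DEVMAJOR=<major>" per row
parse_partitions() {
  while read -r major minor blocks name; do
    case "$major" in
      ''|*[!0-9]*) continue ;;      # skip the header and the blank line
    esac
    case "$name" in
      loop*|ram*) continue ;;
    esac
    echo "DEVNAME=$name DEVMAJOR=$major"
  done < "$1"
}

if [ -r /proc/partitions ]; then
  parse_partitions /proc/partitions
fi
```

Taking the file name as a parameter keeps the function testable against a saved copy of the table.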

Offline hiro

  • Hero Member
  • *****
  • Posts: 1243
Re: Tinkering with rebuildfstab
« Reply #13 on: February 26, 2023, 05:19:30 AM »
7x faster, nice :D

maybe we should also replace blkid with a shellscript, then? /s
« Last Edit: February 26, 2023, 05:22:10 AM by hiro »

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11089
Re: Tinkering with rebuildfstab
« Reply #14 on: February 26, 2023, 10:01:51 AM »
/proc should be fine.
The only barriers that can stop you are the ones you create yourself.