
Author Topic: PiCore, RasPi3B and boot code nfsmount  (Read 8499 times)

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #30 on: February 14, 2018, 02:36:07 AM »
@curaga: Please update 5.x/armv6/nfs-utils.tcz with the following error correction:

In /usr/local/etc/init.d/nfs-client, line 11, replace
Code: [Select]
     /usr/local/sbin/sm-notify -q
with
Code: [Select]
     /usr/local/sbin/sm-notify -f
The current line fails with:
/usr/local/sbin/sm-notify: invalid option -- 'q'

https://linux.die.net/man/8/sm-notify
Over 90% of all computer problems can be traced back to the interface between the keyboard and the chair

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 10957
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #31 on: February 14, 2018, 10:19:02 AM »
Sorry, I don't use ARM.
The only barriers that can stop you are the ones you create yourself.

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #32 on: February 14, 2018, 11:56:54 AM »
Sorry, I don't use ARM. <-- The only barriers that can stop you are the ones you create yourself. <--  :P

Actually, this is a universal TCL issue (x86/x64/etc.), and there's a hanging problem across distros when shutting down: mounts fail to dismount for numerous reasons and hang at either SYNC or UMOUNT -A due to a known issue in nfs-utils.  I think I've come across an easy-enough fix, at least within rc.shutdown, which thus far has been working like a charm.  To make rc.shutdown more stable with remote mounts (cifs, nfs, AoE, etc.), it would require a little reworking of /etc/init.d/rc.shutdown and a few self-checks implemented with the following logic:
  • Attempt to WRITE a dummy file to each remote mount (many of them may have sleeping hard drives), then delete the file afterward
  • SYNC, then scan 'mount' for remote mounts and dismount each one manually, NOT with umount -a
  • SYNC again, then kill running processes, as some of these may hold file locks that block the dismount step
  • Repeat the dismount step if any remote mounts are still listed by 'mount'
  • At this point we've exhausted our options: force a dismount of the remaining remote mounts and allow rc.shutdown to continue
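
The logic in the list above could be sketched roughly as follows. This is only an outline under assumptions, not a tested shutdown script: the function names list_remote_mounts and dismount_remote are made up for illustration, the set of remote filesystem types (nfs, nfs4, cifs) is an assumption, and whether umount accepts -f and -l depends on the umount build in use. The "kill running processes" step between the two passes is left out.

```shell
# Hypothetical helpers; names and the fstype list are illustrative assumptions.

# Print the mount point of every remote mount found in /proc/mounts-style
# input on stdin (fields: device mountpoint fstype options ...).
list_remote_mounts() {
    awk '$3 == "nfs" || $3 == "nfs4" || $3 == "cifs" { print $2 }'
}

# Wake each share with a dummy write, then dismount it individually
# instead of relying on umount -a.
dismount_remote() {
    for mnt in $(list_remote_mounts < /proc/mounts); do
        touch "$mnt/.shutdown_probe" 2>/dev/null   # may spin up a sleeping disk
        rm -f "$mnt/.shutdown_probe" 2>/dev/null
        sync
        umount "$mnt" 2>/dev/null
    done
    # Anything still listed gets a forced, then lazy, dismount as a last resort.
    for mnt in $(list_remote_mounts < /proc/mounts); do
        umount -f "$mnt" 2>/dev/null || umount -l "$mnt" 2>/dev/null
    done
}
```

Nothing runs until dismount_remote is called, so the functions can be sourced and the parser tried on its own first, for example with: list_remote_mounts < /proc/mounts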

In most cases/distros, when umount -a fails/hangs on a remote mount, there's a two-minute timeout, which is a while to wait but not forever.  It still leaves the possibility of unwritten data and/or other similar causes of corruption.  The above logic helps with this a bit, but with remote mounts there's nothing I can think of which covers 100% of cases.

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #33 on: February 14, 2018, 09:14:37 PM »
APPLIES TO ALL ARCHITECTURES

The combination of changes to prevent remote shares from hanging during shut-down seems complete.
This has been tested in a LAMP environment where all services were operating during the shutdown/reboot request.
Bear in mind, I left a few notes here and there for visual aid while testing; the scripts could use a bath and a minor efficiency rewrite.

In /etc/init.d/tc-config immediately before extensions are loaded, add:
Code: [Select]
## Find the largest PID and save it in /tmp/pid_start ##
maxpid=0
# Drop the ps header line and keep only the PID column
ps -a | grep -v USER | awk '{print $1}' > /tmp/pidlist
while read -r item
do
    if [ "$item" -gt "$maxpid" ]; then maxpid=$item; fi
done < /tmp/pidlist
echo "$maxpid" > /tmp/pid_start
rm -f /tmp/pidlist
This allows us to get an idea of what processes existed PRIOR to extensions being loaded.

Create /etc/init.d/rc.shutdown-shares and launch it as the first command within rc.shutdown:
Code: [Select]
#!/bin/busybox ash
. /etc/init.d/tc-functions
useBusybox

# 1. First, let's TRY to kill all processes NEWER than /tmp/pid_start
#    (numeric PIDs only; never this script or its parent)
pid=`cat /tmp/pid_start`
ps -a | awk '$1 ~ /^[0-9]+$/ {print $1}' > /tmp/pidlist
while read -r line
do
    if [ "$line" = "$$" ] || [ "$line" = "$PPID" ]; then continue; fi
    if [ "$line" -gt "$pid" ]; then kill "$line" >/dev/null 2>&1; fi
done < /tmp/pidlist
rm -f /tmp/pidlist

# 2. Next, we need to dismount items in /tmp/tcloop
for dir in /tmp/tcloop/*
do
    [ -d "$dir" ] || continue
    umount "$dir" >/dev/null 2>&1
    rmdir "$dir" >/dev/null 2>&1
done

# 3. Let's check to see if we have any strays
#    (anything rmdir could not remove is still mounted)
for dir in /tmp/tcloop/*
do
    [ -d "$dir" ] && echo "UMOUNT: Failed dismounting $dir"
done

# 4. Let's now try and dismount any of our NFS mounts after writing
#     dummy info onto each (in case they're asleep)
test=`mount | grep " type nfs " | awk '{print $3}'`
if [ -n "${test}" ]; then
    for item in $test
    do
        echo "NFS: Write Testing [${item}]"
        sudo touch "$item/tcl_nfs_test.txt" >/dev/null 2>&1
        usleep 200000
        if [ ! -f "$item/tcl_nfs_test.txt" ]; then
            echo "  ${RED}Error writing to NFS mount $item${NORMAL}"
        fi
        sudo rm -f "$item/tcl_nfs_test.txt" >/dev/null 2>&1

        echo "NFS: Dismounting [${item}]"
        sudo umount "$item"
        # Still listed by mount? Then the dismount failed.
        tests=`mount | grep " ${item} "`
        if [ -n "${tests}" ]; then
            echo "  ${RED}Error dismounting $item${NORMAL}"
        fi
    done
fi
This disconnects the shares after killing off any processes which were launched after extensions began loading (PIDs newer than the one saved in /tmp/pid_start).
This is especially vital when mounting an NFS share and then persisting TCE, OPT, etc. onto that share.

I have not proven my theory yet, but based on the complaints regarding nfs-utils I've read, I have reason to believe there's a good chance umount -a doesn't play nice with mount.nfs.  Based on the order in which rc.shutdown starts off, we may be asking NFS to dismount prior to file handles being released (guaranteed for /tmp/tcloop).  For all I know, we could also be dismounting extensions (including nfs-utils??) prior to NFS shares being let go, which I'm guessing could cause issues as well.

When TCL is running with nfsmount=server:/share and tce=nfs, without the above mods rc.shutdown will hang for a few minutes at the point it tries to umount -a; with the above there's virtually no delay at all.  All other network shares, especially those where TCL is allowed to persist onto the share, need to be treated in the same fashion to prevent lags/hanging as well as to help prevent data loss/corruption.

Offline Juanito

  • Administrator
  • Hero Member
  • *****
  • Posts: 14516
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #34 on: February 15, 2018, 12:07:14 AM »
Please update 5.x/armv6/nfs-utils.tcz

nfs-utils updated to the same version as core/corepure64 and posted to the picore 9.x repos

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 10957
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #35 on: February 15, 2018, 03:03:49 AM »
PID numbers get recycled, and killing processes doesn't prevent data loss, so those risks still exist.

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #36 on: February 15, 2018, 08:43:09 AM »
@juanito: Thanks!  I cannot speak for x86/x64 as I haven't repeated this process there yet, but it may be prudent to add libcap.tcz to the nfs-utils dependency list.

@curaga: YES, PIDs are reused.  The concept here is to TRY to keep start-up processes (take DHCPc for example) from getting shut down TOO early (we don't want anything interrupting networking while attempting to disconnect from our shares in this case), so setting a landmark (i.e., don't kill off anything started before our extensions were loaded) sounded like a potentially safe bet for processes which don't reinitialize themselves.  TCL may be a little too mature to begin a new standard, such as implementing init scripts for daemons...  but who knows! :)

I'm planning on putting together an add-in for rc.shutdown which scans the init.d directories and calls scriptname stop for each launch script it can find.  It would then check for popular apps (such as apache, sql, etc.) and, if they are running, call their associated binaries to shut them down naturally (as opposed to sending signals), to assist with some of the killings and potential data issues.  Considering the lack of standardized init scripts within many of the extensions, it's the easiest way I can think of (with a list that grows over time).
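
A minimal sketch of the "scan init.d and call scriptname stop" part, under assumptions: the function name stop_init_scripts is hypothetical, and it simply assumes every executable file in the given directory understands a stop argument, which many extension scripts may not.

```shell
# Hypothetical helper: call "<script> stop" for every executable file in the
# given init.d-style directories. The directories are chosen by the caller.
stop_init_scripts() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        for script in "$dir"/*; do
            # Skip anything that is not a regular, executable file.
            [ -f "$script" ] && [ -x "$script" ] || continue
            echo "Stopping: $script"
            "$script" stop >/dev/null 2>&1 || echo "  stop failed: $script"
        done
    done
}
```

From rc.shutdown this might be called as: stop_init_scripts /usr/local/etc/init.d  Pointing it at /etc/init.d is not a good idea, since that directory holds tc-config and rc.shutdown themselves rather than start/stop scripts.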

For my needs (which are uncommon versus the community's usual use for TC) I have a "profile" set up for website design, for example: a general LAMP configuration and a handful of extensions (ionCube, DAV, SVN, etc.), which I'm going to package as my_lamp.tcz.  It will include init scripts and other basic content to simplify what's needed here; you're welcome to the init scripts when they're done if you think they'd be useful.

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #37 on: February 15, 2018, 01:17:47 PM »
@juanito: Please update the dependency list below (based on memory)

nfs-utils.tcz.dep
  • portmap.tcz
  • libcap.tcz
  • tcp_wrappers.tcz
  • filesystems-KERNEL.tcz




Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #38 on: February 15, 2018, 07:34:26 PM »
I have not read up on nfs-utils.tcz version 1.3.3 as of yet, but from the .info notes, rpcbind.tcz is a replacement for portmap.  If it has rpc.statd and the like in the same locations, please replace portmap.tcz in the dependency listing accordingly (otherwise there are going to be needs in tc-config no longer being met, which will make the boot code nfsmount= dysfunctional  :-\ )

Offline Juanito

  • Administrator
  • Hero Member
  • *****
  • Posts: 14516
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #39 on: February 15, 2018, 09:51:50 PM »
The posted extension has rpcbind as a dep.

As far as I know, filesystems-KERNEL is only required by nfs-server and I don't believe that libcap and tcp_wrappers are used (I could be mistaken).

I did some basic testing on an RPi3 - please test fully and report back.

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #40 on: February 18, 2018, 10:39:01 AM »
@juanito:
AS-IS, NFS has commands hard-coded into tc-config, so simply replacing nfs-utils with the one you've just built failed.  First, there are (expected) complaints about the calls to the now missing portmap, tcp_wrappers, etc. in tc-config:nfsmount; then, when /usr/local/etc/init.d/nfs-client is called, it says it's launching and then hangs for a few minutes (see time stamps):
Code: [Select]
Feb 18 02:12:59 box user.warn kernel: [  330.730775] svc: failed to register lockdv1 RPC service (errno 110).
Feb 18 02:12:59 box user.warn kernel: [  330.730816] lockd_up: makesock failed, error=-110
Feb 18 02:20:45 box user.warn kernel: [  796.650768] svc: failed to register lockdv1 RPC service (errno 110).
Feb 18 02:28:30 box user.warn kernel: [ 1262.570729] svc: failed to register lockdv1 RPC service (errno 110).
Feb 18 02:36:16 box user.warn kernel: [ 1728.490745] svc: failed to register lockdv1 RPC service (errno 110).
Feb 18 02:39:21 box local2.notice sudo:     root : TTY=console ; PWD=/ ; USER=root ; COMMAND=/bin/chown tc.staff /mnt/nfs -R
Feb 18 02:39:21 box local2.notice sudo:     root : TTY=console ; PWD=/ ; USER=root ; COMMAND=/bin/chown tc.staff /mnt/share -R

When attempting to load an NFS share manually:
root@pi-23:/mnt$ mount -t nfs 10.0.2.1:/nfs/shared share
/sbin/mount.nfs: error while loading shared libraries: libtirpc.so.3: cannot open shared object file: No such file or directory

Offline Juanito

  • Administrator
  • Hero Member
  • *****
  • Posts: 14516
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #41 on: February 18, 2018, 09:30:27 PM »
The updated nfs-utils depends on rpcbind, which in turn depends on libtirpc.

Perhaps you didn't update your dep files?

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #42 on: February 19, 2018, 02:03:01 AM »
That's what I thought as well, but the first thing I did was search for the provider of the missing file and run tce-ab to install it...  and it indicated it was already installed.  From the looks of things, all that's needed is a symlink from libtirpc.so.3 to the existing @so.1 which libtirpc installs.

Offline Juanito

  • Administrator
  • Hero Member
  • *****
  • Posts: 14516
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #43 on: February 19, 2018, 02:58:04 AM »
Are you sure you don't have something unintentional in a backup or have a previous version of libtirpc loaded?

Before libtirpc is loaded there should not be any files named /usr/local/lib/libtirpc*
« Last Edit: February 19, 2018, 03:00:04 AM by Juanito »
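
One way to run that check might look like the sketch below. The function name list_lib_files is hypothetical; the default directory and library name are taken from the post above. It only lists what is present (showing symlink targets) and changes nothing.

```shell
# Hypothetical check: list any libtirpc files already present in a library
# directory, showing where symlinks point. Returns 0 if something was found.
list_lib_files() {
    dir=${1:-/usr/local/lib}
    name=${2:-libtirpc}
    found=0
    for f in "$dir/$name"*; do
        # An unmatched glob stays literal, so test that the entry exists.
        [ -e "$f" ] || [ -L "$f" ] || continue
        found=1
        if [ -L "$f" ]; then
            echo "$f -> $(readlink "$f")"
        else
            echo "$f"
        fi
    done
    [ "$found" -eq 1 ]
}
```

Run before libtirpc is loaded, any output would point to a leftover copy; run afterwards, it shows which .so versions the loaded extension actually provides.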

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #44 on: February 20, 2018, 03:00:22 PM »
These are fresh remasters.  For the time being, the TCZ files are being stored in /etc/embed instead of extracted into the image, just to speed along testing and rebuilding.  There's no MyDATA and, without NFS functioning correctly, there are no persistent NFS shares for TCE/OPT; however, I'll manually launch an update to the local repo here just in case there are files in limbo.

tc-config has been updated to accommodate both the old and the new nfs-utils/dependencies (tce.installed calls are used, as the extensions are intended to be embedded if/when using the nfsmount= boot code), in the following order:
Code: [Select]
un=`uname -r`
if [ -f /usr/local/tce.installed/libcap ]; then /usr/local/tce.installed/libcap >/dev/null 2>&1; fi
if [ -f /usr/local/tce.installed/filesystems-${un} ]; then /usr/local/tce.installed/filesystems-${un} >/dev/null 2>&1; fi
if [ -f /usr/local/tce.installed/tcp_wrappers ]; then /usr/local/tce.installed/tcp_wrappers >/dev/null 2>&1; fi
if [ -f /usr/local/tce.installed/portmap ]; then /usr/local/tce.installed/portmap >/dev/null 2>&1; fi
if [ -f /usr/local/tce.installed/libtirpc ]; then /usr/local/tce.installed/libtirpc >/dev/null 2>&1; fi
if [ -f /usr/local/tce.installed/rpcbind ]; then /usr/local/tce.installed/rpcbind >/dev/null 2>&1; fi
if [ -f /usr/local/tce.installed/nfs-utils ]; then /usr/local/tce.installed/nfs-utils >/dev/null 2>&1; fi
if [ ! -f /tmp/nfs_launch ]; then /usr/local/etc/init.d/nfs-client start >/dev/null 2>&1; fi
I think I have the correct order (regardless of which version of nfs-utils is being used) but please feel free to leave a note if you see something out of place.
Bear in mind, the concept here is to utilize nfsmount= but at the same time support both nfs-client and nfs-server should the day ever come when the server's necessary.
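
The same sequence could also be folded into a loop, which keeps the load order in one place. This is just a sketch: run_installed is a hypothetical helper, not something tc-config provides, and the nfs-client start line would still follow it unchanged.

```shell
# Hypothetical helper: run each extension's tce.installed script, in the order
# given, skipping any that are not present (old vs. new nfs-utils dependencies).
run_installed() {
    base=$1
    shift
    for ext in "$@"; do
        if [ -f "$base/$ext" ]; then
            "$base/$ext" >/dev/null 2>&1
        fi
    done
}

# Example call, mirroring the order above (left commented out in this sketch):
# run_installed /usr/local/tce.installed libcap "filesystems-$(uname -r)" \
#     tcp_wrappers portmap libtirpc rpcbind nfs-utils
```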