Tiny Core Linux
Tiny Core Base => Raspberry Pi => Topic started by: CentralWare on January 25, 2018, 04:38:03 AM
-
I have a rack built with 23 RasPi cards, 8 of which are Pi3.
For the Pi2 cards, the following works flawlessly: nfsmount=10.0.2.1:/nfs/[Pi ID] opt=nfs tce=nfs
The exact same command on the Pi3 hangs the mount on boot. (These are headless, so until I open the case and connect a cable, I cannot see what it's complaining about, if anything.)
Now, on a Pi3 which I haven't done the above to yet, manually I get:
sudo mount -t nfs 10.0.2.1:/nfs/[Pi ID] /mnt/nfs hangs the session
sudo mount -o nolock -t nfs 10.0.2.1:/nfs /mnt/nfs works
* I was under the impression (old memory) nolock was in the tc-config start-up script, so this doesn't make sense to me, but I haven't yet compared internals.
Any ideas what the differences are in the start-up when using nfsmount on a Pi2 versus a Pi3?
uname -r: 4.4.39-piCore_v7+
Thanks guys!
-
I think you should read this thread; it has tips and tricks for NFS.
http://forum.tinycorelinux.net/index.php/topic,19913.msg123639.html#msg123639
You can, as I do, boot the Pi3 without any SD card.
The Pi1/Pi2 can boot from a regular Raspberry SD card and chain-boot through u-boot to NFS.
Here is an older thread with some suggestions on how to do it with scripts.
http://forum.tinycorelinux.net/index.php/topic,21356.msg133578.html#msg133578
Happy coding.
-
I ran a search on the forum for nfsmount before posting, just in case someone had already come across this issue. The posts you listed were in the results, but thank you.
-
@bela/curaga:
Please see tc-config after
[ -z "$DHCP_RAN" ] && wait4Server $SERVER $NOPING
Within the section after nfs-client start both mount commands should have -o nolock after $MOUNT/mount.
PiCore 8.1.5/8.1.5v7 on RasPi2B/RasPi3B
-
That shouldn't be necessary. It's used in the busybox mount case, since without the NFS utils there is no locking handler, but if you have nfs-utils installed, locking should work properly.
-
@curaga
7.x/armv6/tcz - no nfs-utils.tcz
7.x/armv7/tcz - no nfs-utils.tcz
8.x/armv6/tcz - no nfs-utils.tcz
8.x/armv7/tcz - no nfs-utils.tcz
9.x/armv6/tcz - no nfs-utils.tcz
9.x/armv7/tcz - no nfs-utils.tcz
* I didn't go any further back
Otherwise I wouldn't have even suggested the post (and/or remastered an image specific to Pi2/Pi3 on our end)
-
If you don't have nfs-utils installed, it's not possible for the script to go to the nfs-utils case, and mount will be called with -o nolock?
https://github.com/tinycorelinux/Core-scripts/blob/master/etc/init.d/tc-config#L359
-
@curaga,
I'm starting with a fresh image (just to ensure there are no remnants from previous alterations), but here are my results thus far.
Bear in mind I just connected a Pi2 and a Pi3 to monitors, so I don't have visual results from the previous testing.
RasPi2B using nfsmount=server:path seems to work perfectly fine (with nolock) with options of tce=nfs opt=nfs
RasPi3B using the exact same image hangs; it LOOKS as though it may be DHCP related (no network available at that point) but does not give up an error.
I'm currently embedding nfs-utils from 5.x/armv6 onto an 8.x/armv7 image, as I don't have the means to compile until after this Pi rack is completed; I'll also add more visual notes in tc-config to see if I can track down where it's hanging.
-
I think I might have (in theory) figured this out. Chicken and The Egg applies.
Command Line: nfsmount=SERVER:PATH tce=nfs opt=nfs
This line in itself is self-defeating as there's no way to load nfs-utils.tcz if my theory is correct.
- TC-CONFIG: assume nfs-utils is located in LOCAL:/tce/optional which isn't loaded yet.
- AoE, NFS, etc. are called (nfs-utils is still not loaded at this point?)
- TCEDIR is set to SERVER:PATH thus nfs-utils is then loaded AFTER nfsmount takes place IF it's found in SERVER:/path/tce/optional
...so says my educated guess at least ;)
Cure: Remaster with nfs-utils and its dependencies embedded if locking is needed; otherwise use nolock. Possibly create a backup attempt to connect manually (nolock instead of nfs-client) should the first attempt fail.
Suggestion: for all network-based file systems, I'd recommend launching their connection as a background task (ie: /etc/init.d/net_nfs.sh $CMDLINE) and then monitoring for success from within tc-config, with a timeout implemented so the system can move forward should AoE, NFS, etc. fail to connect. Add timeout=## as a boot code to override whatever the default would be.
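A minimal sketch of that background-mount-with-timeout idea (wait_for_mount, its 30-second default, and the overridable mount-table argument are placeholders of mine, not existing tc-config code):

```shell
# Sketch only: poll a mount table until DIR shows up or LIMIT seconds pass.
# Returns 0 as soon as the mount appears, 1 on timeout.
wait_for_mount() {
    dir=$1
    limit=${2:-30}          # a timeout=## boot code could override this
    tab=${3:-/proc/mounts}  # mount table; parameterized so it can be tested
    cnt=0
    while [ "$cnt" -lt "$limit" ]; do
        grep -q " $dir " "$tab" && return 0
        sleep 1
        cnt=$((cnt+1))
    done
    grep -q " $dir " "$tab"
}

# Usage idea: background the mount, watch for it, kill it if it never lands.
# mount -t nfs -o nolock "$NFSMOUNT" /mnt/nfs & MNT_PID=$!
# wait_for_mount /mnt/nfs 30 || { kill $MNT_PID 2>/dev/null; echo "NFS mount timed out"; }
```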
-
Is the option "nolock" the correct one?
Isn't addr= (pointing at the server IP) the option needed to get it working?
When I have problems with NFS, I see lots of error messages in the syslog.
-
@patrikg
No, it hasn't been necessary to use -o addr=[ServerIP] for as far back as I've been using NFS, though odds are it's available for a specific reason; just not one I've personally come across. If memory serves, file locking was added to NFS (somewhere around the NFS2 days, I think) to help prevent computer "A" from handling a given file which computer "B" is already using; the nolock option disables file locking when the NFS client being used doesn't support it.
-
Gerald wrote most of the networking support. IIRC the intention was to either remaster NFS/AoE/etc utils in, or to load them via httplist/similar. With those use cases, everything necessary is on the share, and so booting without them would not be useful.
Anyway, looking forward to results on where it's hanging.
-
@curaga
SO FAR, there are a few issues.
RasPi2B WITHOUT nfs-utils seems to connect to a remote NFS3/NFS4 just fine using nolock
RasPi3B WITHOUT nfs-utils hangs somewhere between wait4Server and the mount command with no output
RasPi3B WITH nfs-utils (from repo 5.x/armv6)
- Items in /usr/local/tce.installed aren't launched until AFTER nfsmount causing missing library issues -- corrected manually in tc-config just by launching any files found there; immediately after LO is given its IP address.
- /usr/.../init.d/nfs-client launches sm-notify -q and SM complains that it doesn't know what -q is all about. I swapped with -f for the time being just to shut it up.
- In 5 trial runs, if the NFS server's hard drives are sleeping, nfs-client timed out on about 20% of the hardware and seemed to hang at $MOUNT; it's approximately a 4-5 minute (or so) silent wait before it's allowed to continue which I had not given it when this thread began. The nfs share is mounted once the network connection is eventually established. (It's not waiting for input; pressing <ENTER> has no effect.)
- In the same 5 trials, between 4 and 8 RasPi3 devices were launched. This rack of 23 Pi's has a dedicated 24-port GBe managed switch with a CAT-6 GBe upstream. The NFS server is a QNAP NAS with NFS3/NFS4 both enabled and the most basic share settings open to the entire network and dual GBe interfaces. It was assumed this layout should have been more than sufficient for this test, but on average, when launching more than 6 units, at least one of the units cannot hand-shake with the NFS server "fast enough" during each test, which seems to act like a no-answer to the nfs client and it just sits there for a few minutes before moving forward and failing the connection.
* SYSLOG is being added to the mix for further details now that I know I can eventually get into the device after it "hangs"
My next set of tests are Static IP based (preventing the wait time I'm hoping) followed by rearranging tc-config and placing all of the AoE/NFS/etc mounts further down the stream giving the hardware a little more time to settle before relying on networking
-
does the same thing happen with RPi3 using the 9.x/armv7 repo?
-
@juanito
I'm about a day away from having 8.x tested and remastered to accommodate some of the issues at hand; then I'll test 9.x (though using nfs-utils from way-back since I won't have the option to compile.)
-
* NOTE: This is a remaster, nfsmount and aoe loading have been commented out and relocated to separate files just before launching bootsync so things can be modularized. Changes made will be implemented back into the core image once complete.
Hang Potential 1: /usr/local/sbin/mount.nfs4
Hang Potential 2: Ping/Bcast checking while the NIC is still settling
I replaced MOUNT4=/usr/local.... with MOUNT4=mount for the time being just to get past the situation.
Once the device booted, the mount was successful. After running a find -name mount.* I was unable to locate mount.nfs4 anywhere on the system.
After a quick Google, NFS4 support is slated for the 6.2 release; so I'm just guessing mount.nfs4 may have been a link to /usr/local/sbin/mount.nfs which got lost in the repo.
* When told to reboot, if the NFS share is still mounted to /mnt/nfs, something causes the system to hang after shutdown attempts to dismount (black screen) - cold boot required.
Adding a conditional umount /mnt/nfs before zswap in rc.shutdown takes care of the situation.
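The conditional umount could look something like this (a sketch; the helper name is mine):

```shell
# Only attempt the umount when the path is actually mounted, and never
# let a failed umount abort the rest of rc.shutdown.
umount_if_mounted() {
    grep -q " $1 " /proc/mounts && umount "$1"
    return 0
}

# e.g. in rc.shutdown, before the zswap handling:
# umount_if_mounted /mnt/nfs
```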
-
Okay... this was fun!
Final results:
- nfs-utils (only version available found in 5.x/armv6) needs to have its dependency updated to include libcap AND if embedded, obviously the associated files found in tce.installed need to be launched
- uDHCPc, called via init.d/dhcp.sh, is not guaranteed. I have cell phones and wireless dongles getting a broadcast response from the DHCP server in two seconds or less; PiCore8 averages 20-35 seconds on RasPi3B
- wait4net has been rewritten to double-check itself; if there's no ETH0 within the timeout, call itself again - for headless/dhcp, this is vital
- If DHCP fails (and there's no ETH0 by the time the mount is called) mount.nfs will hang without complaining. I would recommend rearranging things a bit by having the mount launched in the background and a timeout loop set up to watch for success; if it fails, kill the mount thread and move forward and/or try again?
- I was thinking... maybe it might take a lot of time off the DHCP broadcast if we force-fed a dummy IP into eth0?
- It's possible TC 9.x might have a better NIC driver or what-not which makes all of the above play together properly; this I do not yet know.
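The retry idea behind the wait4net rewrite can be sketched roughly like this (wait4 and its 60-second default are placeholders of mine, not the actual piCore code):

```shell
# Sketch: retry an arbitrary check until it passes or the timeout expires,
# instead of giving up (or hanging) after a single fixed wait.
wait4() {
    check=$1
    limit=${2:-60}
    cnt=0
    while [ "$cnt" -lt "$limit" ]; do
        eval "$check" && return 0
        sleep 1
        cnt=$((cnt+1))
    done
    return 1
}

# e.g. before nfsmount: only proceed once eth0 actually has an address
# wait4 'ifconfig eth0 | grep -q Bcast' 60 || echo "no network - skipping nfsmount"
```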
-
Okay, so it's not NFS per se that fails, but DHCP. Try to watch the network stream, to see where it's failing - is it not sending the request, or is the server not replying?
-
Maybe you could compare the phone's DHCP request packet with the RasPi's DHCP request packet.
Perhaps some option is missing, and that delays the DHCP server's ACK/response.
-
DHCP CLIENT : RasPi side
Jan 1 00:00:02 box daemon.err udhcpc[337]: started, v1.26.2
Jan 1 00:00:02 box daemon.err udhcpc[337]: sending discover
Jan 1 00:00:04 box daemon.err udhcpc[337]: sending discover
Jan 1 00:00:07 box daemon.err udhcpc[337]: sending discover
Jan 1 00:00:09 box daemon.err udhcpc[337]: sending discover
Jan 1 00:00:11 box daemon.err udhcpc[337]: sending discover
Jan 1 00:00:13 box daemon.err udhcpc[337]: sending discover
Jan 1 00:00:15 box daemon.err udhcpc[337]: sending discover
Jan 1 00:00:17 box daemon.err udhcpc[337]: sending discover
Jan 1 00:00:19 box daemon.err udhcpc[337]: sending discover
Jan 1 00:00:21 box daemon.err udhcpc[337]: sending discover
Jan 1 00:00:23 box daemon.err udhcpc[337]: no lease, forking to background
Jan 1 00:00:28 box daemon.err udhcpc[758]: sending discover
Jan 1 00:00:30 box daemon.err udhcpc[758]: sending discover
Jan 1 00:00:32 box daemon.err udhcpc[758]: sending discover
Jan 1 00:00:34 box daemon.err udhcpc[758]: sending discover
Jan 1 00:00:34 box daemon.err udhcpc[758]: sending select for 10.10.20.1
Jan 1 00:00:34 box daemon.err udhcpc[758]: lease of 10.10.20.1 obtained, lease time 21600
DHCP SERVER
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPDISCOVER(eth0) b8:27:eb:09:2e:4e
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPOFFER(eth0) 10.10.20.1 b8:27:eb:09:2e:4e
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPDISCOVER(eth1) b8:27:eb:09:2e:4e
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPOFFER(eth1) 10.10.20.1 b8:27:eb:09:2e:4e
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPREQUEST(eth0) 10.10.20.1 b8:27:eb:09:2e:4e
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPACK(eth0) 10.10.20.1 b8:27:eb:09:2e:4e pi-20
On-screen, the broadcast isn't received by the server until after the third pass (waiting for ETH0 to settle and come back with Bcast in ifconfig)
Speculation: possibly neither DHCP nor NFS related; it's looking more like ETH0 itself is dragging its feet (30+ seconds!) before the broadcast is even sent.
The broadcast itself is answered in less than 1sec once it's received by the server. :-\ Very odd. Any history on Pi3 kernel driver issues?
-
Some parameters you can play with
https://www.raspberrypi.org/documentation/configuration/cmdline-txt.md
I also think you've already read this, but maybe there are some tips you can pick up.
There are also some bits to set in the RPi3 firmware if you want to boot from USB/network.
https://www.raspberrypi.org/documentation/hardware/raspberrypi/bootmodes/net_tutorial.md
-
Sleeping on it tonight gave me some thoughts about this.
Maybe the DHCP client doesn't have a hostname specified.
I think the client sends its hostname to the server with the DHCP request.
I don't know, but maybe you can declare a hostname in the DHCP client (udhcpc).
-
Updated Results with a clean install of 9.x/arm7:
- Networking begins at 00:00:07 (on average)
- ifconfig is set to a static 10.x.x.x/255.0.0.0 address taking DHCP out of the loop
- Bcast/adrs show up almost immediately after ifconfig eth0
- Attempting to ping an internal server is successful at 00:00:18
- Attempting to ping an external (web) IP (direct, no dns) is finally successful at 00:00:33
* ntpdate is then run to set the clock, which takes an additional 14 seconds for some reason; but I can't complain about that until I figure out why we go from LINK to ACTIVE after 11 seconds and then from LINK to INTERNET (routed) access after ~25 seconds. The ethernet switch isn't showing errors, and almost everything (flood prevention, spanning, etc.) is disabled for debugging purposes... ideas at this point are few and far between.
The ping test itself is nothing more than
CNT=0
until ping -c 1 $SERVER >/dev/null 2>&1
do
CNT=$((CNT+1))
[ $CNT -gt 60 ] && break
sleep 1
done
/var/log/messages indicates
Jan 1 00:00:07 box user.info kernel: [ 7.978431] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Jan 1 00:00:08 box local2.notice sudo: root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/sh /opt/tc-netwait (our first ping test starts)
Jan 1 00:00:09 box user.info kernel: [ 9.492911] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
Jan 1 00:00:11 box user.notice kernel: [ 11.435848] random: crng init done
Jan 1 00:00:18 box local2.notice sudo: root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/sh /opt/tc-netwait2 (our second ping test starts)
followed by a large number of
Feb 1 08:32:26 box auth.err getty[1071]: tcgetattr: Input/output error^M (every 10 seconds)
-
I think the getty error is about the serial console not being connected, check inittab.
-
@curaga: I'm at a bit of a loss with the getty issue. I replaced the TTY1 line with respawn:/sbin/getty 38400 tty1 to skip the autologin, thinking that since my tc and root accounts both have passwords, something somewhere was likely trying to do something password-less... syslog still shows autologin being launched, while inittab shows it disabled.
The process startserialtty is initiating getty; since I have no need for serial console output, killing it isn't a problem... error resolved! :D
-
Have you seen this threads:
http://forum.tinycorelinux.net/index.php/topic,20701.msg129231.html#msg129231
http://forum.tinycorelinux.net/index.php/topic,21129.0.html
-
@patrikg: Simply removing console= from the boot codes removes the error IF you don't need serial console messages. The TXT files differ between the Pi3 and the Pi2 and older, so making it work properly is likely just a matter of knowing and entering the correct output device and speed.
-
Gooood Morning, Tiny Core!
After spending almost a week and a half ripping things apart and putting them back together under the RasPi TCL hood, I think I've finally hit a brick wall, and I'm virtually out of time. SO, the only real "cure" for my own needs is to dissect tc-config: break it into separate files, inject the things we need to happen during startup, embed a couple of extensions such as dropbear, and leave persistence and network drives/shares until after tc-config has completed, since the hardware and the kernel/startup do not seem to sync well (the hardware isn't online by the time we're loading kernel drivers, so nfs/aoe/etc. are bound to be unstable at best).
@curaga: In theory, let's say dropbear was an extension loaded on startup from tcedir (ie: mmcblk0p1/tce/optional), and after tc-config had completely loaded, another script replaced tcedir/tce.installed/etc. with one on a network share and loaded those extensions via tce-load -i... do you foresee any potential glitches with the mount of dropbear? (I'm speculating that the squashed extensions are mounted in memory (/tmp/tcloop), where the content is already "there" and the physical TCZ is no longer "needed" -- but you guys know the internal operations of TC much better than I do, so I figure it's best to just ask! :) )
* I have a method I'm going to try first which would avoid the need to mount over the existing tce/tcedir/etc., but it may turn out to be too bandwidth-intensive with a few dozen Pi's hammering an in-house TCL repo all at once; whichever handles the lag better will be the direction chosen.
Thank you guys for lending an ear over the past couple of weeks. There's no cure I'm aware of thus far (a driver or hardware issue being the suspect: a RealTek USB NIC lit up and was online in about 5 seconds, as opposed to the onboard NIC, which averages 27-34 seconds; I haven't had time to test the onboard WiFi), but it's nice to know we can pick one another's minds for clues or opinions when problems like this arise!
-
Why not add a wait ping loop before the nfs/dhcp parts? You said that external ping correlated with NFS success. That's the simplest modification IMHO.
A bind mount over the tce dir shouldn't cause any trouble, but a missing entry in tce.installed would mean the utils don't know it's there, and try to fetch/load it again.
-
Why not add a wait ping loop before the nfs/dhcp parts?
It takes roughly 30 seconds just for the network (DHCP OR static) to come up and successfully ping. Within that same window, while waiting for the NIC, we could have loaded everything else (ie: onboard busybox-httpd service, gpio, etc.) and have everything possible done before the network is even accessible.
-
@curaga: Please update 5.x/armv6/nfs-utils.tcz with the following error correction:
/usr/local/etc/init.d/nfs-client Line 11
/usr/local/sbin/sm-notify -q
with
/usr/local/sbin/sm-notify -f
/usr/local/sbin/sm-notify: invalid option -- 'q'
https://linux.die.net/man/8/sm-notify
-
Sorry, I don't use ARM.
-
Sorry, I don't use ARM. <-- The only barriers that can stop you are the ones you create yourself. <-- :P
Actually, this is a universal TCL issue (x86/x64/etc.): there's a hanging problem across distros when shutting down, where mounts fail to dismount for numerous reasons and hang at either SYNC or UMOUNT -A due to a known issue in nfs-utils. I think I've come across an easy-enough fix, at least within rc.shutdown, which thus far has been working like a charm; but making rc.shutdown more stable with remote mounts (cifs, nfs, AoE, etc.) would require a little reworking of /etc/init.d/rc.shutdown, with a few self-checks implementing the following logic:
- Attempt to WRITE to a dummy file on each remote mount (many of them may have sleeping hard drives); delete the file afterward
- First, SYNC and then scan 'mount' for remote mounts and dismount each manually, NOT with umount -a
- Next, SYNC and then kill running processes as some of these may leave the above rule open with file locks
- Repeat the first step if any mounts are found left within the list from 'mount'
- At this point, we've exhausted logistics, force a dismount of remote mounts and allow rc.shutdown to continue
In most cases/distros, when umount -a fails/hangs on a remote mount, there's a two-minute timeout, which is a while to wait but not forever... it still leaves the possibility of unwritten data and/or other avenues for corruption. The above logic helps with this a bit, but with remote mounts there's nothing I can think of that covers 100%.
-
APPLIES TO ALL ARCHITECTURES
Combination to prevent remote shares from hanging up during shut-down seems complete.
This has been tested in a LAMP environment where all services were operating during the shutdown/reboot request.
Bear in mind, I left a few notes here and there as visual aids while testing; the scripts could use a bath and a minor efficiency rewrite.
In /etc/init.d/tc-config immediately before extensions are loaded, add:
## Find the largest PID and save it in /tmp/pid_start ##
maxpid=0
ps -a | grep -v USER | awk '{print $1}' > /tmp/pidlist
while read -r item
do
if [ $item -gt $maxpid ]; then maxpid=$item; fi
done < /tmp/pidlist
echo $maxpid > /tmp/pid_start
rm /tmp/pidlist
This allows us to get an idea of what processes existed PRIOR to extensions being loaded.
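For what it's worth, the same bookkeeping fits in a single pass (equivalent in spirit to the loop above; the NR>1 filter skips the ps header row):

```shell
# Largest current PID in one pass; NR>1 skips the "PID USER ..." header line.
ps -a | awk 'NR>1 && $1+0 > m { m = $1+0 } END { print m }' > /tmp/pid_start
```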
Create /etc/init.d/rc.shutdown-shares and launch it as the first command within rc.shutdown:
#!/bin/busybox ash
. /etc/init.d/tc-functions
useBusybox
# 1. First, let's TRY to kill all processes NEWER than /tmp/pid_start
pid=`cat /tmp/pid_start`
ps -a | grep -v USER | awk '{print $1}' > /tmp/pidlist
while read -r line
do
if [ $line -gt $pid ]; then kill $line >/dev/null 2>&1; fi
done < /tmp/pidlist
# 2. Next, we need to dismount items in /tmp/tcloop
cd /tmp/tcloop
for dir in *
do
umount $dir >/dev/null 2>&1
rmdir $dir >/dev/null
done
# 3. Let's check to see if we have any strays
cd /tmp/tcloop
for dir in *
do
[ -d "$dir" ] || continue
echo "UMOUNT: Failed dismounting $dir"
done
# 4. Let's now try and dismount any of our NFS mounts after writing
# dummy info onto each (in case they're asleep)
test=`mount | grep " type nfs " | awk '{print $3}'`
if [ ! "${test}" == "" ]; then
for item in $test
do
echo "NFS: Write Testing [${item}]"
sudo touch $item/tcl_nfs_test.txt >/dev/null 2>&1
usleep 200000
if [ ! -f $item/tcl_nfs_test.txt ]; then
echo " ${RED}Error writing to NFS mount $item${NORMAL}"
fi
sudo rm -f $item/tcl_nfs_test.txt >/dev/null 2>&1
echo "NFS: Dismounting [${item}]"
sudo umount $item
tests=`mount | grep "${item}"`
if [ ! "${tests}" == "" ]; then
echo " ${RED}Error dismounting $item${NORMAL}"
fi
done
fi
This disconnects the shares after killing off any processes which were launched prior to extensions loading.
This is especially vital when mounting an NFS share and then persisting TCE, OPT, etc. onto that share.
I have not proven my theory yet, but based on the complaints I've read regarding nfs-utils, there's a good chance umount -a doesn't play nicely with mount.nfs. Given the order in which rc.shutdown runs, we may be asking NFS to dismount before file handles are released (guaranteed for /tmp/tcloop), or for all I know we could be dismounting extensions (including nfs-utils??) before the NFS shares are let go, which I'm guessing could also cause issues.
When TCL is running with nfsmount=server:/share and tce=nfs, without the above mods rc.shutdown will hang for a few minutes at the point where it tries umount -a; with them there's virtually no delay at all. All other network shares, especially those where TCL is allowed to persist onto the share, need to be treated in the same fashion to prevent lag/hanging as well as to help prevent data loss/corruption.
-
Please update 5.x/armv6/nfs-utils.tcz
nfs-utils updated to the same version as core/corepure64 and posted to the picore 9.x repos
-
PID numbers get recycled, and killing processes doesn't prevent data loss, so those risks still exist.
-
@juanito: Thanks! I cannot speak for x86/x64 as I haven't repeated this process there yet, but it may be prudent to add libcap.tcz to nfs-util's dependency list.
@curaga: YES, PIDs are reused. The concept here is to TRY to avoid start-up processes (take DHCPc, for example) being shut down TOO early (we don't want anything interrupting networking while we're disconnecting from our shares), so setting a landmark (ie: don't kill off anything started before our extensions were loaded) seemed a potentially safe bet for processes which don't reinitialize themselves. TCL may be a little too mature to begin a new standard, such as implementing inits for daemons... but who knows! :)
I'm planning to put together an add-in for rc.shutdown which scans the init.d directories, calls scriptname stop for the launch scripts it finds, and then goes through popular apps (apache, sql, etc.), calling their associated binaries to shut down naturally (as opposed to SIGs) if they're running, to help with some of the killings and potential data issues. Considering the lack of standardized init scripts within many of the extensions, it's the easiest way I can think of (and an ongoing list over time).
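A rough sketch of that init.d scanner (stop_all is a name I made up, and it only calls scripts that visibly handle a "stop" argument; real extension scripts vary):

```shell
# Sketch: ask every executable init script in a directory to stop itself,
# but only when the script actually appears to handle a "stop" action.
stop_all() {
    for s in "$1"/*; do
        [ -x "$s" ] || continue          # skip non-executables (and an empty dir)
        grep -q 'stop)' "$s" || continue # crude check for a stop) case branch
        "$s" stop
    done
    return 0
}

# e.g. early in rc.shutdown:
# stop_all /usr/local/etc/init.d
```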
For my needs (which are uncommon versus the community's usual use of TC) I have a "profile" set up for website design, for example: a general LAMP configuration plus a handful of extensions (ionCube, DAV, SVN, etc.) which I'm going to package as my_lamp.tcz, including inits and other basic content to simplify what's needed here; you're welcome to the inits when they're done if you think they'd be useful.
-
@juanito: Please update the dependency list below (based on memory)
nfs-utils.tcz.dep
- portmap.tcz
- libcap.tcz
- tcp_wrappers.tcz
- filesystems-KERNEL.tcz
-
I have not yet read up on nfs-utils.tcz version 1.3.3, but from the .info notes, rpcbind.tcz is a replacement for portmap; if it provides rpc.statd in the same locations and the like, please adjust portmap.tcz in the dependency listing accordingly (otherwise needs in tc-config will no longer be met, which will make the nfsmount= boot code dysfunctional :-\ )
-
The posted extension has rpcbind as a dep.
As far as I know, filesystems-KERNEL is only required by nfs-server and I don't believe that libcap and tcp_wrappers are used (I could be mistaken).
I did some basic testing on an RPi3 - please test fully and report back.
-
@juanito:
AS-IS, NFS has commands hard-coded into tc-config, and thus it failed when I simply replaced nfs-utils with the one you've just built. First there are the (expected) complaints about calls in tc-config:nfsmount to the now-missing portmap, tcp_wrappers, etc., and then when /etc/init.d/nfs-client is called, it says it's launching and then hangs for a few minutes (see time stamps):
Feb 18 02:12:59 box user.warn kernel: [ 330.730775] svc: failed to register lockdv1 RPC service (errno 110).
Feb 18 02:12:59 box user.warn kernel: [ 330.730816] lockd_up: makesock failed, error=-110
Feb 18 02:20:45 box user.warn kernel: [ 796.650768] svc: failed to register lockdv1 RPC service (errno 110).
Feb 18 02:28:30 box user.warn kernel: [ 1262.570729] svc: failed to register lockdv1 RPC service (errno 110).
Feb 18 02:36:16 box user.warn kernel: [ 1728.490745] svc: failed to register lockdv1 RPC service (errno 110).
Feb 18 02:39:21 box local2.notice sudo: root : TTY=console ; PWD=/ ; USER=root ; COMMAND=/bin/chown tc.staff /mnt/nfs -R
Feb 18 02:39:21 box local2.notice sudo: root : TTY=console ; PWD=/ ; USER=root ; COMMAND=/bin/chown tc.staff /mnt/share -R
When attempting to load an NFS share manually:
root@pi-23:/mnt$ mount -t nfs 10.0.2.1:/nfs/shared share
/sbin/mount.nfs: error while loading shared libraries: libtirpc.so.3: cannot open shared object file: No such file or directory
-
The updated nfs-utils depends on rpcbind, which in turn depends on libtirpc.
Perhaps you didn't update your dep files?
-
That's what I thought as well, but the first thing I did was run a search for the provider of the missing file and ran tce-ab to install it... and it indicated it was already installed. From the looks of things, all that's needed is a symlink from libtirpc.so.3 to the existing libtirpc.so.1 which libtirpc installs.
-
Are you sure you don't have something unintentional in a backup or have a previous version of libtirpc loaded?
Before libtirpc is loaded there should not be any files named /usr/local/lib/libtirpc*
-
These are fresh remasters. (For the time being, the TCZ files are stored in /etc/embed instead of being extracted into the image, just to speed up testing and rebuilding; there's no MyDATA, and without NFS functioning correctly there are no persistent NFS shares for TCE/OPT.) However, I'll manually launch an update to the local repo here just in case there are files in limbo.
tc-config has been updated to accommodate both the old and the new nfs-utils and their dependencies (via tce.installed calls, as the extensions are intended to be embedded if/when using the nfsmount= boot code) in the following order:
un=`uname -r`
for ext in libcap filesystems-${un} tcp_wrappers portmap libtirpc rpcbind nfs-utils
do
if [ -f /usr/local/tce.installed/$ext ]; then /usr/local/tce.installed/$ext >/dev/null 2>&1; fi
done
if [ ! -f /tmp/nfs_launch ]; then /usr/local/etc/init.d/nfs-client start >/dev/null 2>&1; fi
I think I have the correct order (regardless of which version of nfs-utils being used) but please feel free to leave a note if you see something out of place.
Bear in mind, the concept here is to utilize nfsmount= while at the same time supporting both nfs-client and nfs-server, should the day ever come when the server is necessary.