
Author Topic: PiCore, RasPi3B and boot code nfsmount  (Read 8504 times)

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #15 on: January 29, 2018, 12:50:08 PM »
* NOTE: This is a remaster; nfsmount and aoe loading have been commented out and relocated to separate files that run just before bootsync is launched, so things can be modularized. The changes will be merged back into the core image once complete.

Hang Potential 1: /usr/local/sbin/mount.nfs4
Hang Potential 2: Ping/Bcast checking while the NIC is still settling

I replaced MOUNT4=/usr/local....  with MOUNT4=mount for the time being just to get past the situation.
Once the device booted, the mount was successful.  After running find -name mount.* I was unable to locate mount.nfs4 anywhere on the system.
A quick Google search suggests NFS4 is slated for the 6.2 release, so I'm just guessing mount.nfs4 may have been a link to /usr/local/sbin/mount.nfs which may have gotten lost in the repo.

* When told to reboot, if the NFS share is still mounted on /mnt/nfs, something causes the system to hang (black screen) after shutdown attempts to dismount it; a cold boot is required.
   Adding a conditional umount /mnt/nfs before the zswap step in rc.shutdown takes care of this.
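
For anyone wanting to try the conditional umount, here is a minimal sketch of what it could look like. The helper names `is_mounted` and `umount_if_mounted` are made up for illustration, and the optional second argument exists only so the check can be exercised against a sample file instead of /proc/mounts; this is not the exact change made to rc.shutdown.

```shell
# is_mounted MOUNTPOINT [TABLE]: succeed if MOUNTPOINT appears as a mount
# target in TABLE (defaults to /proc/mounts; the parameter is for testing).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# umount_if_mounted MOUNTPOINT: unmount only when something is mounted there.
umount_if_mounted() {
    if is_mounted "$1"; then
        umount "$1"
    fi
}

# Hypothetical call site in rc.shutdown, placed ahead of the zswap step:
# umount_if_mounted /mnt/nfs
```

Checking the mount table first means a shutdown with no share mounted does not print a umount error.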

Over 90% of all computer problems can be traced back to the interface between the keyboard and the chair

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #16 on: January 30, 2018, 12:39:30 AM »
Okay...  this was fun!

Final results:
  • nfs-utils (the only version I found is in 5.x/armv6) needs its dependency list updated to include libcap AND, if embedded, the associated files found in tce.installed obviously need to be launched
  • uDHCPc, called via init.d/dhcp.sh, is not guaranteed. I have cell phones and wireless dongles getting a broadcast response from the DHCP server in two seconds or less; PiCore8 averages 20-35 seconds on RasPi3B
  • wait4net has been rewritten to double-check itself; if there's no ETH0 within the timeout, call itself again - for headless/dhcp, this is vital
  • If DHCP fails (and there's no ETH0 by the time the mount is called) mount.nfs will hang without complaining.  I would recommend rearranging things a bit by having the mount launched in the background and a timeout loop set up to watch for success; if it fails, kill the mount thread and move forward and/or try again?
  • I was thinking...  maybe it might take a lot of time off the DHCP broadcast if we force-fed a dummy IP into eth0?
  • It's possible TC 9.x might have a better NIC driver or what-not which makes all of the above play together properly; this I do not yet know.
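
The "mount in the background with a timeout loop" idea from the list above could be sketched roughly like this. `run_with_timeout` is a hypothetical helper, not something that ships with PiCore; the 124 return code simply borrows the convention of timeout(1), and the sketch assumes the hung process actually responds to a plain kill.

```shell
# run_with_timeout SECONDS COMMAND [ARGS...]
# Runs COMMAND in the background and polls it once per second.
# Returns COMMAND's exit status, or 124 if it had to be killed.
run_with_timeout() {
    limit=$1
    shift
    "$@" &
    pid=$!
    waited=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$waited" -ge "$limit" ]; then
            kill "$pid" 2>/dev/null      # give up on the hung command
            wait "$pid" 2>/dev/null      # reap it so no zombie is left
            return 124
        fi
        sleep 1
        waited=$((waited + 1))
    done
    wait "$pid"                          # collect the real exit status
}

# Hypothetical use (share name and options are placeholders):
# run_with_timeout 30 mount -t nfs server:/export /mnt/nfs || echo "NFS mount gave up"
```

On failure the caller can move forward or simply call it again, which matches the "kill it and/or try again" suggestion.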


Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 10957
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #17 on: January 30, 2018, 01:38:05 AM »
Okay, so it's not NFS per se that fails, but DHCP. Try to watch the network stream, to see where it's failing - is it not sending the request, or is the server not replying?
The only barriers that can stop you are the ones you create yourself.

Offline patrikg

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 662
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #18 on: January 30, 2018, 01:48:41 AM »
Maybe you could compare the phone's DHCP request packet with the RasPi's DHCP request packet.
Maybe some options are not included, and that slows down the DHCP server's ACK packet and response.



Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #19 on: January 30, 2018, 12:52:38 PM »
DHCP CLIENT : RasPi side
Code: [Select]
Jan  1 00:00:02 box daemon.err udhcpc[337]: started, v1.26.2
Jan  1 00:00:02 box daemon.err udhcpc[337]: sending discover
Jan  1 00:00:04 box daemon.err udhcpc[337]: sending discover
Jan  1 00:00:07 box daemon.err udhcpc[337]: sending discover
Jan  1 00:00:09 box daemon.err udhcpc[337]: sending discover
Jan  1 00:00:11 box daemon.err udhcpc[337]: sending discover
Jan  1 00:00:13 box daemon.err udhcpc[337]: sending discover
Jan  1 00:00:15 box daemon.err udhcpc[337]: sending discover
Jan  1 00:00:17 box daemon.err udhcpc[337]: sending discover
Jan  1 00:00:19 box daemon.err udhcpc[337]: sending discover
Jan  1 00:00:21 box daemon.err udhcpc[337]: sending discover
Jan  1 00:00:23 box daemon.err udhcpc[337]: no lease, forking to background
Jan  1 00:00:28 box daemon.err udhcpc[758]: sending discover
Jan  1 00:00:30 box daemon.err udhcpc[758]: sending discover
Jan  1 00:00:32 box daemon.err udhcpc[758]: sending discover
Jan  1 00:00:34 box daemon.err udhcpc[758]: sending discover
Jan  1 00:00:34 box daemon.err udhcpc[758]: sending select for 10.10.20.1
Jan  1 00:00:34 box daemon.err udhcpc[758]: lease of 10.10.20.1 obtained, lease time 21600

DHCP SERVER
Code: [Select]
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPDISCOVER(eth0) b8:27:eb:09:2e:4e
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPOFFER(eth0) 10.10.20.1 b8:27:eb:09:2e:4e
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPDISCOVER(eth1) b8:27:eb:09:2e:4e
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPOFFER(eth1) 10.10.20.1 b8:27:eb:09:2e:4e
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPREQUEST(eth0) 10.10.20.1 b8:27:eb:09:2e:4e
Jan 30 15:42:31 dnsmasq-dhcp[22290]: DHCPACK(eth0) 10.10.20.1 b8:27:eb:09:2e:4e pi-20

On-screen, the broadcast isn't received by the server until after the third pass (waiting for ETH0 to settle and come back with Bcast in ifconfig)

Speculation: possibly neither DHCP nor NFS is at fault; it's looking more like ETH0 itself is dragging its feet (30+ seconds!) before the broadcast is even sent.
The broadcast itself is answered in less than 1sec once it's received by the server.  :-\ Very odd.  Any history on Pi3 kernel driver issues?
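
To put a number on the delay from logs like the client-side one above, a small filter can subtract the first "sending discover" timestamp from the "lease ... obtained" one. This is only a sketch: `dhcp_delay` is a made-up name, it assumes the syslog layout shown in this thread (time in the third field), and it assumes both events fall on the same day.

```shell
# dhcp_delay: read udhcpc syslog lines on stdin and print the number of
# seconds between the first "sending discover" and the first obtained lease.
dhcp_delay() {
    awk '
        # Convert HH:MM:SS to seconds since midnight
        function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        /udhcpc.*sending discover/ && !seen { first = secs($3); seen = 1 }
        /udhcpc.*lease of .* obtained/ && seen { print secs($3) - first; exit }
    '
}

# Hypothetical use on a running box:
# dhcp_delay < /var/log/messages
```

For the client log posted here that is 00:00:34 minus 00:00:02, i.e. 32 seconds before a lease, while the server side answers within the same second.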


Offline patrikg

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 662
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #20 on: January 30, 2018, 01:33:58 PM »
Some parameters you can play with
https://www.raspberrypi.org/documentation/configuration/cmdline-txt.md

I think you have probably already read this, but maybe there are some tips in it.
There are also some bits in the RPi3 firmware that need to be set if you want to boot from USB/network.
https://www.raspberrypi.org/documentation/hardware/raspberrypi/bootmodes/net_tutorial.md



Offline patrikg

  • Wiki Author
  • Hero Member
  • *****
  • Posts: 662
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #21 on: January 31, 2018, 12:14:56 AM »
While sleeping last night, I had some thoughts about this.

Maybe the DHCP client does not have a hostname specified.
I think that when the client makes its DHCP request, it sends the hostname to the server.
I don't know for sure, but maybe you can declare a hostname in the DHCP client (udhcpc).
 

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #22 on: February 01, 2018, 12:47:37 AM »
Updated Results with a clean install of 9.x/arm7:
  • Networking begins at 00:00:07 (on average)
  • ifconfig is set to a static 10.x.x.x/255.0.0.0 address taking DHCP out of the loop
  • Bcast/adrs show up almost immediately after ifconfig eth0
  • Attempting to ping an internal server is successful at 00:00:18
  • Attempting to ping an external (web) IP (direct, no dns) is finally successful at 00:00:33
*ntpdate is then run to set the clock, which takes an additional 14 seconds for some reason, but I can't complain about something that far along until I figure out why it takes 11 seconds to go from LINK to ACTIVE and then ~25 seconds to go from LINK to INTERNET (routed) access.  The ethernet switch isn't showing errors, and almost everything is disabled (flood prevention, spanning, etc.) for debugging purposes...  ideas at this point are few and far between.

The ping test itself is nothing more than
Code: [Select]
# Retry one ping per second until $SERVER answers; give up after ~60 tries
CNT=0
until ping -c 1 "$SERVER" >/dev/null 2>&1
do
    CNT=$((CNT + 1))
    [ "$CNT" -gt 60 ] && break
    sleep 1
done

/var/log/messages indicates
Code: [Select]
Jan  1 00:00:07 box user.info kernel: [    7.978431] smsc95xx 1-1.1:1.0 eth0: hardware isn't capable of remote wakeup
Jan  1 00:00:08 box local2.notice sudo:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/sh /opt/tc-netwait (our first ping test starts)
Jan  1 00:00:09 box user.info kernel: [    9.492911] smsc95xx 1-1.1:1.0 eth0: link up, 100Mbps, full-duplex, lpa 0xC1E1
Jan  1 00:00:11 box user.notice kernel: [   11.435848] random: crng init done
Jan  1 00:00:18 box local2.notice sudo:     root : TTY=unknown ; PWD=/ ; USER=root ; COMMAND=/bin/sh /opt/tc-netwait2 (our second ping test starts)
followed by a large number of the following, one every 10 seconds:
Feb  1 08:32:26 box auth.err getty[1071]: tcgetattr: Input/output error

« Last Edit: February 01, 2018, 12:50:50 AM by centralware »

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 10957
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #23 on: February 01, 2018, 01:48:04 AM »
I think the getty error is about the serial console not being connected, check inittab.

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #24 on: February 01, 2018, 02:22:23 AM »
@curaga: I'm at a bit of a loss with the getty issue. I replaced the TTY1 line with respawn:/sbin/getty 38400 tty1 to skip the autologin, thinking that since my tc and root accounts both have passwords, there's likely something, somewhere, trying to do something that's password-less or what-not...  syslog still shows autologin being launched, even though inittab shows it was disabled.

The process startserialtty is initiating getty; since I have no need for serial console output, killing it isn't a problem...  error resolved!  :D


Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #26 on: February 01, 2018, 07:32:09 AM »
@patrikg: Simply removing console= from the boot codes removes the error IF you don't need serial console messages. The TXT files are different for Pi3 and for Pi2 and older, so to make the serial console work properly it's likely just a matter of knowing and entering the correct output and speed.
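
A rough sketch of stripping the console= boot codes from a command line string, in case it saves someone a manual edit. `strip_console` is a made-up helper, and the sample tokens in the usage comment are placeholders rather than the actual PiCore cmdline. Note that it drops every console= entry; tighten the pattern (for example to the serial device only) if console=tty1 should stay.

```shell
# strip_console "KERNEL CMDLINE": print the command line with every
# console=... token removed. The body runs in a subshell so that
# disabling globbing (set -f) and the scratch variables stay local.
strip_console() (
    set -f
    out=""
    for tok in $1; do
        case "$tok" in
            console=*) ;;                      # drop console assignments
            *) out="${out:+$out }$tok" ;;      # keep everything else
        esac
    done
    printf '%s\n' "$out"
)

# Hypothetical use against a copy of the boot partition's cmdline.txt:
# strip_console "$(cat cmdline.txt)" > cmdline.txt.new
```

Writing to a new file first leaves the original boot codes in place until the result has been checked.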

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #27 on: February 01, 2018, 11:02:31 PM »
Gooood Morning, Tiny Core!

After spending almost a week and a half ripping things apart and putting them back together again under the RasPi TCL hood, I think I've finally hit a brick wall and am virtually out of time. SO the only real "cure" for my own needs would be to dissect tc-config: break it into separate files, inject the things we need to happen during the startup process, embed a couple of extensions such as dropbear, and leave persistence and network drives/shares until after tc-config has completed, since hardware versus kernel/startup does not seem to sync well (the hardware isn't online by the time we're loading kernel drivers, and thus nfs/aoe/etc. are bound to be unstable at best).

@curaga: In theory let's say dropbear was an extension loaded on startup from tcedir (ie: mmcblk0p1/tce/optional) and then after tc-config was completely loaded, another script replaced tcedir/tce.installed/etc with one on a network share and those extensions were loaded via tce-load -i...  do you foresee any potential glitches with the mount of dropbear?  (I am speculating that the squash'ed extensions are mounted in memory (/tmp/tcloop) where the content is already "there" and the physical TCZ no longer "needed" -- but you guys know the internal operations of TCe much better than I SO... figure it's best to just ask! :) )

* I have a method I'm going to try first which would prevent the need to mount over the existing tce/tcedir/etc., but it may turn out to be too bandwidth-intensive with a few dozen Pi's hammering an in-house TCL repo all at once; whichever can handle lags, etc. would be the direction chosen.

Thank you guys for lending an ear over the past couple of weeks. There's no cure I'm aware of thus far (a driver or hardware issue being the accused: a Realtek USB NIC lit up and was online in about 5 seconds, as opposed to the onboard NIC, which averages 27-34 seconds, and I haven't had time to test the onboard WiFi), but it's nice to know we can pick one another's minds for clues or opinions when problems like this arise!

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 10957
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #28 on: February 02, 2018, 12:59:59 AM »
Why not add a wait ping loop before the nfs/dhcp parts? You said that external ping correlated with NFS success. That's the simplest modification IMHO.

A bind mount over the tce dir shouldn't cause any trouble, but a missing entry in tce.installed would mean the utils don't know it's there, and try to fetch/load it again.
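
A quick way to see the tce.installed point in practice is to check for the marker before loading anything. This is a sketch under assumptions: the helper names are made up, and it assumes the markers live in /usr/local/tce.installed and are named after the extension without the .tcz suffix, so verify that on your own image.

```shell
# is_tce_installed NAME [DIR]: succeed if a marker for extension NAME exists.
# DIR defaults to /usr/local/tce.installed (assumed location); the optional
# argument is only there so the check can be tried against a scratch dir.
is_tce_installed() {
    [ -e "${2:-/usr/local/tce.installed}/$1" ]
}

# load_if_missing NAME: only call tce-load when no marker is present,
# so an already-loaded extension is not fetched/loaded a second time.
load_if_missing() {
    is_tce_installed "$1" || tce-load -i "$1"
}

# Hypothetical use after swapping in the network tcedir:
# load_if_missing dropbear
```

If the marker directory gets replaced along with the tce dir, the check reports "missing" for extensions that are in fact loaded, which is exactly the reload case described above.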

Offline CentralWare

  • Administrator
  • Hero Member
  • *****
  • Posts: 1652
Re: PiCore, RasPi3B and boot code nfsmount
« Reply #29 on: February 02, 2018, 01:18:38 AM »
Quote
Why not add a wait ping loop before the nfs/dhcp parts?
It takes roughly 30 seconds just for the network (dhcp OR static) to launch and successfully ping.  Within the same period of time while we're waiting for the NIC to pull up, we could have loaded everything else (ie: onboard busybox-httpd service, gpio, etc.) and have everything done that's possible before the network is even accessible.
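
What I have in mind is roughly the following, as a sketch only: start the network probe in the background, do the local-only work while the NIC settles, then collect the probe result before touching anything network related. `net_probe` and `boot_parallel` are hypothetical names, not existing PiCore scripts.

```shell
# net_probe HOST TRIES: ping HOST once per second until it answers;
# fail after TRIES unsuccessful attempts.
net_probe() {
    tries=0
    until ping -c 1 "$1" >/dev/null 2>&1; do
        tries=$((tries + 1))
        [ "$tries" -ge "$2" ] && return 1
        sleep 1
    done
}

# boot_parallel HOST TRIES COMMAND [ARGS...]: run the probe in the
# background, run COMMAND (local startup work) in the foreground, then
# wait for the probe. Returns 0 only if HOST answered.
boot_parallel() {
    host=$1
    max=$2
    shift 2
    net_probe "$host" "$max" &
    probe_pid=$!
    "$@"                      # local work overlaps with the NIC settling
    wait "$probe_pid"         # probe's exit status becomes ours
}

# Hypothetical use (start_local_services is a placeholder for your own script):
# boot_parallel "$SERVER" 60 start_local_services && echo "network is up, safe to mount"
```

That way the ~30 seconds spent waiting on the NIC is not dead time; the network-dependent steps only begin once the probe has actually succeeded.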