
Topic: Re: Tiny Core v17.0 upgrade issues (Read 7534 times)

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #90 on: April 26, 2026, 02:14:36 AM »
Small update:
- Still running after 48 hours.

This is the first time I have had more than 34 hours of "fully functional on TC17".
- On the one hand, this is a strong indication that select() is the root cause.
- On the other hand, "not using select() and doing a non-blocking read" is a perfect fix for my application.
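For readers who want to try the same workaround: with the port opened with O_NONBLOCK, read() returns -1 with errno set to EAGAIN (or EWOULDBLOCK) when no byte is waiting, and that case has to be treated as "nothing yet" rather than as an error. A minimal sketch of such a read; `read_available()` is a hypothetical helper name, not code from my application:

```c
#include <errno.h>
#include <fcntl.h>      /* O_NONBLOCK, used when opening the port */
#include <sys/types.h>
#include <unistd.h>

/* Read whatever is available right now from a descriptor opened (or set)
 * with O_NONBLOCK. Returns the number of bytes read; 0 if nothing is
 * waiting (or if read() itself returned 0, e.g. end of file / hangup);
 * -1 on a real error, with errno left as read() set it. */
ssize_t read_available(int fd, char *buf, size_t len)
{
    ssize_t n;

    do {
        n = read(fd, buf, len);
    } while (n < 0 && errno == EINTR);  /* a signal arrived: just try again */

    if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return 0;                       /* no data yet; not an error */
    return n;
}
```

Called from a periodic service loop, a return of 0 simply means "check again next time", so the loop never stalls waiting for the next data burst.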

I will keep it running for 7 days, until next Friday. If it has not crashed by then, I will call it "demonstrated OS/kext root cause" and "demonstrated application fix/workaround".

So: no posts on this thread for the coming 5 days unless there is a crash.

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #91 on: April 27, 2026, 03:30:27 AM »
3 x 24 hr = 72 hr crash-free continuous run, and counting...

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #92 on: April 28, 2026, 02:21:59 AM »
4 x 24 hr = 96 hr crash-free continuous run, and counting...

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #93 on: April 29, 2026, 02:15:25 AM »
5 x 24 hr = 120 hr crash-free continuous run, and counting...

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #94 on: April 30, 2026, 03:12:04 AM »
6 x 24 hr = 144 hr crash-free continuous run, and counting...

Tomorrow around this time I will stop it manually if it still has not crashed.
A full week of crash-free running I would consider "demonstrated".
That will at least give me a "fix/workaround to run my application under TC17".

I will, however, do a bit more work to zoom in on the root cause...

I will deactivate the screensaver following the instructions from @Rich.
Then I will restart with select() back in but read() still non-blocking (the select() will then have no function; it is only there to test whether it hurts).

The difference between the original program and the currently running version is not only that select() is no longer called: read() is now also called in non-blocking mode instead of "select()-gated blocking mode". So... the root cause could be select(), but it could also be that a "blocking read() call" blocks forever even when there is no reason to do so. This would actually fit the fact that zero logging gets produced at a crash: "the system just locks".
I'm running on a single-core CPU. So if this single core gets stuck, there is nothing else to take over (not sure it works like that, but it is at least a fact).
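The "select()-gated" pattern mentioned above (poll the descriptor with a zero timeout and only call read() when select() reports it readable) can be sketched as follows. This is an illustration under my own assumptions, with a hypothetical `gated_read()` name; it rebuilds the fd_set and the timeout on every call because select() may modify both.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Poll fd with a zero timeout and call read() only if select() reports it
 * readable, so even a blocking descriptor does not block here.
 * Returns bytes read, 0 if nothing was ready, -1 on error. */
ssize_t gated_read(int fd, char *buf, size_t len)
{
    fd_set rd;
    struct timeval zero = {0, 0};   /* fresh each call: select() may change it */

    FD_ZERO(&rd);
    FD_SET(fd, &rd);
    int ready = select(fd + 1, &rd, NULL, NULL, &zero);
    if (ready < 0)
        return (errno == EINTR) ? 0 : -1;   /* interrupted: treat as "not ready" */
    if (ready == 0 || !FD_ISSET(fd, &rd))
        return 0;                           /* nothing waiting */
    return read(fd, buf, len);              /* data (or EOF) is ready */
}
```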

The current version of the application makes this switchable through a user setting, so I can run this configuration change without a recompile. That is a benefit: it keeps the number of changes to an absolute minimum.
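Because the blocking/non-blocking choice is a runtime setting, a small helper that applies it to an already-open descriptor with fcntl(F_GETFL/F_SETFL) could look like this. `set_blocking()` and `is_blocking()` are hypothetical names used only for illustration:

```c
#include <fcntl.h>
#include <stdbool.h>

/* Switch an already-open descriptor between blocking and non-blocking
 * reads. Returns 0 on success, -1 on error (errno set by fcntl()). */
int set_blocking(int fd, bool blocking)
{
    int flags = fcntl(fd, F_GETFL);
    if (flags < 0)
        return -1;
    if (blocking)
        flags &= ~O_NONBLOCK;
    else
        flags |= O_NONBLOCK;
    return fcntl(fd, F_SETFL, flags) < 0 ? -1 : 0;
}

/* True if reads on fd will block; handy for logging the active mode. */
bool is_blocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    return flags >= 0 && !(flags & O_NONBLOCK);
}
```

Logging is_blocking() at startup would also confirm in the log which mode a given run actually used.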

So....
- Any advice from a different time zone on "what to do with the next run" is welcome.
- Again... although I'm logging via rsyslog to a network-connected second computer, I get zero rsyslog output at a crash, even though I'm logging at kern.* level. Any advice on getting more logging would also be welcome.

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #95 on: May 01, 2026, 01:54:40 AM »
7 x 24 hr (minus 1 hr, as I started early today) = 167 hr crash-free continuous run.
Good!

I just stopped the application manually.

I restarted the application with select(). The read() is still in non-blocking mode.
That means that the select() has no function; it is only there to see whether it crashes.
As said yesterday, I originally planned to disable the screensaver so my monitor could show logging. But... I decided not to do that. I have seen earlier that additional changes bring different behavior.
I have now made zero changes. I only started the application with a different config file.

So... if select() is the crash cause, it should crash within 24 hr, likely around 15 hr.
If it does not crash, the blocking nature of the original read() is probably the crash cause.
Fingers crossed again...

Recap of results so far:

TC15:
- including kext: usb-serial-6.6.8-tinycore.tcz
- recompile full application
- run application with "every second 812byte read from /dev/ttyUSB0; baudrate 11520"
- no crash over multiple months

TC17:
- including kext: usb-serial-6.18.2-tinycore.tcz
- recompile full application
- run application with "every second 812byte read from /dev/ttyUSB0; baudrate 11520"
- 4..6x tested, always crashes, mostly after about 15hrs; always within 24hrs

TC17:
- including kext: usb-serial-6.18.2-tinycore.tcz
- SAME application, NO recompile
- run application with disabled read from /dev/ttyUSB0 by configuration constant
- 1x tested, NO CRASH after 30 hrs >> I thought this was good, but as the next config crashed after 33.5 hrs, I may not have run it long enough

TC17:
- including kext: usb-serial-6.18.2-tinycore.tcz
- not use select()
- use read() in non-blocking mode
- recompile full application
- run application with "every second 812byte read from /dev/ttyUSB0; baudrate 11520"
- 1x tested, no crash, manually stopped after 7 days, 167hr.
SO:
- Using read() in non-blocking mode gives me all the functionality I need, so for my application this is a very acceptable fix.
- The root cause of the crash is likely either select() OR the blocking nature of read(); I keep zooming in to find out.
« Last Edit: May 01, 2026, 01:58:35 AM by Stefann »

Offline gadget42

  • Hero Member
  • Posts: 1035
Re: Tiny Core v17.0 upgrade issues
« Reply #96 on: May 01, 2026, 03:53:14 PM »
thanks for taking the time to relate your testing and subsequent results!

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #97 on: May 02, 2026, 12:53:04 AM »
@gadget42, thanks for the warm comment.
It feels good. Assuming this ends in finding an OS bug and not a "stupid, stupid application mistake after all", it is my contribution to the community, which is more than fair given all that I have taken from it.

This morning, after 23 hours, it is still running.
So I am happy that I did not change the screensaver setup, as then I would have suspected that to be the reason.
So far, "faulty configurations" crash on average after 15 hours; the longest I have seen is 35 hours. So I will keep it running until tomorrow morning to see whether it is still alive after 48 hours. I have obligations today, so no opportunity to do anything earlier anyway.

If this keeps running, it would indicate that select() is NOT the cause.
It would then strongly point to the blocking setting of read() as the cause.
If it still runs tomorrow morning, I will restart without select() and with read() in blocking mode.
I get data bursts every second, so the blocking read() will return regularly. This can NOT be a solution for my application, because it basically freezes with 1-second gaps. But that's OK. It's testing.

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #98 on: May 03, 2026, 01:40:01 AM »
Still running after 47 hrs.
So far, the longest it has run in a faulty mode was 35 hr, so this looks "non-faulty", but I may need 7 days to be fully sure.
- That means select() does not look like it causes trouble.
- The only other change compared to the crashing April 20 and April 21 runs was the use of non-blocking read().

So... I have now started a run that uses blocking read() but NO select().
The serial data comes in as a continuous stream of one 0.5 sec data block per second. As the read() is no longer protected by select(), the application pauses for 0.5 sec every second, so it basically runs at half speed.
I had to adapt and recompile the program to make "blocking" a configuration item.

To summarize:

Just halted:
TC17: 47 hrs crash free run; start may-1 8:00 > may-3 7:00; MANUALLY STOPPED
- including kext: usb-serial-6.18.2-tinycore.tcz
- same executable as before
- configuration: non-blocking read() AND select()
- "every second 812byte read from /dev/ttyUSB0; baudrate 11520"
- 1x tested, 47hrs

Just started:
TC17: xxxx; start may-3 7:20 > xxxx
- including kext: usb-serial-6.18.2-tinycore.tcz
- recompiled executable with configuration option to call read() in blocking mode
- configuration: blocking read(), NO select()
- "every second 812byte read from /dev/ttyUSB0; baudrate 11520"
- 1x tested, xxxx

Note:
A few weeks ago I transferred the network-connected peripherals to my 2nd computer. It is not optimal, but my home automation is operational (with a "less nice GUI"), split over 2 computers during this test phase.


Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #99 on: May 04, 2026, 02:28:11 AM »
Oops, still running after 24 hrs, but I miscounted a day and stopped it manually when I could better have let it run for another day.

So:
- Ran with blocking read() without select() for 24 hrs without a crash.
- Accidentally stopped it manually.
- Restarted. I will let it run for 48 hrs unless it crashes before that.

For those interested, this is how the code looks:

I use enable_meter as a global variable, read from a configuration file, to control which functions are active.

Selection bit masks:
Code: [Select]
#define meterREAD    1
#define meterFD      2
#define meterSELECT  4
#define meterREFRESH 8
#define meterBLOCKING 16
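One thing worth noting about these bits: in do_readmeter() the fd_set is only initialised (FD_ZERO/FD_SET) when meterFD is set, so a mode with meterSELECT but without meterFD would hand select() an uninitialised fd_set. A small sketch that checks for that, plus my reading of the values for the three TC17 configurations in this thread. The enum names are hypothetical, and the exact enable_meter values used in the runs are an assumption (in particular whether meterFD or meterREFRESH was set).

```c
#include <stdbool.h>

#define meterREAD     1
#define meterFD       2
#define meterSELECT   4
#define meterREFRESH  8
#define meterBLOCKING 16

/* select() in do_readmeter() uses fd_check, which is only filled in
 * (FD_ZERO/FD_SET) when meterFD is also set. */
bool mode_is_consistent(int mode)
{
    if ((mode & meterSELECT) && !(mode & meterFD))
        return false;   /* select() would see an uninitialised fd_set */
    return true;
}

/* Assumed enable_meter values for the TC17 runs (not confirmed above). */
enum {
    CFG_NONBLOCK_NO_SELECT = meterREAD,                          /* = 1  */
    CFG_NONBLOCK_SELECT    = meterREAD | meterFD | meterSELECT,  /* = 7  */
    CFG_BLOCKING_NO_SELECT = meterREAD | meterBLOCKING           /* = 17 */
};
```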

Initialize (call once):
Code: [Select]
int start_meter(tmeter *X)
{ int flags;
  meterMODE = enable_meter;

     /* some things we want to set arbitrarily */
   (X->pts).c_lflag &= ~ICANON;
   (X->pts).c_lflag &= ~(ECHO | ECHOCTL | ECHONL);
   (X->pts).c_cflag |= HUPCL;
   (X->pts).c_cc[VMIN] = 1;
   (X->pts).c_cc[VTIME] = 0;
   
   /* Standard CR/LF handling: this is a dumb terminal.
    * Do no translation:
    *  no NL -> CR/NL mapping on output, and
    *  no CR -> NL mapping on input.
    */
   (X->pts).c_oflag |= ONLCR;

   (X->pts).c_iflag &= ~ICRNL;

  /* set hardware flow control by default */
  (X->pts).c_cflag |= CRTSCTS;
  (X->pts).c_iflag &= ~(IXON | IXOFF | IXANY);
  /* set 115200 bps speed by default */
  cfsetospeed(&(X->pts), B115200);
  cfsetispeed(&(X->pts), B115200);

 
  X->fd = open(X->dev, O_RDWR);
  if (X->fd>=0)
  {  flags = fcntl(X->fd, F_GETFL);
      if ( !(meterMODE & meterBLOCKING) )
        fcntl(X->fd, F_SETFL, flags | O_NONBLOCK);
     tcsetattr(X->fd, TCSANOW, &(X->pts));
     X->SecLastRefresh = SECNOW();
  }
  return 1;
}
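Two remarks on this function, plus a sketch. The comment says "no NL -> CR/NL mapping on output", but `c_oflag |= ONLCR` switches that mapping on; it only affects bytes written, so it should not matter for this read-only test. And the snippet does not show where `X->pts` is filled in: if it is not taken from tcgetattr() first, every flag not touched here (for example CREAD and CLOCAL) keeps whatever the struct happened to hold. A minimal sketch that starts from tcgetattr() and clears ONLCR; `meter_termios()` and `open_meter()` are hypothetical names, and adding O_NOCTTY to open() is my own choice, not something the original does.

```c
#define _DEFAULT_SOURCE         /* ECHOCTL, CRTSCTS, IXANY on glibc */
#include <fcntl.h>
#include <string.h>
#include <termios.h>
#include <unistd.h>

/* Apply the settings start_meter() aims for: non-canonical, no echo,
 * 1-byte minimum read, no CR/NL translation, RTS/CTS, 115200 bps. */
void meter_termios(struct termios *t)
{
    t->c_lflag &= ~(ICANON | ECHO | ECHOCTL | ECHONL);
    t->c_cflag |= HUPCL | CRTSCTS;      /* as in the original */
    t->c_cc[VMIN]  = 1;
    t->c_cc[VTIME] = 0;
    t->c_oflag &= ~ONLCR;               /* cleared, matching the comment */
    t->c_iflag &= ~(ICRNL | IXON | IXOFF | IXANY);
    cfsetospeed(t, B115200);
    cfsetispeed(t, B115200);
}

/* Open the port, start from the driver's current attributes, apply the
 * settings above. Returns the fd, or -1 on error. */
int open_meter(const char *dev, int nonblocking)
{
    struct termios t;
    int fd = open(dev, O_RDWR | O_NOCTTY | (nonblocking ? O_NONBLOCK : 0));
    if (fd < 0)
        return -1;
    if (tcgetattr(fd, &t) != 0) {
        close(fd);
        return -1;
    }
    meter_termios(&t);
    if (tcsetattr(fd, TCSANOW, &t) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```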

Service (call every 250 ms):
Code: [Select]
int do_readmeter(tmeter *X)
{ static int i=0;
  static int j=0;
  static int n=0;
  static int k=0;
  static char buf[BUFSIZE+1];
  fd_set fd_check;   
  struct timeval wait = {0};
 
meterMODE = enable_meter;


  if (meterMODE & meterFD)
  { FD_ZERO(&fd_check);
    FD_SET(X->fd, &fd_check);
  }
  if (meterMODE & meterSELECT)
    select(X->fd +1, &fd_check, NULL, NULL, &wait);
 
  if (meterMODE & meterFD)
    FD_ISSET(X->fd, &fd_check);
 
  //if (FD_ISSET(X->fd, &fd_check)  ) //this was originally there but removed for this investigation
  if (meterMODE & meterREAD)
  { j = read(X->fd, &buf[i], BUFSIZE-i);
   
    while (j>0)
    { buf[i] &=0x007F;
      if  (  buf[i]=='\n' )
      { do_parsemeter(buf, X);
        i++; j--;
        n=i;
        i=0;
        for (k=0; k<=j; k++)
          buf[i+k]=buf[i+n+k];
      }
      else
      { i++; j--;
      }
      if (i >= BUFSIZE)
      {  i=0; j=0;
      }
    }   
  }
  return 1;
}
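do_readmeter() splits the byte stream into "\n"-terminated lines by shifting the leftover bytes down inside buf after each line. The same job can be done one byte at a time without copying; a sketch with hypothetical names (`struct line_buf`, `feed_byte()`), not a drop-in replacement: it hands over a NUL-terminated line without the "\n", whereas do_parsemeter() currently receives buf as it is.

```c
#include <stddef.h>
#include <string.h>

#define METER_LINE_MAX 1024   /* hypothetical limit; the original uses BUFSIZE */

/* Collects bytes across read() calls until a '\n' completes a line. */
struct line_buf {
    char   data[METER_LINE_MAX + 1];  /* +1 for the terminating NUL */
    size_t len;                       /* bytes collected for the current line */
};

void line_buf_init(struct line_buf *lb)
{
    memset(lb, 0, sizeof *lb);
}

/* Feed one received byte. Returns 1 when lb->data holds a complete,
 * NUL-terminated line (the '\n' is not stored); the caller must use it
 * before feeding the next byte. Returns 0 otherwise. */
int feed_byte(struct line_buf *lb, char c)
{
    c &= 0x7F;                        /* strip bit 7, as do_readmeter() does */
    if (c == '\n') {
        lb->data[lb->len] = '\0';
        lb->len = 0;                  /* the next byte starts a new line */
        return 1;
    }
    if (lb->len >= METER_LINE_MAX) {  /* overlong line: drop what we have */
        lb->len = 0;
        return 0;
    }
    lb->data[lb->len++] = c;
    return 0;
}
```

In the read loop, each byte returned by read() would be passed to feed_byte(), and whenever it returns 1 the line in lb.data would go to the parser before the next byte is fed in.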
« Last Edit: May 04, 2026, 02:40:44 AM by Stefann »

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #100 on: May 06, 2026, 12:47:33 AM »
And... BINGO... a crash somewhere between 00:00 and 6:30 this morning.
I don't have time today, so I just restarted it. I will set up a rerun with the screensaver disabled later.

So...  what we have...

TC17
- including kext: usb-serial-6.18.2-tinycore.tcz
- continuously reading data from /dev/ttyUSB0 at baudrate 11520 getting "every second 812byte blocks"
- without using select() or any other port-related function:
---> manually stopped after 7 days of running without issue while using read() in non-blocking mode
---> crash after 40..46 hr while using read() in blocking mode

Zero other changes between these 2 runs; the same setup runs fine on TC15.

As said, no time to explore today; I will definitely follow up later.

=====================================
Summarizing:

TC17: 7day, 167hr NO crash; start apr-24 8:22 > manually stop may-1 07:30
- including kext: usb-serial-6.18.2-tinycore.tcz
- new compiled executable with runtime configurable serial read modes
- run application with non-blocking read without SELECT
- "every second 812byte read from /dev/ttyUSB0; baudrate 11520"
- 1x tested, no crash, manually stopped after 7 days, 167hr.
SO:
- Using read() in non-blocking mode gives me all the functionality I need, so for my application this is a very acceptable fix.
- The root cause of the crash is likely either select() OR the blocking nature of read(); I keep zooming in to find out.

TC17: 47 hrs crash free run; start may-1 8:00 > may-3 7:00; MANUALLY STOPPED
- including kext: usb-serial-6.18.2-tinycore.tcz
- same executable as before
- configuration: non-blocking read() AND select()
- "every second 812byte read from /dev/ttyUSB0; baudrate 11520"
- 1x tested, 47hrs

TC17: 25 hr crash-free run; start may-3 7:20 > may-4 8:30 MANUALLY STOPPED
TC17: crash after 40...46 hr; start may-4 8:30 > may-6 00:00...6:30
- including kext: usb-serial-6.18.2-tinycore.tcz
- recompiled executable with configuration option to call read() in blocking mode
- configuration: blocking read(), NO select()
- "every second 812byte read from /dev/ttyUSB0; baudrate 11520"
- 1x tested, 25 hr crash-free
- 1x tested, crash after 40..46 hr

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #101 on: May 07, 2026, 01:34:06 PM »
Short status update.

I did not have much time in the last 2 days, so progress is limited.

- I succeeded in disabling the screensaver, so the console will now be functional at the next crash. THANKS RICH, I used the commands you gave me some days ago!
- I disabled (by configuration, no recompile needed) the read() in my application (it is a bit inconvenient, but I found an alternative way to measure energy via my solar system).
- I started making a dedicated "crash program" that calls read() from an endless loop with a 2 ms loop time. That is about 80x faster than the read() in my application, which has a loop time of 250 ms. The signal has data bursts/gaps of about 600 ms/400 ms. No acceleration will be present during the gaps.
- I was not able to give that a thorough test yet. I want to do that "attended", so I need a few hours of time for it. With a little luck I will find such test time tomorrow.

to be continued....

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #102 on: May 08, 2026, 09:14:48 AM »
OK... update...

I made a small "crash program" > see code below.
It basically reads from the serial port over USB every 100 us.

It reads in "blocking mode".
The serial port is getting "every second 812byte datablock on /dev/ttyUSB0; baudrate 11520"
My original program runs a continuous loop that calls several service functions. The read function is called every 250 ms. However... in blocking mode at least half of those calls get blocked, so the "average loop time" would have been around 500 ms.

This "crash" program has a loop time of 100 us, BUT is of course blocked during the gaps in data availability.
Logging shows I am getting 151 loops per minute. That is 397 ms on average. Oops... that is not much faster.
I guess the 812-byte data block gets read in about 3 calls, after which the "end of the second" needs to be awaited until the next block.

In addition to the read() calls, I have set an interval timer for 1 ms (thank you, brother, for handing me this code). This timer interrupts the read(). Running on the assumption that the blocking call has difficulty handling interrupts, this is intended to give it extra stress.
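One caveat on this stress test: with glibc and the default feature macros, signal() installs the handler with BSD semantics, i.e. with SA_RESTART. The read() is still interrupted inside the kernel every millisecond, but it is restarted automatically, so the program never sees it return early. Also, if the first 1 ms tick arrives before the handler is installed, the default action for SIGALRM terminates the program, so the handler should be in place before the timer starts. If the goal is to make the interruption visible to the program (read() returning -1 with errno EINTR), a sketch using sigaction() without SA_RESTART; the function names are hypothetical:

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>
#include <unistd.h>

static void on_alarm(int sig)
{
    (void)sig;   /* nothing to do: the point is only to interrupt read() */
}

/* Install the SIGALRM handler WITHOUT SA_RESTART, so a blocking read()
 * returns -1/EINTR when the timer fires instead of being restarted. */
int install_alarm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;             /* deliberately no SA_RESTART */
    return sigaction(SIGALRM, &sa, NULL);
}

/* Start a periodic ITIMER_REAL timer (period_us < 1000000); 0 stops it.
 * Call install_alarm_handler() first. */
int start_interval_timer(long period_us)
{
    struct itimerval it;
    memset(&it, 0, sizeof it);
    it.it_interval.tv_usec = period_us;
    it.it_value.tv_usec    = period_us;
    return setitimer(ITIMER_REAL, &it, NULL);
}
```

With this variant the read loop has to treat -1/EINTR as "try again", which also makes it possible to count how often the blocking read() was actually interrupted.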

Logging is written to a network-connected 2nd computer, so even if the computer crashes I will have the logging.

My main application is running in parallel, without reading the serial port.

The console is working; the screensaver no longer kicks in.

So...
It's running.
Now wait....




My crash test program:
Code: [Select]
#include <sys/time.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <time.h>
#include <termios.h>
#include <stddef.h>
#include <signal.h>

#include <fcntl.h>

#include <sys/times.h>
#include <sys/time.h>
#include <sys/stat.h>

#include <sys/select.h>

//#include <syslog.h>

 
#define usDELAYTIME 100 
#define minLOGINTERVAL 10

 
#define meterREAD    1
#define meterFD      2
#define meterSELECT  4
#define meterREFRESH 8
#define meterBLOCKING 16   

#define BUFSIZE 8191

int meterMODE = meterREAD | meterBLOCKING;
char logfile[128] = "/remote/home/tc/crashlog.txt";

struct termios pts;   // interface settings
int fd;

void usDelay(int len)
{ struct timespec delay;
  delay.tv_sec  = 0;
  delay.tv_nsec = len;
  delay.tv_nsec *= 1000; // convert us to ns (x 1e3)
  nanosleep(&delay, NULL);
}

static void dummy_func(int sig)
{
}


int start_meter()
{ int flags;

     /* some things we want to set arbitrarily */
   pts.c_lflag &= ~ICANON;
   pts.c_lflag &= ~(ECHO | ECHOCTL | ECHONL);
   pts.c_cflag |= HUPCL;
   pts.c_cc[VMIN] = 1;
   pts.c_cc[VTIME] = 0;
   
   /* Standard CR/LF handling: this is a dumb terminal.
    * Do no translation:
    *  no NL -> CR/NL mapping on output, and
    *  no CR -> NL mapping on input.
    */
   pts.c_oflag |= ONLCR;
   pts.c_iflag &= ~ICRNL;

  /* set hardware flow control by default */
  pts.c_cflag |= CRTSCTS;
  pts.c_iflag &= ~(IXON | IXOFF | IXANY);
  /* set 115200 bps speed by default */
  cfsetospeed(&pts, B115200);
  cfsetispeed(&pts, B115200);

  fd = open("/dev/ttyUSB0", O_RDWR);
  if (fd>=0)
  {  flags = fcntl(fd, F_GETFL);
     if ( !(meterMODE & meterBLOCKING) )
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);
     tcsetattr(fd, TCSANOW, &pts);
  }
  return 1;
}


int main()
{ char buf[BUFSIZE+1];
  fd_set fd_check;
  struct tm *tm_now;
  time_t now, tstamp = 0;   // initialised: the first log line is written right away
  struct timeval wait = {0};
  FILE *f_log;
  int j = 0;
  unsigned long long count = 0;
  unsigned long long kcount = 0;

  struct itimerval interval = { 0 };
  interval.it_interval.tv_sec = 0;
  interval.it_interval.tv_usec = 1000; // repeat interval after 1st trigger; 1000 = 1ms
  interval.it_value.tv_sec = 0;
  interval.it_value.tv_usec = 1000;    // time until first trigger

  start_meter();
  if (fd < 0)
  { perror("cannot open /dev/ttyUSB0");
    exit(1);
  }

  signal(SIGALRM, dummy_func);             // install the dummy handler first...
  setitimer(ITIMER_REAL, &interval, NULL); // ...then start the 1ms interval timer

  while (1)
  { if (meterMODE & meterFD)
    { FD_ZERO(&fd_check);
      FD_SET(fd, &fd_check);
    }
    if (meterMODE & meterSELECT)
      select(fd + 1, &fd_check, NULL, NULL, &wait);

    if (meterMODE & meterFD)
      FD_ISSET(fd, &fd_check);

    if (meterMODE & meterREAD)
    { j = read(fd, buf, BUFSIZE); // read
    }
    usDelay(usDELAYTIME);
    count++;
    time(&now);
    if (now > tstamp + 60*minLOGINTERVAL)
    { tm_now = localtime(&now);
      tstamp = now;

      kcount += count/1000;
      count %= 1000;
      f_log = fopen(logfile, "a");
      if (f_log != NULL)
      { fprintf(f_log, "mon day: %2d %2d | hh:mm:ss: %02d:%02d:%02d | Still alive after %lluk + %llu loops\n",
                tm_now->tm_mon+1, tm_now->tm_mday, tm_now->tm_hour, tm_now->tm_min, tm_now->tm_sec, kcount, count);
        fclose(f_log);   // only close a file that was actually opened
      }
    }
  }
  exit(0);
}
« Last Edit: May 08, 2026, 09:18:50 AM by Stefann »

Offline Rich

  • Administrator
  • Hero Member
  • Posts: 12756
Re: Tiny Core v17.0 upgrade issues
« Reply #103 on: May 08, 2026, 10:00:40 AM »
Hi Stefann
> ... The serial port is getting "every second 812byte datablock on /dev/ttyUSB0; baudrate 11520" ...
I'm guessing you meant 115200 there.
Assuming 8 data bits + 1 start bit + 1 stop bit + 2 bits of gap time
between each byte transmitted:

Bits per datablock = 12 * 812 = 9744 bit times.

Transmit time per datablock = 9744 / 115200 = 0.0846 seconds, or
just under 85ms.
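The estimate above as a small function, so other framings (for example 10 bits per byte without gap time, or the mistyped 11520 baud) can be compared. `block_time_ms()` and `busy_fraction()` are hypothetical names, and the bits-per-byte figure is an input, not a measured value:

```c
/* Time on the wire for one block, in milliseconds. bits_per_byte covers
 * start + data + stop bits plus any idle gap between bytes
 * (12 in the estimate above: 8 data + 1 start + 1 stop + 2 gap). */
double block_time_ms(unsigned bytes, unsigned bits_per_byte, unsigned baud)
{
    return 1000.0 * bytes * bits_per_byte / baud;
}

/* Fraction of each period_ms the line is busy, e.g. period_ms = 1000
 * for one block per second. */
double busy_fraction(unsigned bytes, unsigned bits_per_byte, unsigned baud,
                     double period_ms)
{
    return block_time_ms(bytes, bits_per_byte, baud) / period_ms;
}
```

With 812 bytes, 12 bits per byte and 115200 baud this gives about 84.6 ms, i.e. the line is busy roughly 8.5% of each second; at 11520 baud the same block would take about 846 ms.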

Offline Stefann

  • Wiki Author
  • Full Member
  • Posts: 180
Re: Tiny Core v17.0 upgrade issues
« Reply #104 on: May 08, 2026, 10:27:24 AM »
Ah!
Yes!
Good catch.
I did set it correctly in the code, but my calculation of busy/pause time was indeed off by a factor of 10.
That explains a lot.

Unless it crashes, I will leave it running for 48 hrs now.
It is all "a bit vague", so instead of further speculation: "let's just see what comes out now".

Thanks for the insight. That helps with defining a follow-up configuration.

With that said...
I think I am getting close to exploring the source code.
My brother will likely help me. He is a bit of a Linux guru.