Tiny Core Linux

Tiny Core Base => TCB Q&A Forum => Topic started by: theYinYeti on June 18, 2013, 07:45:51 AM

Title: [SOLVED] Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 18, 2013, 07:45:51 AM
Hi! I have a problem with file names on my FAT-formatted flash drive. This is annoying because I boot TC from that drive, so I don’t get a chance to change its mount options.

Whether in Linux or in Windows, I can use French file names without any problem, and all is fine. With TC, however, I don’t really know where I stand.

On the one hand, urxvt+coreutils seem mostly happy with the current configuration, although the picture is unclear:
Quote
tc@tinycore:/mnt/sdb1/Tan$ ls
95 - Horaires du 01-10-12 au 12-07-13.pdf  Express Beaujoire Carquefou - Horaires du 01-10-12 au 12-07-13.pdf
C1 - Horaires du 01-10-12 au 12-07-13.pdf  Plan g?n?ral du r?seau 2012-2013.pdf
tc@tinycore:/mnt/sdb1/Tan$ ls P*
ls: cannot access P*: No such file or directory
tc@tinycore:/mnt/sdb1/Tan$ ls P<TAB>lan\ général\ du\ réseau\ 2012-2013.pdf
Plan g?n?ral du r?seau 2012-2013.pdf
tc@tinycore:/mnt/sdb1/Tan$ cp P<TAB>lan\ général\ du\ réseau\ 2012-2013.pdf test.pdf
tc@tinycore:/mnt/sdb1/Tan$ ls -l P<TAB>lan\ général\ du\ réseau\ 2012-2013.pdf test.pdf
-rwxrwxrwx 1 root root 959224 nov.  14  2012 Plan g?n?ral du r?seau 2012-2013.pdf
-rwxrwxrwx 1 root root 959224 juin  18 13:24 test.pdf
(<TAB> means I hit the TAB key for auto-completion)
Moreover, when a UTF-8 character appears on the command line, urxvt loses sync between the displayed and the actual position of the caret (to the point where I can backspace into the prompt!), whereas aterm at least stays consistent.

On the other hand, some applications clearly consider that something is wrong:
- ROX tells me “This filename is not valid UTF-8. You should rename it.” However, ROX itself then has trouble placing the caret when renaming such files.
- In Evince’s Open dialog, the filename shows up as “Plan g[FFFD]n[FFFD]ral du r[FFFD]seau 2012-2013.pdf” (U+FFFD being the Unicode replacement character).
- In MS Office, such a filename would appear as “Plan gnral du rseau 2012-2013”.
- And so on…

Some more info:
Code: [Select]
tc@rescue16g:~$ ls -l $(which ls)
lrwxrwxrwx 1 root root 38 juin  18  2013 /usr/local/bin/ls -> /tmp/tcloop/coreutils/usr/local/bin/ls
tc@rescue16g:~$ env | grep -iE 'lang|lc_'
LANG=fr_FR.utf8
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: curaga on June 18, 2013, 08:02:35 AM
What are the mount options?
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 18, 2013, 08:11:12 AM
Hi curaga. Here is the requested information:
Code: [Select]
tc@tinycore:~$ mount | grep sdb
/dev/sdb1 on /mnt/sdb1 type vfat (rw,relatime,fmask=0000,dmask=0000,allow_utime=0022,codepage=cp437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
These are (or should be) the default TinyCore options, since /dev/sdb1 is my boot device and is mounted automatically at boot.
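In that output, `codepage=cp437` only governs short (8.3) names; the option that decides how long file names are turned into bytes for programs like ls is `iocharset=`, here `iso8859-1`. A minimal sketch for pulling that value out of a mount line (the function name `vfat_iocharset` is hypothetical, and the sample line is the one reported above):

```shell
#!/bin/sh
# Hypothetical helper: print the iocharset= value from one line of
# `mount` or /proc/mounts output (prints nothing if the option is absent).
vfat_iocharset() {
    printf '%s\n' "$1" | sed -n 's/.*[(, ]iocharset=\([^,) ]*\).*/\1/p'
}

# The line reported above, used as sample input:
line='/dev/sdb1 on /mnt/sdb1 type vfat (rw,relatime,fmask=0000,dmask=0000,allow_utime=0022,codepage=cp437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)'
vfat_iocharset "$line"    # prints: iso8859-1
```

The same function also accepts /proc/mounts lines, where options are separated from the other fields by spaces instead of parentheses.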
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 18, 2013, 08:14:46 AM
Here’s some more information:
Code: [Select]
tc@tinycore:~$ tr '\0' ' ' </proc/cmdline
BOOT_IMAGE=/.boot/corelnx/vmlinuz waitusb=5 tce=LABEL=FLASH/.boot/corelnx/tce loop.max_loop=256 showapps lang=fr_FR.utf8 kmap=azerty/fr-latin1 tz=Europe/Paris noutc restore=LABEL=FLASH/.boot/corelnx lst=onboot.lst desktop=fluxbox
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: Rich on June 18, 2013, 08:18:22 AM
Hi theYinYeti
Quote
Code: [Select]
tc@tinycore:/mnt/sdb1/Tan$ ls P*
ls: cannot access P*: No such file or directory
That might be caused by the tab character in the directory name. In my opinion, embedding spaces in path and directory names can also sometimes cause problems.
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 18, 2013, 08:22:47 AM
No no :D
The <TAB> I wrote above is for when I hit the TAB key for auto-completion ;-)
(original post updated with this tip)
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: tinypoodle on June 18, 2013, 08:31:48 AM
The boot device is totally irrelevant to your issue.
It's the "tce=" option which makes you end up with default mount options for vfat.
You would have to omit "tce=" and add "base" to be able to use explicit mount options suiting your purposes best.
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 18, 2013, 08:42:42 AM
@tinypoodle: If I use “base” instead of “tce=…”, then OK, my drive won’t get auto-mounted, but I won’t have my favourite apps loaded, nor the whole TinyCore experience in general, will I?
When I wrote “boot device”, I meant it in the TC sense of the word, not the BIOS sense. I do want to use “tce=…”, unless there’s something I’m missing.

I know I can use the “file” command to find out which encoding is used inside a file, but I know of no such thing for checking the encoding of file names. I mean, “ls -1 | hexdump -C” is nice and all, but how can I trust its output? There are layers I don’t master between what is actually on the disk and what ls outputs: the kernel, the mount command, the ls command, what else…
By the way, that command does work, and it indicates ISO-8859-1 encoding (1 byte per “é”), not UTF-8 (which would use 2 bytes per “é”).
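One piece of background helps here: on VFAT, long file names are stored on disk as 16-bit Unicode (UTF-16/UCS-2), so there is no single-byte "raw" encoding to discover. The bytes that ls prints are what the kernel produced from that Unicode name using the `iocharset=` mount option, which is why hexdump shows one byte per “é” under `iocharset=iso8859-1`. The sketch below shows the two byte forms of “é” and a way to dump the bytes of each name in a directory; the function name `show_name_bytes` is hypothetical.

```shell
#!/bin/sh
# The same "é" (U+00E9) in the two encodings discussed in this thread.
latin1_e=$(printf '\351')       # ISO-8859-1: one byte, 0xE9
utf8_e=$(printf '\303\251')     # UTF-8: two bytes, 0xC3 0xA9

printf '%s' "$latin1_e" | od -An -tx1    # shows: e9
printf '%s' "$utf8_e"   | od -An -tx1    # shows: c3 a9

# Hypothetical helper: dump the bytes of every name in a directory, as the
# kernel presents them through the current iocharset= setting.
show_name_bytes() {
    for f in "${1:-.}"/*; do
        [ -e "$f" ] || continue          # empty directory: glob did not match
        printf '%s: ' "${f##*/}"
        printf '%s' "${f##*/}" | od -An -tx1
    done
}
```

Running `show_name_bytes /mnt/sdb1/Tan` once with the default options and once after remounting with `iocharset=utf8` should show the “é” in the file name change from `e9` to `c3 a9`, without anything on the stick itself being modified.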
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 18, 2013, 08:45:49 AM
I forgot to mention: I surmise the problem lies in the initrd (core.gz), and I’m perfectly willing to alter it to get rid of the problem (been there, done that). But first I have to understand where the problem comes from…
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: Juanito on June 18, 2013, 09:10:55 AM
Does loading any of the eglibc* extensions improve things?
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 18, 2013, 09:19:52 AM
Currently, “eglibc_gconv.tcz” gets auto-loaded by “mylocale.tcz”, which was generated by “getlocale.tcz”.
By the way, my current setup with “mylocale.tcz” is:
Code: [Select]
tc@tinycore$ locale -a
C
fr_FR
fr_FR@euro
fr_FR.iso88591
fr_FR.iso885915@euro
fr_FR.utf8
POSIX
and I choose “fr_FR.utf8” at boot.

I tried loading “eglibc_apps.tcz” as well, but there’s no change in subsequently opened terminal windows, even though I ran “sudo ldconfig” just before.
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: curaga on June 18, 2013, 03:30:21 PM
There's no way to autodetect the charset of a FAT stick. You could:

- always mount FAT as UTF-8, by editing the fat32 default options in rebuildfstab
- boot with base and waitusb, having the mount command in bootsync.sh. You can call tce-setup manually to load your extensions in setups like this.
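For the second option, a rough sketch of the mount part of /opt/bootsync.sh when booting with `base waitusb=5` instead of `tce=…`. The device, mount point and option list (the options shown earlier in this thread, with `iocharset=utf8` swapped in) are assumptions, and the `RUN` prefix and the `mount_flash` name are hypothetical: `RUN=echo` only prints the commands, which makes a dry run possible.

```shell
#!/bin/sh
# Sketch of lines for /opt/bootsync.sh when booting with "base waitusb=5"
# instead of "tce=...". DEV, MNT and OPTS are assumptions taken from this
# thread (OPTS is the option list shown above with iocharset=utf8 swapped in).
DEV=/dev/sdb1
MNT=/mnt/sdb1
OPTS="rw,relatime,fmask=0000,dmask=0000,codepage=cp437,iocharset=utf8,shortname=mixed,errors=remount-ro"

# Hypothetical helper. RUN is empty on a real boot; with RUN=echo the
# commands are only printed instead of being run.
mount_flash() {
    $RUN mkdir -p "$MNT"
    $RUN mount -t vfat -o "$OPTS" "$DEV" "$MNT"
}

# In bootsync.sh itself, one would then call:
#   mount_flash && tce-setup
```

Calling `tce-setup` afterwards is the suggestion from the post above; whether it needs anything else in a `base` boot is not covered in the thread, so check that before relying on this sketch.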
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem...
Post by: tinypoodle on June 19, 2013, 12:51:23 AM
Be aware that having spaces in file names (incl. directory names) is a totally separate issue.
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 19, 2013, 02:40:09 AM
Hi! I think I’ll go with curaga’s first option. My core.gz is already a custom one anyway. Still, it is good to know that the second solution exists, should anyone encounter the same issue.
Let me try…
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem...
Post by: tinypoodle on June 19, 2013, 03:18:15 AM
You could try packing your modified 'rebuildfstab' (incl. its full path) into a separate gzipped cpio archive and loading that after core.gz, so that it overwrites the original. That would make upgrading simpler.
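A sketch of that idea, assuming the modified file sits in a staging directory laid out like the root filesystem (`overlay/usr/sbin/rebuildfstab`) and that the archive is built as root, so the file is owned by root. The function name `make_overlay` and the file names are hypothetical. Only the file itself is archived, not its parent directories; since core.gz already provides /usr/sbin, unpacking the extra archive then leaves the permissions of /usr and /usr/sbin alone.

```shell
#!/bin/sh
# Hypothetical helper: pack one file, given by its path relative to a
# staging root, into a gzipped newc cpio archive usable as an extra initrd.
make_overlay() {
    root=$1      # staging root, e.g. ./overlay
    rel=$2       # path inside it, e.g. usr/sbin/rebuildfstab
    out=$3       # output archive, e.g. ./rebuildfstab.gz
    # The subshell changes directory so the archived name is relative
    # (usr/sbin/...); the output redirection stays in the caller's directory.
    ( cd "$root" && printf '%s\n' "$rel" | cpio -o -H newc ) | gzip -9 > "$out"
}

# Example (as root):
#   make_overlay ./overlay usr/sbin/rebuildfstab ./rebuildfstab.gz
```

The archive is then listed after core.gz in the boot loader configuration. Recent syslinux/extlinux versions accept a comma-separated list such as `initrd=/.boot/corelnx/core.gz,/.boot/corelnx/rebuildfstab.gz` (the paths here follow the BOOT_IMAGE shown earlier and are an assumption); files from the later archive replace those of the same name from core.gz.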
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 19, 2013, 03:21:23 AM
@tinypoodle: That’s an excellent idea! I’ll first check that the solution works. If it does (UTF-8), I’ll probably redo it that way.
Title: Re: Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 19, 2013, 07:23:02 AM
OK. Problem solved. I changed /usr/sbin/rebuildfstab inside core.gz as follows (the command is run from the root of the extracted core.gz, hence the relative path):
Code: [Select]
sudo sed -i '
/%-15s %-15s %-8s/ i\
grep -i "lang=[^ ]*utf8" /proc/cmdline >/dev/null && OPTIONS="${OPTIONS},iocharset=utf8"
' usr/sbin/rebuildfstab
In other words, this adds the option “iocharset=utf8” whenever the “lang” kernel parameter is set to a UTF-8 locale.
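The inserted test can be tried outside of rebuildfstab. In this sketch the kernel command line is passed as an argument instead of being read from /proc/cmdline, the option strings are sample values, and the function name `utf8_opts` is hypothetical; the grep pattern is the one from the sed script above.

```shell
#!/bin/sh
# Hypothetical helper mirroring the line inserted into rebuildfstab:
# append iocharset=utf8 to a vfat option string when the given kernel
# command line sets lang= to a UTF-8 locale.
utf8_opts() {
    opts=$1       # existing mount options (sample values below)
    cmdline=$2    # kernel command line; rebuildfstab reads /proc/cmdline
    if printf '%s\n' "$cmdline" | grep -i "lang=[^ ]*utf8" >/dev/null; then
        opts="${opts},iocharset=utf8"
    fi
    printf '%s\n' "$opts"
}

utf8_opts "fmask=0000,dmask=0000" "tz=Europe/Paris lang=fr_FR.utf8 noutc"
# prints: fmask=0000,dmask=0000,iocharset=utf8
utf8_opts "fmask=0000,dmask=0000" "lang=fr_FR@euro"
# prints: fmask=0000,dmask=0000
```

One consequence of the pattern: a value spelled `lang=fr_FR.UTF-8` does not contain "utf8" and so would not trigger the option, whereas the locale name used in this thread (`fr_FR.utf8`) does.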

All is not perfect yet (problem with urxvt) but that’s another issue. The current topic is solved as far as I’m concerned :-)
Title: Re: [SOLVED] Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: gerald_clark on June 19, 2013, 09:36:25 AM
If you use 'grep -q' you do not need to redirect its output to /dev/null.
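A small sketch comparing the two forms on a sample command line (taken from this thread): both rely only on grep's exit status, and `-q` (specified by POSIX for grep) simply stops grep from writing the matching line.

```shell
#!/bin/sh
# Sample kernel command line (taken from this thread).
cmdline="lang=fr_FR.utf8 kmap=azerty/fr-latin1 tz=Europe/Paris"

# Original form: discard grep's output and keep only its exit status.
if printf '%s\n' "$cmdline" | grep -i "lang=[^ ]*utf8" >/dev/null; then
    echo "redirect: UTF-8 locale"
fi

# Same test with -q: grep prints nothing and only sets the exit status
# (0 when a line matched, 1 when none did).
if printf '%s\n' "$cmdline" | grep -qi "lang=[^ ]*utf8"; then
    echo "with -q: UTF-8 locale"
fi
```

In the sed script above, the inserted line would then read `grep -qi "lang=[^ ]*utf8" /proc/cmdline && OPTIONS="${OPTIONS},iocharset=utf8"`.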
Title: Re: [SOLVED] Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 19, 2013, 09:50:14 AM
Hi gerald_clark :-)
You’re right. That’s what I usually do, but I know that -q is a rather new option to grep, and I did not know how crippled grep would be at this early stage of boot (that’s busybox’ grep). For example: busybox’ find command is a pain to use…
Title: Re: [SOLVED] Wrong locale / charset for FAT flash drive, maybe Unicode problem...
Post by: tinypoodle on June 20, 2013, 11:36:16 PM
Quote
but I know that -q is a rather new option to grep

How new? I can't remember it ever not existing...

Quote
I did not know how crippled grep would be at this early stage of boot (that's busybox' grep). For example: busybox' find command is a pain to use...

You present an arbitrary personal opinion about 'find' as if it were a commonly accepted fact, without giving any reasoning at all. I have seen neither bug reports nor complaints about it, and I have been using 'find' for many years, both interactively (in a terminal) and non-interactively (in shell scripts), to my full satisfaction.
Your suspicion that 'grep' might be crippled lacks any foundation.
Title: Re: [SOLVED] Wrong locale / charset for FAT flash drive, maybe Unicode problem…
Post by: theYinYeti on June 21, 2013, 03:49:51 AM
OK, maybe I did not use the right word. Maybe “crippled” should be changed to “limited” or the like. That’s what you get when dealing with non-native English speakers like me :-)
My problem with find (I did not investigate grep, as the workaround I used was rather evident) is that it lacks the “-printf” option, which lets you choose which properties of the found entries to display. I use this option a lot, and I miss it. But then, TinyCore has the “findutils” extension for people like me.
Have a good day!

[edit: I should add that I tend to be cautious because I regularly work on old Unix systems such as SunOS, which are a real pain to work with unless GNU tools are installed.]
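Besides installing findutils, a simple `-printf '%s %p\n'` can often be approximated by handing each path to `stat -c`, which GNU coreutils provides and BusyBox provides too when built with its stat format support (whether a given TC build has it is an assumption). A sketch with the hypothetical function name `list_sizes`; it uses `-exec … \;` rather than `{} +` since the latter may not be enabled in every BusyBox find.

```shell
#!/bin/sh
# Hypothetical helper: print "size path" for every regular file under a
# directory, roughly what GNU `find DIR -type f -printf '%s %p\n'` prints.
list_sizes() {
    find "${1:-.}" -type f -exec stat -c '%s %n' {} \;
}

# Example:
#   list_sizes /mnt/sdb1/Tan
```

This runs one stat process per file, so it is slower than GNU find's built-in `-printf` on large trees.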
Title: Re: [SOLVED] Wrong locale / charset for FAT flash drive, maybe Unicode problem...
Post by: tinypoodle on July 10, 2013, 12:54:22 PM
Quote
Hi gerald_clark :-)
You're right. That's what I usually do, but I know that -q is a rather new option to grep, and I did not know how crippled grep would be at this early stage of boot (that's busybox' grep).

I just happened to stumble upon an old busybox lying around here:

Code: [Select]
BusyBox v0.60.5 (2003.02.15-11:25+0000) multi-call binary

Usage: grep [-ihHnqvs] PATTERN [FILEs...]

[.....]

-q      be quiet. Returns 0 if result was found, 1 otherwise