
Topic: UTF-8, why is there a question mark (shown?) in the filename?

emmi
UTF-8, why is there a question mark (shown?) in the filename?
« on: February 10, 2016, 03:13:37 PM »
Code:
tc@box:~$ version
6.4.1
tc@box:~$ uname -a
Linux box 3.16.6-tinycore #777 SMP Thu Oct 16 09:42:42 UTC 2014 i686 GNU/Linux

I downloaded getlocale.tcz and configured it; I booted with lang=en_US.UTF-8 and loaded mylocale.tcz.

In a urxvt terminal window I get:

Code:
tc@box:~$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=
tc@box:~$ echo -e "\xce\xb4"
δ
tc@box:~$ echo -e "\xc3\x84"
Ä
tc@box:~$ touch greek-delta-`echo -e "\xce\xb4"`
tc@box:~$ touch A-umlaut-`echo -e "\xc3\x84"`
tc@box:~$ ls
A-umlaut-Ä     greek-delta-?
tc@box:~$

The A-umlaut is U+00C4, which is in the Latin-1 Supplement block (the upper half of ISO 8859-1); the Greek delta is U+03B4, which is in the Greek and Coptic block. Both are in plane 0, the Basic Multilingual Plane.

This works as expected on other Linux distributions.

Any hints about what I'm doing wrong or what I'm missing here are welcome.
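Both characters are two-byte sequences in UTF-8 (C3 84 and CE B4), so byte length alone does not explain why only one of them turns into a question mark. The sketch below decodes the bytes passed to echo -e back into code points and compares them with a cut-off. That cut-off is an assumption, not something established in this thread: BusyBox has a build option, CONFIG_LAST_SUPPORTED_WCHAR, whose default is 767 (U+02FF), and characters above it are printed as '?'. How Tiny Core's busybox was actually configured is not checked here; utf8_2byte_cp is a hypothetical helper name.

```sh
#!/bin/sh
# Decode a two-byte UTF-8 sequence, given as two hex bytes, to a code point.
# Lead byte 110xxxxx carries 5 payload bits; continuation byte 10xxxxxx carries 6.
utf8_2byte_cp() {
  b1=$((0x$1))
  b2=$((0x$2))
  echo $(( ((b1 & 0x1F) << 6) | (b2 & 0x3F) ))
}

# ASSUMPTION: 767 (U+02FF) is BusyBox's default CONFIG_LAST_SUPPORTED_WCHAR;
# the value compiled into Tiny Core's busybox is not verified here.
last_supported=767

for pair in "c3 84" "ce b4"; do
  set -- $pair              # split "c3 84" into $1 and $2
  code=$(utf8_2byte_cp "$1" "$2")
  if [ "$code" -le "$last_supported" ]; then
    verdict="within the assumed limit"
  else
    verdict="above the assumed limit, busybox would print ?"
  fi
  printf 'bytes %s %s -> U+%04X (%d): %s\n' "$1" "$2" "$code" "$code" "$verdict"
done
```

Running it reports U+00C4 (196) for the umlaut and U+03B4 (948) for the delta. Only the delta lies above 767, which would be consistent with busybox ls showing Ä but replacing δ with a question mark.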

curaga
Re: UTF-8, why is there a question mark (shown?) in the filename?
« Reply #1 on: February 11, 2016, 01:37:02 AM »
Are you using busybox ls and/or busybox ash? They do not fully support UTF-8. Please install bash and the GNU tools (coreutils and util-linux), then close and reopen your urxvt window so the paths refresh. You may also want to specify bash instead of ash in your urxvt config file, so that bash is always used.
The only barriers that can stop you are the ones you create yourself.
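Whether ls or sh is busybox can be checked the way emmi does by hand in the next post: which ls, then ls -l on the result. A sketch that folds those steps into one test; runs_busybox is a hypothetical helper, and it assumes a readlink that supports -f (GNU coreutils' readlink does).

```sh
#!/bin/sh
# Hypothetical helper: succeed if the command or path given in $1
# ultimately resolves (through any chain of symlinks) to a file named busybox.
runs_busybox() {
  path=$(command -v "$1") || return 2   # not found on PATH
  case "$(readlink -f "$path")" in
    */busybox) return 0 ;;
    *) return 1 ;;
  esac
}

# Report what sh and ls currently resolve to.
for cmd in sh ls; do
  if runs_busybox "$cmd"; then
    echo "$cmd: busybox"
  else
    echo "$cmd: $(command -v "$cmd")"
  fi
done
```

On the stock system shown in the next post both lines would read busybox; after coreutils is loaded, ls should instead resolve to /usr/local/bin/ls.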

emmi
Re: UTF-8, why is there a question mark (shown?) in the filename?
« Reply #2 on: February 11, 2016, 07:41:16 AM »
Thanks for the hints, they helped.

I was using, and still use, the out-of-the-box shell: busybox. Bash may be useful for other things, but to get the funny names displayed correctly, coreutils (which contains ls) seems sufficient. Your comment about ls prompted me to try echo instead, and that worked immediately.
Code:
tc@box:~$ echo $SHELL
/bin/sh
tc@box:~$ ls -l /bin/sh
lrwxrwxrwx    1 root     root             7 Nov 29 13:53 /bin/sh -> busybox
tc@box:~$ which ls
/bin/ls
tc@box:~$
tc@box:~$ ls -l /bin/ls
lrwxrwxrwx    1 root     root             7 Nov 29 13:53 /bin/ls -> busybox
tc@box:~$
tc@box:~$ touch greek-delta-`echo -e "\xce\xb4"`
tc@box:~$ ls
greek-delta-?
tc@box:~$ echo *
greek-delta-δ
tc@box:~$
After installing only coreutils:
Code:
tc@box:~$ which ls
/usr/local/bin/ls
tc@box:~$ ls
greek-delta-δ
tc@box:~$
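The echo * result above suggests that the filename on disk is intact and that only busybox ls's display substitutes the question mark. One way to confirm this without relying on ls or the terminal is to dump the name's bytes with od. The sketch uses printf with octal escapes (\316\264 is δ, bytes CE B4) instead of echo -e, because printf escapes behave the same in busybox ash and bash; hex_bytes is a hypothetical helper name.

```sh
#!/bin/sh
# Hypothetical helper: print a string's bytes as space-separated lowercase hex.
# -v stops od from collapsing repeated lines into '*'.
hex_bytes() {
  printf '%s' "$1" | od -An -v -tx1 | tr -s ' \n' '  ' | sed 's/^ //; s/ $//'
}

dir=$(mktemp -d)
# Same file as in the session above; \316\264 is the octal form of CE B4.
touch "$dir/greek-delta-$(printf '\316\264')"

# The glob hands the raw name to hex_bytes: no ls and no terminal decoding.
for f in "$dir"/greek-delta-*; do
  hex_bytes "${f##*/}"
done

rm -r "$dir"
```

The output ends in ce b4, the same two bytes that were passed to echo -e, so the stored name is correct regardless of how a given ls chooses to display it.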

Misalf
Re: UTF-8, why is there a question mark (shown?) in the filename?
« Reply #3 on: February 11, 2016, 10:43:24 AM »
curaga, since you're suggesting setting Bash as the shell in URxvt's config file...
Would it be OK to
Code:
export SHELL=/usr/local/bin/bash
from ~/.profile, so that it's used system-wide (provided bash is always available),
or could that break base Core features?
Download a copy and keep it handy: Core book ;)

curaga
Re: UTF-8, why is there a question mark (shown?) in the filename?
« Reply #4 on: February 11, 2016, 11:32:47 AM »
Nothing should break, even if you made bash /bin/sh. There aren't any ash-specific features; bash is (or should be) a full superset.
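If SHELL is exported from ~/.profile as Misalf asks, it may be worth guarding against bash not being loaded at that point, for example when the bash extension is not set to load at boot. A minimal sketch of such a guard, assuming the /usr/local/bin/bash path from the question; pick_shell is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical ~/.profile fragment: use the first executable candidate,
# falling back to /bin/sh (busybox ash on a stock Core system).
pick_shell() {
  for candidate in "$@"; do
    if [ -x "$candidate" ]; then
      echo "$candidate"
      return 0
    fi
  done
  echo /bin/sh
}

SHELL=$(pick_shell /usr/local/bin/bash)
export SHELL
```

SHELL only tells programs such as urxvt which shell to start; it does not change /bin/sh, so scripts beginning with #!/bin/sh still run under busybox ash unless /bin/sh itself is relinked.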

Misalf
Re: UTF-8, why is there a question mark (shown?) in the filename?
« Reply #5 on: February 11, 2016, 12:25:36 PM »
Thanks curaga.

I had trouble getting the characters of my locale (umlauts) to display correctly, too.
Switching to URxvt basically fixed it, except for the Euro (€) sign.
I noticed earlier that Bash also lets me type the Euro sign at the prompt without a question mark being printed; I just wasn't sure whether it's a good idea to make it the default.

However, GNU ls on the tty doesn't properly display umlauts and the Euro sign in file names.
Output of ls in URxvt:
Code:
.A1äöü€@
Output of ls in Aterm and on the tty:
Code:
.A1?????????@

I'm using LANG=C.
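The nine question marks match a byte count rather than a character count: ä, ö and ü take two bytes each in UTF-8 and € takes three, so 3 × 2 + 3 = 9. That fits ls running with a C character-type locale, where every byte above 0x7F is treated as a separate non-printable character. The sketch reproduces this with a throwaway file; count_bytes is a hypothetical helper, and the ls step assumes GNU ls, whose -q option replaces non-graphic characters with '?'.

```sh
#!/bin/sh
# Hypothetical helper: number of bytes (not characters) in a string.
count_bytes() {
  printf '%s' "$1" | wc -c | tr -d ' '
}

# äöü€ as raw UTF-8: C3 A4, C3 B6, C3 BC, E2 82 AC (written in octal).
chars=$(printf '\303\244\303\266\303\274\342\202\254')
count_bytes "$chars"

dir=$(mktemp -d)
touch "$dir/.A1${chars}@"
# C locale: GNU ls -q prints one '?' per non-ASCII byte.
LC_ALL=C ls -qA "$dir"
rm -r "$dir"
```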

Misalf
Re: UTF-8, why is there a question mark (shown?) in the filename?
« Reply #6 on: February 11, 2016, 12:34:03 PM »
Never mind, fixed by LC_CTYPE (which I had set in .ashrc specifically for rxvt-unicode, since aterm goes crazy if it isn't C).
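LC_CTYPE takes precedence over LANG for character handling (and a non-empty LC_ALL overrides both), so LANG=C can stay in place while LC_CTYPE switches ls and the shell to UTF-8. A sketch of choosing LC_CTYPE per terminal in ~/.ashrc, along the lines Misalf describes. The TERM patterns and the en_US.UTF-8 locale name are assumptions: urxvt usually sets TERM to rxvt-unicode or rxvt-unicode-256color, the Linux console usually sets TERM=linux, and the locale must be one built with getlocale. ctype_for_term is a hypothetical helper.

```sh
#!/bin/sh
# Hypothetical ~/.ashrc fragment. ASSUMPTIONS: the UTF-8 locale below is
# installed (e.g. via mylocale.tcz), urxvt reports TERM=rxvt-unicode*,
# and the Linux console reports TERM=linux.
UTF8_LOCALE=en_US.UTF-8

ctype_for_term() {
  case "$1" in
    rxvt-unicode*|linux) echo "$UTF8_LOCALE" ;;
    *) echo C ;;   # e.g. aterm, which misbehaves unless LC_CTYPE is C
  esac
}

LC_CTYPE=$(ctype_for_term "$TERM")
export LC_CTYPE
```

LC_ALL must stay unset or empty for this to take effect, because a non-empty LC_ALL overrides LC_CTYPE.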