Tiny Core Linux

Tiny Core Extensions => TCE Bugs => Topic started by: GNUser on December 09, 2019, 09:55:52 AM

Title: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 09, 2019, 09:55:52 AM
I'm on Pure64 10.1. In all GTK2 and GTK3 applications, I get an "Invalid file name" error in the file selection box whenever the file name contains a Unicode character. Here's an example when trying to save a file in a GTK3 application:

[Screenshot of the "Invalid file name" error in a GTK3 save dialog: http://files.dantas.airpost.net/public/save_file.jpg]

I'm using a Unicode locale:
Code:
bruno@box:~$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

BTW, I do not have this problem with applications that use a different graphical toolkit: the Xfe file manager uses the FOX toolkit and can create files and directories with Unicode characters in their names without any problem.

What do I need to do in order for GTK2 and GTK3 applications in Pure64 to support Unicode characters in file names?

P.S. The same GTK2 and GTK3 applications running on a different OS (e.g., Devuan) handle Unicode characters in file names fine. So my guess is that this is either a base-system issue in Pure64 or an issue with how GTK2 and GTK3 were compiled for Pure64.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: curaga on December 09, 2019, 11:39:39 PM
That "Invalid file name" dialog in gtk2 comes from gtkfilechooserdefault.c, which shows it when g_file_get_child_for_display_name() from glib2 fails. Maybe this helps in pinpointing the cause.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: Juanito on December 10, 2019, 12:00:45 AM
Since the error occurs when accessing the file system, does your mount command need a utf8 switch?
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 10, 2019, 05:26:36 AM
Juanito - No, the mount command in urxvt just works (even with Unicode characters in the mount point), with no need for a utf8 switch:

Code:
bruno@box:~$ which mount
/bin/mount
bruno@box:~$ ls -l /bin/mount
lrwxrwxrwx    1 root     root            12 Jun  9  2019 /bin/mount -> busybox.suid

bruno@box:~$ sudo mkdir /mnt/eĥoŝanĝoĉiuĵaŭde
bruno@box:~$ sudo mount /dev/sdc1 /mnt/eĥoŝanĝoĉiuĵaŭde
bruno@box:~$ ls /mnt/eĥoŝanĝoĉiuĵaŭde
somefile.txt   test1.txt      test2.txt      some_dir/
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 11, 2019, 08:26:30 AM
At the lowest level, the C library in Pure64 handles UTF-8 in file names beautifully:

Code:
bruno@box:~$ cat test.c
#include <stdio.h>

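int main(void)
{
    /* fopen() hands the name's UTF-8 bytes to the kernel unchanged */
    FILE *fp = fopen("/home/bruno/eĥoŝanĝoĉiuĵaŭde.txt", "w+");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }
    fprintf(fp, "hello world\n");
    fclose(fp);
    return 0;
}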
bruno@box:~$ tce-load -wi compiletc
bruno@box:~$ gcc test.c
bruno@box:~$ ./a.out
bruno@box:~$ cat eĥoŝanĝoĉiuĵaŭde.txt
hello world

Also, it seems that GTK is UTF-8 aware out of the box, so the GTK2 and GTK3 libraries themselves are probably innocent. There's some helpful information here: https://wiki.gentoo.org/wiki/UTF-8

Since both GTK2 and GTK3 are affected, my guess is that the problem lies with one of their shared dependencies responsible for parsing filenames.

Alas, I know very little about the GUI stack. I will not be able to investigate this further without guidance from someone more knowledgeable.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 11, 2019, 08:33:15 AM
P.S. The Gentoo wiki's UTF-8 page (see link above) has a section on file names. It mentions convmv (not in the Pure64 repository) and iconv (part of glibc_apps.tcz). Loading glibc_apps.tcz does not solve the problem.

One last tidbit: My GTK2 and GTK3 applications running in Pure64 can display UTF-8 characters just fine. I can also type UTF-8 characters into those applications. The problem seems limited to filenames.

Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 11, 2019, 03:49:28 PM
A helpful GNOME/GTK user (developer?) suggested that this issue may be due to Unicode data tables missing from glib2. He recommended that I try installing glib2-locale.tcz: https://discourse.gnome.org/t/support-for-unicode-characters-in-gtk2-3-file-selection-box/2338/5

Alas, glib2-locale.tcz is not available in the Pure64 repository :( I put in an extension request.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: Rich on December 11, 2019, 05:49:21 PM
Hi GNUser
I just ran a  diff  between the 32-bit and 64-bit  base-locale.tcz  extensions and they appear to be identical. See if this works for you:
http://tinycorelinux.net/10.x/x86/tcz/glib2-locale.tcz
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 11, 2019, 07:40:24 PM
Hi, Rich. Thank you for that. I loaded glib2-locale.tcz from x86 using the link you provided. It has no deleterious effects, but makes no difference to my issue :(

I looked at the contents of glib2-locale.tcz and noticed that my locale (en_US) is not represented. Could that have something to do with why it doesn't help with my issue?

I always use the lang=en_US.UTF-8 boot code, and mylocale.tcz (which contains only en_US.UTF-8) is in my onboot.lst. Wouldn't that be enough to provide the Unicode data tables that glib2 needs? (Sorry if I sound clueless. Truth is I am clueless with regard to this issue.)
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: Rich on December 11, 2019, 08:04:45 PM
Hi GNUser
There's a previous version that contains:
Code:
usr/local/share/locale/en_GB/LC_MESSAGES/glib20.mo
usr/local/share/locale/en_CA/LC_MESSAGES/glib20.mo
found here:
http://tinycorelinux.net/9.x/x86/tcz/glib2-locale.tcz
I don't know how version-dependent these files are. TC9 glib2 is version 2.52.3, but the locale file is from 2.45.2.

I also noticed this:
http://tinycorelinux.net/10.x/x86/tcz/gtk2-locale.tcz
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 11, 2019, 08:31:55 PM
Thanks, Rich. Loading gtk2-locale.tcz made no difference to my gtk2 application (thunderbird).

To take advantage of en_CA in glib2-locale.tcz, I deleted mylocale.tcz from tcedir/optional/, loaded getlocale.tcz, and generated a new mylocale.tcz containing en_CA.UTF-8. I rebooted with the new locale and then loaded glib2-locale.tcz. No difference.

Quite the stubborn problem!

P.S. Truth of the matter is that I only rarely need to create file names with fancy characters. Where this bug hits me most (sometimes daily) is when I try to print a webpage to PDF from my browser (iridium-browser, which uses GTK3) and there is a dash (an en dash, not an ASCII hyphen-minus) somewhere in the page name. As an example, check out duckduckgo.com's homepage: the dash in the page name triggers this bug. If I try to print the page to a PDF file, three "invalid file name" dialogs appear. The dialogs must be closed (in the correct order, UGH) before I can delete the dash and proceed with printing.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: Juanito on December 11, 2019, 08:53:18 PM
As I understand it, en_US is the default in Linux, so no additional locale files are required.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: curaga on December 12, 2019, 12:43:08 AM
Indeed, the .mo files are translated messages, not tables/etc. They would let you get glib2 errors in German for example.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 12, 2019, 05:14:21 AM
Thank you, juanito and curaga. So we can confidently eliminate glib2-locale as the missing piece here.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: Rich on December 12, 2019, 06:00:55 AM
Hi GNUser
Quote
Thank you, juanito and curaga. So we can confidently eliminate glib2-locale as the missing piece here.
As well as any other  -locale  extension, since they only translate pre-canned messages (e.g. "Invalid file name") compiled into the library.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 12, 2019, 06:11:07 AM
Rich - Good to know. Thanks.

Two quick questions:

1. Can anyone confirm whether glib2.tcz has been stripped of "unicode tables" in the interest of keeping the extension small?

2. In case these tables are absent from glib2.tcz, how do I obtain them? Can I just copy the relevant files from a different distro?
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: Rich on December 12, 2019, 06:22:32 AM
Hi GNUser
Quote
A helpful GNOME/GTK user (developer?) suggested this issue may be due to unicode data tables missing from glib2. He recommended that I try installing glib2-locale.tcz: https://discourse.gnome.org/t/support-for-unicode-characters-in-gtk2-3-file-selection-box/2338/5

Alas, glib2-locale.tcz is not available in the Pure64 repository :( I put in an extension request.
Maybe he meant  glibc_gconv.tcz  or  glibc_i18n_locale.tcz (which also contains character maps)?
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 12, 2019, 06:49:13 AM
glibc_gconv.tcz has been loaded all along. Loading glibc_i18n_locale.tcz makes no difference :(
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 12, 2019, 07:29:50 AM
In case this helps, here is a simple way I can trigger the bug: I launch my web browser (iridium, which uses GTK3), go to duckduckgo.com, press Control+p to print, choose "Save as PDF", then click "Save".

The page name (DuckDuckGo – Privacy, simplified.) is automatically selected as the filename for the PDF and the dash (between DuckDuckGo and Privacy) triggers the bug. I cannot print to PDF until I delete the dash in the filename (replacing it with a hyphen, for example).
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 12, 2019, 08:47:36 AM
Solved! This does the trick:

Code:
$ export G_FILENAME_ENCODING=UTF-8

Make it permanent by adding that line to ~/.profile.

I found the solution when I stumbled upon this:
https://www.gnu.org/software/guile-gnome/docs/gtk/html/GtkFileChooser.html#GtkFileChooser
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: curaga on December 12, 2019, 10:23:14 AM
That reminded me, my LFS box has this in bashrc:
Quote
export G_FILENAME_ENCODING=@locale

However, the glib docs say it defaults to UTF-8, which is odd in your case.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 12, 2019, 10:37:16 AM
If I don't specify a value for the variable, I find that it's set to iso8859-1:

Code:
bruno@box:~$ echo $G_FILENAME_ENCODING
iso8859-1

So maybe glib does default to UTF-8; I found this in /etc/profile:

Code:
# Screen display for X and encoding for GTK+ apps.
#
G_FILENAME_ENCODING=iso8859-1

Boy, did it cause me a world of hurt. Maybe something similar to what LFS does would be a better way to go than hardcoding a default encoding?
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: curaga on December 12, 2019, 10:45:15 AM
I think that's for best compatibility with FAT32 etc. IIRC, trying to create UTF-8 names on FAT32 would either fail or produce names incompatible with Windows. By default, TC has no locales installed, so it uses the "C" default, which is ASCII; that would make @locale useless.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 12, 2019, 10:46:01 AM
If I don't specify an encoding in ~/.profile and comment out the relevant line in /etc/profile (and back up the changes), the variable has no value after a reboot:

Code:
bruno@box:~$ echo $G_FILENAME_ENCODING

bruno@box:~$

Everything works as expected in this scenario, so glib must indeed be defaulting to utf-8.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: GNUser on December 12, 2019, 10:49:04 AM
Perhaps getlocale.tcz could give the user a hint that this environment variable is a potential pitfall? It could save some pain.
Title: Re: GTK2 and GTK3 cannot handle Unicode characters in file names
Post by: curaga on December 13, 2019, 01:38:34 AM
Note added to getlocale.tcz.info (10.x x86 and x86_64).