
Topic: UTF-8  (Read 12533 times)

Offline bmarkus

  • Administrator
  • Hero Member
  • *****
  • Posts: 7183
    • My Community Forum
UTF-8
« on: August 21, 2010, 04:13:00 PM »
As TC users are all around the world, localization is important. I see the fear that it will increase size, when a small footprint is one of the key differentiators of TC/MC. Fortunately, the other key differentiator is modularity.

Localization requires many things: tools to easily set up and change language, font, and keyboard; availability of translated applications; moving to UTF-8; etc. It is certainly a long process. The good thing is that most of it can be done in parallel with base development, at the extension level.

Moving ncurses out of the base is one of the crucial initial steps. I will submit the UTF-8 (wide) version of ncurses. I think establishing a small ad hoc team (task force) would be a useful way to start working on this.

Feedback is welcome :)
Béla
Ham Radio callsign: HA5DI

"Amateur Radio: The First Technology-Based Social Network."

Offline tinypoodle

  • Hero Member
  • *****
  • Posts: 3857
Re: UTF-8
« Reply #1 on: August 22, 2010, 02:07:40 AM »
Perhaps a meta-extension (similar to compiletc) for localization could be useful?
"Software gets slower faster than hardware gets faster." Niklaus Wirth - A Plea for Lean Software (1995)

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11044
Re: UTF-8
« Reply #2 on: August 22, 2010, 03:21:41 AM »
I suppose a script to create a locale-archive would be useful. That way it has to be done only once, and doesn't contain bloat unnecessary to the user.
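
A minimal sketch of what such a script might do for one selected locale, assuming glibc's `localedef` is available and that entries use the `name/charmap` form of glibc's SUPPORTED list. The helper name `localedef_cmd` is hypothetical, and it only prints the command instead of running it; this is not the actual script.

```shell
# Hypothetical helper: turn one SUPPORTED-style entry ("name/charmap")
# into the localedef command that would compile it (printed, not executed).
localedef_cmd() {
    entry=$1
    name=${entry%%/*}       # locale name, e.g. de_DE.UTF-8
    charmap=${entry#*/}     # character map, e.g. UTF-8
    # Input definition: the name without its ".charset" part, keeping any @modifier
    input=$(printf '%s\n' "$name" | sed 's/\([^.@]*\)[^@]*\(.*\)/\1\2/')
    printf 'localedef -i %s -f %s %s\n' "$input" "$charmap" "$name"
}

localedef_cmd 'de_DE.UTF-8/UTF-8'
localedef_cmd 'de_AT@euro/ISO-8859-15'
```

Printing the command first makes it easy to review the selection before anything is compiled.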

There is no stable FLTK with utf-8, but everything else should be possible.
The only barriers that can stop you are the ones you create yourself.

Offline hiro

  • Hero Member
  • *****
  • Posts: 1229
Re: UTF-8
« Reply #3 on: August 22, 2010, 05:40:50 AM »
It's always good if the core gets smaller. If that also makes UTF-8 support easier, great!
I don't like ncurses anyway.

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11044
Re: UTF-8
« Reply #4 on: August 22, 2010, 07:56:31 AM »
Said script posted:

http://distro.ibiblio.org/pub/linux/distributions/tinycorelinux/3.x/tcz/getlocale.tcz.info

Quote
Title:          getlocale.tcz
Description:    Script to build customized locale support
Version:        1.0
Author:         Curaga
Original-site:  http://tinycorelinux.com
Copying-policy: GPLv3
Size:           4k
Extension_by:   Curaga
Comments:       To avoid having one huge locale support extension, this
      script builds a customized one according to your
      selections.
-
      If you load this in the console, the script is called
      getlocale.sh.
-
      The new extension will be called mylocale.tcz.
Change-log:     
Current:        2010/08/22 Original

Offline SvOlli

  • Full Member
  • ***
  • Posts: 193
  • Linux Developer
Re: UTF-8
« Reply #5 on: August 22, 2010, 10:28:24 AM »
Hello Béla,

Moving ncurses out of the base sounds like a good idea. The only program in the base that depends on ncurses is /usr/bin/tset, which itself is not executed by any other program in the base. Once the decision has been made on what to do with it (remove or replace), there's only one thing left to do before ncurses can be moved out: write a script that peeks into the extensions to see whether they link against ncurses, and adds a line to a .dep file if they do. Both requirements can be taken care of in a rather short timeframe, probably even during the current rc phase of 3.1.
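
The check described above could be sketched roughly as follows, assuming an extension's files are already unpacked into a directory and that `readelf` (binutils) is installed. The function names and the paths in the usage note are assumptions for illustration, not an existing Tiny Core tool.

```shell
# Succeeds if the "readelf -d" output on stdin lists a libncurses dependency.
links_ncurses() {
    grep -q 'NEEDED.*\[libncurses'
}

# Append a dependency line to a .dep file unless it is already present.
add_dep() {
    dep=$1
    depfile=$2
    grep -qxF "$dep" "$depfile" 2>/dev/null || printf '%s\n' "$dep" >> "$depfile"
}

# Scan the regular files of an unpacked extension (hypothetical layout);
# stop at the first file that needs ncurses.
check_extension() {
    dir=$1
    depfile=$2
    find "$dir" -type f | while IFS= read -r f; do
        if readelf -d "$f" 2>/dev/null | links_ncurses; then
            add_dep ncurses.tcz "$depfile"
            break
        fi
    done
}
```

It would be called as, for example, `check_extension /tmp/unpacked/foo /tmp/foo.tcz.dep`, where both paths are placeholders.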

Greetings,
SvOlli

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11044
Re: UTF-8
« Reply #6 on: August 22, 2010, 10:33:55 AM »
The transition is done already ;)
« Last Edit: August 22, 2010, 10:35:28 AM by curaga »

Offline roberts

  • Retired Admins
  • Hero Member
  • *****
  • Posts: 7361
  • Founder Emeritus
Re: UTF-8
« Reply #7 on: August 22, 2010, 11:02:36 AM »
Quote
moving ncurses out of  base sounds like a good idea. The only program in the base that depends on ncurses is /usr/bin/tset, which itself is not executed from any other program of the base
Already removed for 3.1rc2, and libncurses.so* and tset are already in the recently posted ncurses.tcz.
10+ Years Contributing to Linux Open Source Projects.

Offline TaoTePuh

  • Full Member
  • ***
  • Posts: 172
Re: UTF-8
« Reply #8 on: August 22, 2010, 11:53:50 AM »
@curaga

Thank you for the script!

Maybe you want to change line 66 to avoid double entries in onboot.lst :

Code: [Select]
grep -qxF 'mylocale.tcz' "${TCEDIR}/onboot.lst" || echo "mylocale.tcz" >> "${TCEDIR}/onboot.lst"

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11044
Re: UTF-8
« Reply #9 on: August 22, 2010, 12:06:41 PM »
Quote
@curaga

Thank you for the script!

Maybe you want to change line 66 to avoid double entries in onboot.lst :

Code: [Select]
grep -qxF 'mylocale.tcz' "${TCEDIR}/onboot.lst" || echo "mylocale.tcz" >> "${TCEDIR}/onboot.lst"

Updated to check for that.

Offline Syun

  • Newbie
  • *
  • Posts: 17
Re: UTF-8
« Reply #10 on: August 22, 2010, 01:53:30 PM »
Hello. I'm Japanese. Please excuse my weak English.

I make Japanese versions of Tiny Core Linux.
And I redistribute them for some Japanese users.
[^thehatsrule^: removed, remaster?]

I need to remaster tinycore.gz for several reasons.
Most USB sticks are formatted as VFAT.
They must be mounted with the "codepage=932,iocharset=utf8" options to display Japanese characters.

a) A Japanese NLS module for the kernel is needed.
b) The mount options in rebuildfstab need to be changed.
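
As a rough illustration of the mount options involved, assuming a hypothetical device `/dev/sdb1` and mount point `/mnt/sdb1` (the helper name is made up, and the real rebuildfstab output format may differ):

```shell
# Options quoted in this post for displaying Japanese file names on VFAT.
VFAT_JA_OPTS='codepage=932,iocharset=utf8'

# Print an fstab-style line for a VFAT partition using those options.
vfat_fstab_line() {
    dev=$1
    mnt=$2
    printf '%s %s vfat noauto,%s 0 0\n' "$dev" "$mnt" "$VFAT_JA_OPTS"
}

vfat_fstab_line /dev/sdb1 /mnt/sdb1
# Equivalent one-off mount (needs root and the matching NLS kernel modules):
#   mount -t vfat -o codepage=932,iocharset=utf8 /dev/sdb1 /mnt/sdb1
```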

I can't implement this with localized extensions.
I also need to change the usbinstall script to change the kernel boot options.

Code: [Select]
APPEND initrd=/boot/$ROOTFS.gz quiet waitusb=5:"$TARGETUUID" tce="$TARGETUUID" lang=ja_JP.utf8 kmap=jp106 tz=Asia/Tokyo noutc showapps

In tc-config, the hwclock command is executed before extensions are loaded.
Therefore the clock can't be adjusted after extensions are loaded.

I think remastering is a smarter method than localized extensions.
But I don't reject localized extensions.

Thank you.
« Last Edit: August 22, 2010, 10:20:35 PM by ^thehatsrule^ »

Offline tinypoodle

  • Hero Member
  • *****
  • Posts: 3857
Re: UTF-8
« Reply #11 on: August 22, 2010, 04:10:28 PM »
Quote
In tc-config, the hwclock command is executed before extensions are loaded.
Therefore the clock can't be adjusted after extensions are loaded.

I don't think that aspect is particular to localization; the same applies if hwclock is to be synced over the network, which may require drivers/firmware.
There are several threads in the forum referring to this subject.
"Software gets slower faster than hardware gets faster." Niklaus Wirth - A Plea for Lean Software (1995)

Offline eluring

  • Newbie
  • *
  • Posts: 22
Re: getlocale.tcz
« Reply #12 on: August 23, 2010, 03:06:35 PM »
@curaga
I am pleasantly surprised by the speed with which proposals are adopted and implemented.
Thank you for the script!

@all
Just an idea for improvement, for line 71:
Code: [Select]
echo "Reboot with lang=xx_YY to start using this."

This message could avoid confusion, but it could also create some: if UTF-8 is chosen, the bootcode has to be lang=xx_YY.UTF-8.

My proposal:
As you have been happily living without locales and the demand is for UTF-8, cut off the old braids (the ISO encodings):
Code: [Select]
grep 'UTF-8/' < SUPPORTED | cut -d '/' -f 1 > SUPPORTED2.utf8

The number of locales decreases from 415 to 148.
We don't need to copy bloated stuff, do we? Let TC stay tiny in extensions, too.
If every locale is UTF-8 only, we can totally omit the term 'UTF-8' in the getlocale.sh dialog and in the lang bootcode.
Just use it!
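
To show what the proposed filter keeps, here is a small sketch run on a few sample lines in the style of glibc's SUPPORTED file. The function name `utf8_only` and the sample selection are only for illustration.

```shell
# Keep only entries whose locale name ends in "UTF-8" and print that name
# (the part before the slash), as in the proposed grep/cut pipeline.
utf8_only() {
    grep 'UTF-8/' | cut -d '/' -f 1
}

# Small sample list; the real file is much longer.
utf8_only <<'EOF'
de_DE.UTF-8/UTF-8
de_DE/ISO-8859-1
de_AT@euro/ISO-8859-15
et_EE.UTF-8/UTF-8
et_EE.ISO-8859-15/ISO-8859-15
EOF
```

Of the five sample lines only `de_DE.UTF-8` and `et_EE.UTF-8` survive. Note that an entry written as `xx_YY/UTF-8`, with no `.UTF-8` in its name, would not match this pattern.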

Less is more (usability)

Discussion?

ps: The dialog looks very nice in Aterm, and would look even nicer if long terms like 'de_AT@euro/ISO-8859-15' fit into the frame. After the proposed cut everything will fit.

« Last Edit: August 23, 2010, 03:34:57 PM by eluring »
Everyone is a foreigner, almost everywhere.

Offline curaga

  • Administrator
  • Hero Member
  • *****
  • Posts: 11044
Re: getlocale.tcz
« Reply #13 on: August 24, 2010, 09:23:16 AM »
Quote
@curaga
I am pleasantly surprised by the speed with which proposals are adopted and implemented.
Thank you for the script!

I was bored ;) You're welcome.

Quote
@all
Just an idea for improvement, for line 71:
Code: [Select]
echo "Reboot with lang=xx_YY to start using this."

This message could avoid confusion, but it could also create some: if UTF-8 is chosen, the bootcode has to be lang=xx_YY.UTF-8.

Since there are various encodings for each locale, plus possible @euro, I doubt saying the utf-8 bit would help much.

Quote
My proposal:
As you have been happily living without locales and the demand is for UTF-8, cut off the old braids (the ISO encodings):
Code: [Select]
grep 'UTF-8/' < SUPPORTED | cut -d '/' -f 1 > SUPPORTED2.utf8

The number of locales decreases from 415 to 148.
We don't need to copy bloated stuff, do we? Let TC stay tiny in extensions, too.
If every locale is UTF-8 only, we can totally omit the term 'UTF-8' in the getlocale.sh dialog and in the lang bootcode.
Just use it!

I don't think cutting down the selection would be that useful. The extension would not drop in size at all, since squashfs rounds up to the blocksize (4k). Users may want any supported locale.
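
The rounding argument can be illustrated with a little arithmetic. This is only a sketch assuming the 4k block size mentioned above; the function name is made up, and the byte counts are invented sample values, not measurements of the real extension.

```shell
# Round a size in bytes up to the next multiple of the block size
# (second argument, defaulting to 4096).
round_up_block() {
    size=$1
    block=${2:-4096}
    echo $(( ($size + $block - 1) / $block * $block ))
}

round_up_block 1200   # a shorter locale list
round_up_block 3500   # a longer one occupies the same single block
```

Both sample sizes come out as 4096, so trimming a list that already fits in one block would not change the rounded size.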

Also, any utf-8 locale creates a much bigger locale-archive than the corresponding ISO-encoded one.

Quote
Less is more (usability)

Discussion?

ps: The dialog looks very nice in Aterm, and would look even nicer if long terms like 'de_AT@euro/ISO-8859-15' fit into the frame. After the proposed cut everything will fit.

The three numbers (0 0 10) correspond to height, width, and items to show. If you wish to tune the middle number, post what number looks good.
It's in characters, so 22 would be a minimum.
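
One way to pick the width is to measure the longest entry that has to fit. The sketch below assumes plain ASCII entries, one per line; the function name `longest_entry` is hypothetical, and the dialog frame will need some extra columns on top of the result.

```shell
# Print the length of the longest line on stdin (0 for empty input).
longest_entry() {
    awk '{ if (length($0) > max) max = length($0) } END { print max + 0 }'
}

printf '%s\n' 'de_DE.UTF-8/UTF-8' 'de_AT@euro/ISO-8859-15' | longest_entry
```

For these two entries the result is 22, the length of 'de_AT@euro/ISO-8859-15'.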

Offline eluring

  • Newbie
  • *
  • Posts: 22
Re: UTF-8
« Reply #14 on: August 24, 2010, 12:33:38 PM »
Quote
The three numbers (0 0 10) correspond to height, width, and items to show. If you wish to tune the middle number, post what number looks good.

Code: [Select]
42 will satisfy any potential user from Estonia
'et_EE.ISO-8859-15/ISO-8859-15'  :o