unicode(n) Unicode normalization unicode(n)


unicode - Implementation of Unicode normalization

package require Tcl 8.3

package require unicode 1.0

::unicode::fromstring string

::unicode::tostring uclist

::unicode::normalize form uclist

::unicode::normalizeS form string


This is an implementation in Tcl of the Unicode normalization forms.

::unicode::fromstring string
Converts string to list of integer Unicode character codes which is used in unicode for internal string representation.
::unicode::tostring uclist
Converts list of integers uclist back to Tcl string.
::unicode::normalize form uclist
Normalizes Unicode characters list ulist according to form and returns the normalized list. Form form takes one of the following values: D (canonical decomposition), C (canonical decomposition, followed by canonical composition), KD (compatibility decomposition), or KC (compatibility decomposition, followed by canonical composition).
::unicode::normalizeS form string
A shortcut to ::unicode::tostring [unicode::normalize \$form [::unicode::fromstring \$string]]. Normalizes Tcl string and returns normalized string.

% ::unicode::fromstring "\u0410\u0411\u0412\u0413"
1040 1041 1042 1043
% ::unicode::tostring {49 50 51 52 53}
12345
%
% ::unicode::normalize D {7692 775}
68 803 775
% ::unicode::normalizeS KD "\u1d2c"
A
%

[1]
"Unicode Standard Annex #15: Unicode Normalization Forms", (http://unicode.org/reports/tr15/)

Sergei Golovan

This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category stringprep of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation.

stringprep(n)

normalization, unicode

Copyright (c) 2007, Sergei Golovan <sgolovan@nes.ru>
1.0.0 stringprep