stringprep(n) | Preparation of Internationalized Strings | stringprep(n) |
stringprep - Implementation of stringprep
package require Tcl 8.3
package require stringprep 1.0.1
::stringprep::register profile ?-mapping list? ?-normalization form? ?-prohibited list? ?-prohibitedList list? ?-prohibitedCommand command? ?-prohibitedBidi boolean?
::stringprep::stringprep profile string
::stringprep::compare profile string1 string2
This is a Tcl implementation of the Preparation of Internationalized Strings ("stringprep"). It lets you define stringprep profiles and use them to prepare Unicode strings for comparison, as defined in RFC-3454.
Option -mapping specifies the stringprep mapping tables to apply. It takes a list of tables from appendix B of RFC-3454. The usual values are {B.1 B.2} or {B.1 B.3}, where B.1 contains characters that are commonly mapped to nothing, B.2 specifies case folding for profiles that use Unicode normalization form KC, and B.3 specifies case folding for profiles that use no normalization. The default value is {}, which means no mapping.
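As a minimal sketch of what the mapping tables do, the following registers a profile with tables B.1 and B.3 and no normalization. The profile name demo-fold is hypothetical and chosen only for this illustration. Per RFC-3454, the soft hyphen U+00AD is listed in B.1 and should therefore be dropped, while B.3 should fold the uppercase letters.

```tcl
package require stringprep

# Hypothetical profile name, used only for this illustration.
::stringprep::register demo-fold -mapping {B.1 B.3}

# U+00AD (soft hyphen) is in table B.1, so it is mapped to nothing;
# table B.3 folds the uppercase letters to lowercase.
set prepared [::stringprep::stringprep demo-fold "Ex\u00ADAMPLE"]
puts $prepared
```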
Option -normalization takes a string; if it is nonempty, it is used as the name of a Unicode normalization form. Any of "D", "C", "KD" or "KC" may be used, though RFC-3454 defines only two options: no normalization, or normalization using form KC.
Option -prohibited takes a list of RFC-3454 tables of prohibited characters. The current version can prohibit either all of the tables C.3 to C.9 or none of them. An example of this list, for RFC-3491, is {A.1 C.1.2 C.2.2 C.3 C.4 C.5 C.6 C.7 C.8 C.9}.
Option -prohibitedList specifies a list of additional prohibited characters. The list contains not the characters themselves but their Unicode code points. For example, the Nodeprep specification from RFC-3920 forbids the following codes: {0x22 0x26 0x27 0x2f 0x3a 0x3c 0x3e 0x40} (that is, " & ' / : < > @).
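Because the option expects code points rather than characters, it can be convenient to derive the list from the characters themselves. A small sketch using only core Tcl commands (the variable names are illustrative):

```tcl
# Convert each forbidden character to its code point in hexadecimal notation.
set codes {}
foreach ch [split {"&'/:<>@} ""] {
    scan $ch %c code
    lappend codes [format 0x%02x $code]
}
puts $codes
```

For these eight characters the loop reproduces the Nodeprep list given above.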
Option -prohibitedCommand specifies a command that is called for every character code in the mapped and normalized string. If the command returns true, the character is considered prohibited. This option is useful when the list for -prohibitedList would be too large.
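The following sketch shows one way such a command might look. It assumes that the package appends the character code to the command as a single integer argument, which the description above suggests but does not spell out. The procedure name isNonAscii and the profile name demo-ascii are hypothetical.

```tcl
package require stringprep

# Hypothetical predicate: prohibit every code point outside ASCII.
# Listing all of these codes in -prohibitedList would be impractical.
proc isNonAscii {code} {
    return [expr {$code > 0x7f}]
}

::stringprep::register demo-ascii -prohibitedCommand isNonAscii

# The first string is plain ASCII; the second contains U+00E9.
foreach candidate [list "cafe" "caf\u00E9"] {
    if {[catch {::stringprep::stringprep demo-ascii $candidate} result]} {
        puts "rejected: $result"
    } else {
        puts "accepted: $result"
    }
}
```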
Option -prohibitedBidi takes a boolean value; if it is true, the bidirectional character processing rules defined in section 6 of RFC-3454 are applied.
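A brief sketch of the bidirectional check, under the assumption that the package implements the section 6 rules as written: a string that contains a right-to-left character (here Hebrew alef, U+05D0) must not also contain a left-to-right character such as a Latin letter. The profile name demo-bidi is hypothetical.

```tcl
package require stringprep

# Hypothetical profile that enables only the bidirectional check.
::stringprep::register demo-bidi -prohibitedBidi 1

# Two Hebrew letters, then a Hebrew letter followed by a Latin letter.
foreach candidate [list "\u05D0\u05D1" "\u05D0a"] {
    if {[catch {::stringprep::stringprep demo-bidi $candidate} result]} {
        puts "rejected: $result"
    } else {
        puts "accepted"
    }
}
```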
Nameprep profile definition (see RFC-3491):
::stringprep::register nameprep -mapping {B.1 B.2} -normalization KC -prohibited {A.1 C.1.2 C.2.2 C.3 C.4 C.5 C.6 C.7 C.8 C.9} -prohibitedBidi 1
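Once registered, the profile can be applied to strings. The sketch below repeats the registration so that it runs on its own, then prepares two differently cased strings. It assumes that ::stringprep::compare follows the convention of Tcl's string compare and returns 0 for strings that are equal after preparation.

```tcl
package require stringprep

::stringprep::register nameprep \
    -mapping {B.1 B.2} \
    -normalization KC \
    -prohibited {A.1 C.1.2 C.2.2 C.3 C.4 C.5 C.6 C.7 C.8 C.9} \
    -prohibitedBidi 1

# Case folding (table B.2) makes differently cased input prepare identically.
set a [::stringprep::stringprep nameprep "Example"]
set b [::stringprep::stringprep nameprep "EXAMPLE"]
puts [string equal $a $b]

# compare prepares both strings with the profile before comparing them.
set cmp [::stringprep::compare nameprep "Example" "EXAMPLE"]
puts $cmp
```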
Nodeprep and resourceprep profile definitions (see RFC-3920):
::stringprep::register nodeprep -mapping {B.1 B.2} -normalization KC -prohibited {A.1 C.1.1 C.1.2 C.2.1 C.2.2 C.3 C.4 C.5 C.6 C.7 C.8 C.9} -prohibitedList {0x22 0x26 0x27 0x2f 0x3a 0x3c 0x3e 0x40} -prohibitedBidi 1
::stringprep::register resourceprep -mapping {B.1} -normalization KC -prohibited {A.1 C.1.2 C.2.1 C.2.2 C.3 C.4 C.5 C.6 C.7 C.8 C.9} -prohibitedBidi 1
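A prohibited character causes ::stringprep::stringprep to raise an error, so callers will typically wrap it in catch. The following sketch repeats the nodeprep registration so that it runs on its own, then checks two candidate node identifiers; the candidate strings are made up for the illustration.

```tcl
package require stringprep

::stringprep::register nodeprep \
    -mapping {B.1 B.2} \
    -normalization KC \
    -prohibited {A.1 C.1.1 C.1.2 C.2.1 C.2.2 C.3 C.4 C.5 C.6 C.7 C.8 C.9} \
    -prohibitedList {0x22 0x26 0x27 0x2f 0x3a 0x3c 0x3e 0x40} \
    -prohibitedBidi 1

# "@" (0x40) is in the additional prohibited list, so the second
# candidate should be rejected while the first is only case folded.
foreach candidate {Juliet juliet@example} {
    if {[catch {::stringprep::stringprep nodeprep $candidate} result]} {
        puts "$candidate: rejected ($result)"
    } else {
        puts "$candidate: prepared as $result"
    }
}
```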
Sergei Golovan
This document, and the package it describes, will undoubtedly contain bugs and other problems. Please report such in the category stringprep of the Tcllib SF Trackers [http://sourceforge.net/tracker/?group_id=12883]. Please also report any ideas for enhancements you may have for either package and/or documentation.
unicode(n)
stringprep, unicode
Copyright (c) 2007-2009, Sergei Golovan <sgolovan@nes.ru>
1.0.1 | stringprep |