
Module:UCS/doc


This is the documentation page for Module:UCS

The module “UCS” currently has a single usable call, table, which returns a specially formatted table of the specified UCS characters.

Usage

{{#invoke: UCS|table|format|list|annotations}}

Parameters

All three currently supported parameters of the table call are positional.

format

Currently ignored but reserved for forward compatibility.

list

Input data, as a sequence of ASCII characters, for building the table. Supported inputs are:

  • +hexadecimal – jump to the specified UCS code point, given as hexadecimal digits (usually, but not necessarily, four of them). Closes the current row if necessary. The default start location is U+0020 SPACE.
  • ! string – the name or description of the character block, in wiki code. Should not be used while the current row is unfinished. The string extends up to the newline, so character specifications must start on the next line.
  • Classifiers for exactly one code point:
    • - (hyphen-minus) – the code point is disallowed; produces a purple empty cell.
    • Basic Latin letters (A–Z or a–z) – the code point is an allowed character and belongs to the class named by the letter: see below. Different classes make cells with different background colors. Class letters are case-insensitive, but lowercase letters make smaller character samples.
  • Newline (0x0A) – closes the current table row. As a special case, a row that consists of a block description and a single “-” produces a pink cell spanning the full table width, meaning that the specified code points are disallowed.
  • #, ;, / – a comment that runs up to the newline. The difference is that for # the terminating line feed is not part of the comment and takes effect as usual, whereas for ; and / the interpreter resumes from the next line as if there were no line feed.
  • Spaces (0x20) are ignored and will likely remain ignored in future versions.

Tabs (0x09) are currently ignored, but may be interpreted in future versions. All other characters may cause errors or be ignored. Support for page transclusion is planned, but not implemented.
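
The rules for list can be illustrated with a small tokenizer. This is only a sketch in Python, not the module's Lua code: the function name tokenize_list and the event names are hypothetical, and several details are assumptions (hexadecimal digits after “+” are read greedily, so a jump should be followed by a space or newline; the newline that ends a “!” description is consumed without closing a row; unsupported characters are silently skipped).

```python
import string

HEX_DIGITS = set(string.hexdigits)
DEFAULT_START = 0x20  # U+0020 SPACE, the documented default start


def tokenize_list(text, start=DEFAULT_START):
    """Turn a list string into (kind, value) events (illustrative only)."""
    events = []
    cp = start          # code point the next classifier applies to
    row_open = False    # True while the current row holds at least one cell
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == "+":
            # Assumption: read as many hex digits as follow the plus sign.
            j = i + 1
            while j < n and text[j] in HEX_DIGITS:
                j += 1
            if j > i + 1:
                if row_open:  # a jump closes an unfinished row
                    events.append(("end_row", None))
                    row_open = False
                cp = int(text[i + 1:j], 16)
                events.append(("jump", cp))
            i = j
        elif ch in "!#;/":
            end = text.find("\n", i)
            if end == -1:
                end = n
            if ch == "!":
                events.append(("describe", text[i + 1:end].strip()))
            # "#" leaves the newline in place; the others swallow it.
            i = end if ch == "#" else end + 1
        elif ch == "\n":
            events.append(("end_row", None))
            row_open = False
            i += 1
        elif ch == "-":
            events.append(("disallowed", cp))
            cp += 1
            row_open = True
            i += 1
        elif ch.isascii() and ch.isalpha():
            # (code point, class letter, small sample for lowercase input)
            events.append(("cell", (cp, ch.upper(), ch.islower())))
            cp += 1
            row_open = True
            i += 1
        else:
            i += 1  # spaces, tabs and anything unsupported are skipped here
    return events
```

For instance, tokenize_list("+0030 NNn-\n") yields a jump to U+0030, three cells of class N (the third with a small sample), one disallowed code point, and an end of row.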

If list is omitted or empty, then a hard-coded list is processed that produces a table for ISO 8859-1.

annotations

An optional list of lines that specify where #-links are placed on characters. Currently only lines of the format

c1c2…cn#Anchor_for_internal_link

are supported; each such line generates #-links to the given anchor on the characters c1, c2, …, cn.

Support for “+” code points, ranges, and other targets (links to the mainspace) is planned, but not implemented.
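
As an illustration of the annotation line format, the following Python sketch maps each listed character to its anchor. It is not taken from the module: the name parse_annotations is hypothetical, and splitting each line at its first “#” is an assumption (the documentation does not say how a literal “#” among the characters would be handled).

```python
def parse_annotations(text):
    """Map each annotated character to its '#anchor' link target."""
    links = {}
    for line in text.splitlines():
        # Assumption: the first "#" separates the characters from the anchor.
        chars, sep, anchor = line.partition("#")
        anchor = anchor.strip()
        if not sep or not chars or not anchor:
            continue  # not in the supported "chars#Anchor" format
        for ch in chars:
            links[ch] = "#" + anchor
    return links
```

With the input "ab#Letters" followed by a second line "0#Digits", the characters a and b map to #Letters and 0 maps to #Digits.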

Character classes

This is an original classification; it does not correspond to Unicode character classes. Classifiers are not stored in the module or any other permanent location, but are extracted from the list argument, so the classifiers of the same character can differ between tables.

  • D – digraphs, ligatures, presentation forms, and other redundant characters. Currently a light gray background.
  • I – IPA Extensions and other IPA symbols (except basic Latin). Currently a violet background.
  • J – combining characters. Currently yellow on a black background.
  • K, L, M – Latin alphabet. Namely, “K” are basic (ASCII) Latin letters, “L” are less common letters, and “M” are exotic letters. Currently all have backgrounds in various tones of blue and cyan.
  • N – numerals. Currently a pale red background.
  • O – control characters, broadly construed. Only characters allowed in HTML are classified here. Currently an orange background.
  • P, Q – punctuation marks, common (in English) and exotic respectively. Currently have backgrounds in shades of green.
  • S, T, U – symbols. Can also include characters from non-Latin scripts, although most of them are not intended to be shown in tables. Namely, “S” are common symbols, “T” are semigraphics, and “U” are exotic symbols. Currently have backgrounds around yellow, olive, and lime.
  • X – classification is unknown. Includes unallocated code points. Currently an empty (default) background.

Class letters A, B, C, E, F, G, H, R, V, W, Y, Z are currently reserved.
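
The class letters can be summarized as a lookup table. The Python sketch below only restates the list above; the names CLASS_MEANINGS, RESERVED and describe_classifier are hypothetical, and reporting reserved letters as "reserved" is an assumption, since the module's behavior for them is not documented here.

```python
CLASS_MEANINGS = {
    "D": "digraphs, ligatures, presentation forms, redundant characters",
    "I": "IPA symbols",
    "J": "combining characters",
    "K": "basic Latin letters",
    "L": "less common Latin letters",
    "M": "exotic Latin letters",
    "N": "numerals",
    "O": "control characters",
    "P": "common punctuation",
    "Q": "exotic punctuation",
    "S": "common symbols",
    "T": "semigraphics",
    "U": "exotic symbols",
    "X": "unknown classification",
}
RESERVED = set("ABCEFGHRVWYZ")


def describe_classifier(letter):
    """Return (class letter, meaning, small_sample) for one classifier."""
    upper = letter.upper()  # class letters are case-insensitive
    if upper in RESERVED:
        meaning = "reserved"  # assumption: reserved letters are only reported
    else:
        meaning = CLASS_MEANINGS[upper]
    # Lowercase input asks for a smaller character sample.
    return (upper, meaning, letter.islower())
```

For example, describe_classifier("n") gives ("N", "numerals", True), where the final True marks the smaller sample requested by the lowercase letter.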

The classification has no firm basis and largely reflects the personal tastes of its creator. Namely, the separate class for the International Phonetic Alphabet reflects its extensive use in Wikipedia, and there is no sharp criterion to distinguish “common” from “exotic” characters. The distinction between “U” (exotic symbols) and “Q” (exotic punctuation) is rather arbitrary and has probably been applied mistakenly in places.

Examples

Block(s) | 0 1 2 3 4 5 6 7 8 9 0a 0b 0c 0d 0e 0f 10 11 12 13 14 15 16 17 18 19 1a 1b 1c 1d 1e 1f
U+0030: Decimal digits | 0 1 2 3 4 5 6 7 8 9 (skipped)
U+0041: Basic Latin letters | A B C D E F G H I J K L M N O P Q R S T U V W X Y Z (skipped)

This module "UCS/doc" is from Wikipedia unless otherwise noted.