[XeTeX] Clarification on XeTeX documentation
Doug McKenna
doug at mathemaesthetics.com
Thu Dec 12 17:53:00 CET 2019
Two questions:
----------------
Question #1:
In the latest document describing the XeTeX extensions, dated 2019-12-09 (available, for instance, at
<https://ctan.math.illinois.edu/info/xetexref/xetex-reference.pdf>),
in section 2.3 "Maths fonts" (currently on page 14), the following sentence needs clarification:
>| In the following commands, ⟨fam.⟩ is a number (0–255) representing
>| font to use in maths. ⟨math type⟩ is the 0–7 number corresponding to
>| the type of math symbol ...
But ⟨fam.⟩ is not a font number (or index). As its name suggests, it is a font family number (or index), where each font family represents a triplet of loaded fonts: one each for the text, script, and scriptscript styles.
Also, elsewhere in TeX documentation, the word "class" is used for this 3-bit number (0–7), which describes the role a math character plays.
I suggest this be amended to read:
In the following commands, ⟨fam.⟩ is the number (0–255) of a math font family. ⟨math type⟩ is the 0–7 number corresponding to the class of math symbol ...
----------------
Question #2:
Later on, in various syntax declarations, e.g.,
>| \Umathcode⟨char slot⟩ [=] ⟨math type⟩ ⟨fam.⟩ ⟨glyph slot⟩
one finds the term ⟨glyph slot⟩. This is curious, because XeTeX's source code reads this value as an integer using a procedure named scan_usv_num ("usv" stands for Unicode scalar value), which reports any value outside the Unicode range 0 to "10FFFF as illegal.
But "glyph slot" is a term usually used for the internals of a font, and it is not the same as a Unicode character/code point/scalar value, which the font maps internally to a glyph slot (or index). Also, an OpenType font can contain no more than 2^16 (65536) glyph slots, so it is concerning that this routine accepts numbers outside that range.
If ⟨glyph slot⟩ really does mean a glyph index, another problem is that a font could formally contain a glyph whose internal slot number is, for example, "D800 (a legal 16-bit value that scan_usv_num won't complain about). But "D800 is not a legal Unicode character value: it is a high-surrogate code point, used only together with a low surrogate to encode a full 21-bit Unicode character value in UTF-16. "D800 is a Unicode code point, but it is neither a Unicode scalar value (surrogates are excluded from that set) nor a character value.
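The three different ranges in play can be sketched as follows. This is a minimal Python sketch; the function names are hypothetical, and only the numeric ranges come from the discussion above and from the Unicode and OpenType definitions (OpenType glyph IDs are 16-bit unsigned integers).

```python
# Three range checks that a single name like <glyph slot> currently conflates.
# Function names are hypothetical; only the numeric ranges are meaningful.

def accepted_by_usv_range_check(n: int) -> bool:
    """The check described for scan_usv_num: 0 through 0x10FFFF, surrogates included."""
    return 0 <= n <= 0x10FFFF


def is_unicode_scalar_value(n: int) -> bool:
    """Unicode scalar value: any code point except the surrogates 0xD800-0xDFFF."""
    return 0 <= n <= 0x10FFFF and not 0xD800 <= n <= 0xDFFF


def is_opentype_glyph_id(n: int) -> bool:
    """OpenType glyph IDs are 16-bit unsigned integers: 0 through 0xFFFF."""
    return 0 <= n <= 0xFFFF
```

For "D800 the first and third checks pass while the second fails; for a character such as "1D44E (MATHEMATICAL ITALIC SMALL A) the first two pass while the glyph-ID check fails, which is why one name cannot describe both kinds of value.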
So my question is:
What are the legal values for a ⟨glyph slot⟩? Alternatively, should ⟨glyph slot⟩ be renamed in this documentation to something less ambiguous, such as ⟨Unicode character value⟩, ⟨Unicode code point⟩, or ⟨Unicode scalar value⟩?
Doug McKenna
Mathemaesthetics, Inc.