Unicode math

Discussion:

Unicode math

Will Robertson

2014-05-20 02:09:35 UTC

Dear all,

I'm writing to continue some recent discussions in other forums on unicode
math and how:
* unicode-math.sty currently (incorrectly) deals with it, and
* how we envisage official support for unicode math in LaTeX into the
future.

* * *

In short, here's the issue as I see it.

LaTeX has a default maths font plus families such as \mathrm, \mathit, and
\mathbf to choose new alphabets. These maths fonts are expressly and
strictly only allowed to be text fonts, contrary to their name. This allows
people to write things like \mathit{Re} for Reynolds number and
\mathbf{Set} in category theory (thanks David C for the example).

The \mathbf command in particular has been abused in physics to denote
vectors and matrices, such as \mathbf{B} for magnetic field. I suspect the
situation is similar for sans math, with tensors using sans on occasion but
no doubt in other contexts used for multi-letter identifiers. (Examples
more than welcome; in fact, requested.)

In contrast, Unicode math defines a number of alphabets in a single Unicode
font, including mathematical italic and bold mathematical italic and many
more variations. In OpenType maths fonts to date, these symbols are all
designed as single-letter identifiers and not to be used for strings of
characters such as "Re" in italic or "Set" in bold.

So originally unicode-math simply mapped the unicode alphabets onto the
LaTeX commands (with nice options and so on for choosing your style of bold
and ensuring greek "just works"), and while all the style options and
normalisation were nice and work (IMO) well, the choice of overwriting
\mathbf and so on has led to obvious problems.

* * *

I'd first like to apologise for the inconvenience that unicode-math has
caused up until this point -- I do hope to "fix" it soon, whatever "fix"
means now. The rest of this email is largely a summary of the approach taken by
unicode-math and how it can be fixed, expressed in plain (Unicode-)TeX (see
attached and run it through XeTeX; you'll need Linux Libertine installed
but feel free to replace it with any font with both roman and greek text
glyphs).

0. Colours are used to ensure what you're seeing is correct and not a
side-effect of the underlying Plain math machinery.

1. \mathbf and friends go back to simply selecting a text font. Note that
they still need to remap \mathcode{}s in this case because normal unicode
math glyphs exist all the way up in Plane 1 where text fonts daren't
tread.

2. Greek input into \mathbf and friends does "work" -- if what you were
after were Greek letters from a text font.

3. To get proper bold symbols, including Greek, we'll need a whole new set
of commands. These will need sensible names of some sort. Below I've chosen
\symbf, etc., which doesn't look too bad to me.

4. If how I've implemented unicode-math looks wrong to you, I'd appreciate
suggestions :) Switching \mathcode{}s like this isn't super fun but
* doesn't seem too slow, and
* I'm not aware of anything else flexible and cross-platform enough that
will work instead.

5. I'm largely happy to make a breaking change to unicode-math.sty itself,
but comments on whether an "overwrite" mode which functions as the package
currently does would be sensible as a long term thing would be appreciated.
IMO, it doesn't make much sense to have a separate text font that is only
used for bold math identifiers that aren't real single-letter symbols -- in
such cases it would surely be sensible to use (perhaps a variation on)
\textbf.

Best regards,
Will

\newfam\bfmathfam
\newfam\itmathfam

\font\rm = "Linux Libertine O:color=000000" at 10pt\relax
\font\bf = "Linux Libertine O Bold:color=0000FF" at 10pt\relax
\font\it = "Linux Libertine O Italic:color=00AA00" at 10pt\relax
\font\mm = "[latinmodern-math.otf]:color=FF0000" at 10pt\relax

\textfont\bfmathfam\bf
\textfont\itmathfam\it
\textfont1\mm

\def\alpha{α}
\def\beta{β}
\def\gamma{γ}

\def\codemathhigh{%
\Umathcode`\a = 7 1 "1D44E\relax
\Umathcode`\b = 7 1 "1D44F\relax
\Umathcode`\c = 7 1 "1D450\relax
\Umathcode"03B1 = 7 1 "1D6FC\relax
\Umathcode"03B2 = 7 1 "1D6FD\relax
\Umathcode"03B3 = 7 1 "1D6FE\relax
}
\def\codemathlow{%
\Umathcode`\a = 7 1 `\a\relax
\Umathcode`\b = 7 1 `\b\relax
\Umathcode`\c = 7 1 `\c\relax
\Umathcode"03B1 = 7 1 "03B1\relax
\Umathcode"03B2 = 7 1 "03B2\relax
\Umathcode"03B3 = 7 1 "03B3\relax
}
\def\codemathsymbf{%
\Umathcode`\a = 7 1 "1D41A\relax
\Umathcode`\b = 7 1 "1D41B\relax
\Umathcode`\c = 7 1 "1D41C\relax
\Umathcode"03B1 = 7 1 "1D6C2\relax
\Umathcode"03B2 = 7 1 "1D6C3\relax
\Umathcode"03B3 = 7 1 "1D6C4\relax
}

\def\mathit#1{{\codemathlow\fam\itmathfam #1}}
\def\mathbf#1{{\codemathlow\fam\bfmathfam #1}}
\def\symbf#1{{\codemathsymbf #1}}

\parindent=0pt\relax
\hsize=6cm\relax
\hrule
\bigskip

\rm
\codemathhigh

Regular symbols:
$$
a + b + c \quad \alpha + \beta + \gamma
$$
Bold and italic math families:
$$
\mathit{abc\alpha\beta\gamma} \quad \mathbf{abc\alpha\beta\gamma}
$$
Bold math symbols:
$$
a+b+c \quad \symbf{a+b+c}
$$
$$
\alpha + \beta + \gamma \quad \symbf{\alpha + \beta + \gamma}
$$

\bye

Joseph Wright

2014-05-20 07:40:53 UTC

Hello Will,

A few questions from me. One 'up front': where does \mathrm fit in to
all of this?

Post by Will Robertson
The \mathbf command in particular has been abused in physics to denote
vectors and matrices, such as \mathbf{B} for magnetic field. I suspect the
situation is similar for sans math, with tensors using sans on occasion but
no doubt in other contexts used for multi-letter identifiers. (Examples
more than welcome; in fact, requested.)

I'm not quite sure what you mean by 'abused' here: there isn't an
obvious alternative to this, particularly if we bear in mind that the
design here pre-dates Unicode by a long way.

Post by Will Robertson
1. \mathbf and friends go back to simply selecting a text font. Note that
they still need to remap \mathcode{}s in this case because normal unicode
math glyphs exist all the way up in Plane 1 where text fonts daren't
tread.

[snip]

By 'proper' here I assume you mean 'with attached mathematical meaning'?
I think it's fair to say that the LaTeX standard \mathbf does produce
bold symbols, and in the common case of matching text and maths fonts
the symbols also look 'right'.
--
Joseph Wright

David Carlisle

2014-05-20 09:31:06 UTC

Post by Joseph Wright
Hello Will,
A few questions from me. One 'up front': where does \mathrm fit in to
all of this?

I think \mathrm is conceptually no different to \mathbf, so whatever
scheme is used should apply to both.

Post by Joseph Wright
To be clear, the Unicode position is that e.g. bold-B for magnetic field
should not come from the 'bold' font but from the bold-symbols part of a
single maths font: correct? That being the case, have the Unicode people
considered at all multi-letter identifiers or has this simply been
missed at present? (Anyone on the list sufficiently well-informed about
this?)

I think that it is a mistake to look at Unicode this way. Despite the
appearance of "Unicode fonts", it's a standard primarily of _input
characters_ for _plain text_. So the fact that there may or may not be
particular characters in the Unicode math alphabet block isn't really of
direct concern, any more than the fact that there isn't an ffi ligature
means that we shouldn't typeset an ffi ligature. Unicode just doesn't tell
you whether f f i should be typeset as one, two or three glyphs, and it
doesn't tell you what font to use for typesetting bold math. (The tables
in the font, once you have chosen a font, say something, but that's a
different matter.)

That said, there are fonts that have useful glyphs in those positions, and
so clearly there should be a LaTeX interface to access those.

Post by Joseph Wright

Yes, it isn't clear to me that any document would ever want both \mathbf
for multi-letter identifiers and \symbf for single symbols. If the fonts
use a matching design you probably just need \mathbf, and if the fonts
don't use a matching design I think it would be better for consistency if
you just used \mathbf as well.

However, sometimes there isn't a matching bold font, but there is (for a
limited character range) a set of bold glyphs in the base math font in the
bold math alphabet range. For that use I'd think a variant declaration
that defines \mathbf to flip the mathcodes into the U+1Dxxx block using
the base font, rather than as a switch to a new \fam (\mathgroup), would
be useful.

This latter mechanism could probably be the default for things like
blackboard bold and calligraphic, where you can't find default fonts by
looking at the text font settings.

But defining \symcal first as a name that always means the Unicode block,
and then having top-level options that do \let\mathcal\symcal rather than
defining \mathcal directly, would also work (and be more flexible, in that
it gives top-level access to both if both are available and people really
need that).
So I'm happy to run with Will's proposal to see where it leads....
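A minimal plain XeTeX sketch of the "\symcal first, then \let\mathcal\symcal" idea, assuming Latin Modern Math is installed and placed in family 1 as in Will's examples. The names \codemathsymcal and \ifcalfromunicode are hypothetical and only A to C are mapped; note that script B has no slot in the Plane 1 block and has to come from U+212C instead.

```tex
% Plain XeTeX sketch; \codemathsymcal and \ifcalfromunicode are
% hypothetical names, and only A-C are mapped for brevity.
\font\mm = "[latinmodern-math.otf]" at 10pt\relax
\textfont1=\mm

% \symcal always means the Unicode script alphabet in the base math font.
\def\codemathsymcal{%
  \Umathcode`\A = 7 1 "1D49C\relax % MATHEMATICAL SCRIPT CAPITAL A
  \Umathcode`\B = 7 1 "212C\relax  % U+1D49D is a hole: SCRIPT CAPITAL B
  \Umathcode`\C = 7 1 "1D49E\relax % MATHEMATICAL SCRIPT CAPITAL C
}
\def\symcal#1{{\codemathsymcal #1}}

% A top-level option decides what the familiar name means. If it were
% false, \mathcal would have to be defined some other way (not shown).
\newif\ifcalfromunicode
\calfromunicodetrue
\ifcalfromunicode
  \let\mathcal\symcal
\fi

$$ \mathcal{ABC} \quad \symcal{ABC} $$
\bye
```

Both sides of the display should come out identical here, which is the point: \symcal stays available under its own name whatever the option does to \mathcal.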

David

William F Hammond

2014-05-20 19:16:54 UTC

Post by David Carlisle
For that use I'd think
a variant declaration which would define \mathbf to flip the mathcodes
into the U+1Dxxx block
using the base font rather than define it as a switch to new \fam
(\mathgroup) and would be useful.

--
William F Hammond
Email: gellmu-***@public.gmane.org
https://www.facebook.com/william.f.hammond
http://www.albany.edu/~hammond/

Barbara Beeton

2014-05-20 20:08:13 UTC

So long as one minds the gaps in U+1Dxxx (actually the several gaps for
which the unicode folk seem to have thought the characters were previously
defined in the U+21xx block), though maybe it's not that much of an issue
for \mathbf itself as opposed to \mathcal, \mathfrak, and \mathbb.

At least they were thoughtful enough to leave those slots empty. :-)

but there's one pseudo-overlap:
1D48D (italic "ell") is often
replaced or substituted by 2113
(the curly "ell") by meticulous
authors.

and it could be claimed that the
power set, which was originally
(in the unicode 2 manual) listed
as a meaning for 2118, hasn't
been properly accommodated, since
the shape shown in version 2 was
obviously the weierstrass p (now
corrected), and the shape of the
script P at 104AB is not suitable
for use as the power set. but all
the slots in the 2100-214F block
have been filled in, so the power
set has been effectively excluded.
(i'll pursue that with the utc.)

Just seizing the opportunity to make everyone aware of the gaps.

all is certainly not perfect.
but whoever is creating the latex
support for unicode fonts should
"do the right thing" so the users
don't have to be concerned in
most cases.
-- bb

William F Hammond

2014-05-21 20:06:38 UTC

I'm catching up on this one.

Post by Barbara Beeton
1D48D (italic "ell") is often
replaced or substituted by 2113
(the curly "ell") by meticulous
authors.

U+2113 (\ell ℓ) is actually, I think, a character
unto itself like the Weierstrass 'p'. (I'm thinking
that it was on one of the ibm selectric balls in the
1960s, but I'm not sure about that.)

-- Bill

Chris Rowley

2014-05-20 20:58:41 UTC

"Mind the Gap!" as they say on The Tube in London (the UK one).

Sent from my iPud (\~/)

Post by David Carlisle
For that use I'd think
a variant declaration which would define \mathbf to flip the mathcodes into the U+1Dxxx block
using the base font rather than define it as a switch to new \fam (\mathgroup) and would be useful.

Will Robertson

2014-05-21 10:55:54 UTC

Aye, this made the implementation far more messy than it could have been.
Unicode-math has quite a number of exceptions hard-coded with their own csnames, such as:

\usv_set:nnn {bb}{C}{"2102}
\usv_set:nnn {bb}{H}{"210D}
\usv_set:nnn {bb}{N}{"2115}
\usv_set:nnn {bb}{P}{"2119}
\usv_set:nnn {bb}{Q}{"211A}
\usv_set:nnn {bb}{R}{"211D}
\usv_set:nnn {bb}{Z}{"2124}

The vargreek symbols in particular are in inconsistent locations, but there’s only a few of them so it’s not the end of the world.
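To illustrate why the variant Greek letters need one line each, here is a plain XeTeX fragment in the style of \codemathsymbf. In Plane 1 the bold "symbol" variants sit together just after bold omega (U+1D6DC to U+1D6E1), but their BMP counterparts are scattered across the Greek block, so no single offset from alpha works. The macro name is hypothetical, and it assumes \textfont1 is an OpenType math font such as latinmodern-math.otf.

```tex
% Hypothetical helper: map the BMP Greek "symbol" variants to
% MATHEMATICAL BOLD ... SYMBOL in Plane 1. The source code points are
% not contiguous, so each one is listed explicitly.
% Assumes \textfont1 is an OpenType math font (e.g. latinmodern-math.otf).
\def\codemathsymbfvargreek{%
  \Umathcode"03F5 = 7 1 "1D6DC\relax % lunate epsilon symbol
  \Umathcode"03D1 = 7 1 "1D6DD\relax % theta symbol
  \Umathcode"03F0 = 7 1 "1D6DE\relax % kappa symbol
  \Umathcode"03D5 = 7 1 "1D6DF\relax % phi symbol
  \Umathcode"03F1 = 7 1 "1D6E0\relax % rho symbol
  \Umathcode"03D6 = 7 1 "1D6E1\relax % pi symbol
}
```

In a setup like Will's, this would be called alongside \codemathsymbf inside \symbf.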

Cheers,
Will

Barbara Beeton

2014-05-20 13:19:30 UTC

On Tue, 20 May 2014, Joseph Wright wrote:

[...]

To be clear, the Unicode position is that e.g. bold-B for magnetic field
should not come from the 'bold' font but from the bold-symbols part of a
single maths font: correct? That being the case, have the Unicode people
considered at all multi-letter identifiers or has this simply been
missed at present? (Anyone on the list sufficiently well-informed about
this?)

having been the stix representative to
the unicode technical committee, i hope
that i'm sufficiently well-informed.

the unicode people didn't consider
multi-letter identifiers specifically,
since only single letters are (normally)
given character status. as pointed out
(i think by david), one goal of the
alphanumeric block in plane 1 was to
be able to drop single characters into
text and have them recognized as math
identifiers (one of the "math subgroup"
was murray sargent of microsoft, who
has been responsible for the ms work
adding math to office). another
explicit goal was to be able to search
for individual math expressions by
unicode to find in what documents a
particular identifier had been used.
explicitly *un*intended was the
ability to easily use, say, fraktur
or script for wedding invitations,
hence the location in plane 1.

the unicode goal is to have only one
code per "meaning". hence the absence
of the "usual version" of an alphabet
(usually upright lightface) from the
plane 1 math block. (the absence of
lightface sans greek is an oversight;
this has been resubmitted, with a
reference to nist special publication 811,
http://physics.nist.gov/cuu/pdf/sp811.pdf
page 22, where lightface sans is used,
though not by name, in the definition
of the *dimensions* of si base quantities.
one greek letter, theta, is shown; not
sure whether the theta is upper- or
lowercase, but it's the principle
that's important to the utc.) more
information (and history regarding the
deciding example that resulted in the
inclusion of the plane 1 alphanumeric
block) is given in unicode tech report #25:
http://www.unicode.org/reports/tr25/

regarding identifiers, see utr#25, in
particular sections 2.16 and 4.4. (the
latter section does strongly hint that
the characters in the plane 1 block can
be used for multi-letter identifiers.)

hope this is helpful.
-- bb

Chris Rowley

2014-05-20 14:16:48 UTC

I could just say:

I agree, as ever:-), with David

But that would imply that I have read all this!!

As a user of multi-letter math symbols in strange fonts, such as 'upper case sans serif Demi-bold', I am happy with the Classic TeX/LaTeX methods for getting what I want, and feel that the LaTeX version \mathsfdbf ?? captures well the visual or presentation semantics. Also, as a mathematician I do not think that the Plane 1 math letters have any semantics other than this type of 'visual meaning'. [ For physicists they will have physical semantics, but inconsistently and with multiplicities of meaning in different application areas.] The Unicode philosophy and logic are purely the invention of a group of computing and standards people who pretend to an understanding of how math and science people use math language and written notation; thus they have no relevance to me as a math person, BUT their work is both painstaking and excellent as informatics and of great use to my other hats.

And I will look at it all in more detail RSN.

Chris


Will Robertson

2014-05-20 05:21:07 UTC

On Tue, May 20, 2014 at 11:39 AM, Will Robertson wrote:

1. \mathbf and friends go back to simply selecting a text font. Note that
they still need to remap \mathcode{}s in this case because normal unicode
math glyphs exist all the way up in Plane 1 where text fonts daren't
tread.

I see it's been a while since I looked at this (and shouldn't write these
emails in a rush) -- if the class of each math char is set to 7 then the
remapping mentioned above to enable old-style \mathbf etc is not necessary.
This is better.

Updated plain example below.

Cheers,
Will

\font\rm = "Linux Libertine O:color=AA00AA" at 10pt\relax
\font\bf = "Linux Libertine O Bold:color=0000FF" at 10pt\relax
\font\it = "Linux Libertine O Italic:color=00AA00" at 10pt\relax
\font\mm = "[latinmodern-math.otf]:color=FF0000" at 10pt\relax

\newfam\rmmathfam
\newfam\bfmathfam
\newfam\itmathfam

\textfont\rmmathfam\rm
\textfont\bfmathfam\bf
\textfont\itmathfam\it
\textfont1\mm

\def\alpha{α}
\def\beta{β}
\def\gamma{γ}

\def\codemathhigh{%
\Umathcode`\a = 7 1 "1D44E\relax
\Umathcode`\b = 7 1 "1D44F\relax
\Umathcode`\c = 7 1 "1D450\relax
\Umathcode"03B1 = 7 1 "1D6FC\relax
\Umathcode"03B2 = 7 1 "1D6FD\relax
\Umathcode"03B3 = 7 1 "1D6FE\relax
}
\def\codemathsymup{%
\Umathcode`\a = 7 1 `\a\relax
\Umathcode`\b = 7 1 `\b\relax
\Umathcode`\c = 7 1 `\c\relax
\Umathcode"03B1 = 7 1 "03B1\relax
\Umathcode"03B2 = 7 1 "03B2\relax
\Umathcode"03B3 = 7 1 "03B3\relax
}
\def\codemathsymbf{%
\Umathcode`\a = 7 1 "1D41A\relax
\Umathcode`\b = 7 1 "1D41B\relax
\Umathcode`\c = 7 1 "1D41C\relax
\Umathcode"03B1 = 7 1 "1D6C2\relax
\Umathcode"03B2 = 7 1 "1D6C3\relax
\Umathcode"03B3 = 7 1 "1D6C4\relax
}

\def\mathrm#1{{\fam\rmmathfam #1}}
\def\mathit#1{{\fam\itmathfam #1}}
\def\mathbf#1{{\fam\bfmathfam #1}}
\def\symbf#1{{\codemathsymbf #1}}
\def\symup#1{{\codemathsymup #1}}

\parindent=0pt\relax
\hsize=6cm\relax
\hrule
\bigskip

\rm
\codemathhigh

Regular symbols:
$$
a + b + c \quad \alpha + \beta + \gamma
$$
Roman, italic, and bold math families:
$$
\mathrm{abc\alpha\beta\gamma} \quad \mathit{abc\alpha\beta\gamma} \quad
\mathbf{abc\alpha\beta\gamma}
$$
Bold math symbols:
$$
\symbf{a+b+c} \quad \symbf{\alpha + \beta + \gamma}
$$
Upright math symbols:
$$
\symup{a+b+c} \quad \symup{\alpha + \beta + \gamma}
$$

\bye

David Carlisle

2014-05-20 09:40:24 UTC

On 20/05/2014 06:21, Will Robertson wrote:

Thanks for the file (and sorry for being grumpy in the github issues:-)

I had a bit of trouble getting it to run so I thought I'd ask about those
first (rather than about the actual proposal) in case others are having same
issues, and to check we are seeing the same result in the end.

Using an up to date texlive 2014 pretest on windows/cygwin

Taking your second version I got

! Font \rm="Linux Libertine O:color=AA00AA" at 10.0pt not loadable:
Metric (TFM

The font name lookups always seem a black art:-)
Anyway I copied the contents of
/usr/local/texlive/2014/texmf-dist/fonts/opentype/public/libertine

into the system /windows/fonts directory and then it picked up the fonts.

As posted the second display headed by "Roman, italic, and bold math
families:"
made three blocks of coloured missing glyph markers. Was that expected?

Missing character: There is no 𝑎 in font Linux Libertine O:color=AA00AA
where that alpha is

U+1d44e MATHEMATICAL ITALIC SMALL A

commenting out

\codemathhigh

brought them back down so they displayed in Linux Libertine.

David

Ulrike Fischer

2014-05-20 12:24:03 UTC

Post by David Carlisle
As posted the second display headed by "Roman, italic, and bold math
families:"
made three blocks of coloured missing glyph markers. Was that expected?
Missing character: There is no 𝑎 in font Linux Libertine O:color=AA00AA
where that alpha is

I get this too. In miktex I see the correct glyphs but nevertheless
get a lot of missing chars messages.

Similar with luatex. There I get additionally a lot of a bit dubious
looking messages like

luaotfload | fonts : skipping cyclic reference U+00028 in math
variant U+00028 :
luaotfload | fonts : skipping cyclic reference U+00029 in math
variant U+00029 :

when the latinmodern-math-cache is created by luaotfload.

Will's first example works fine, there are no missing chars but if I
delete the cache I get there the "cyclic reference" message too.

--
Ulrike Fischer
http://www.troubleshooting-tex.de/

Will Robertson

2014-05-25 23:09:51 UTC

Hi David,

Post by David Carlisle
I had a bit of trouble getting it to run so I thought I'd ask about those
first (rather than about the actual proposal) in case others are having same
issues, and to check we are seeing the same result in the end.

Confession time: I haven't had a working LuaTeX since I moved everything to a newer computer and only just managed to install a pretest TeX Live 2014.

I can now see what the problem is: there has been a change in XeTeX's maths handling so that remapping back to ascii for class 7 glyphs is indeed necessary after all. I have to admit I was extremely surprised when it worked the other day without the remapping.

Post by David Carlisle
As posted the second display headed by "Roman, italic, and bold math families:"
made three blocks of coloured missing glyph markers. Was that expected?

No it worked in TeX Live 2013, but not in 2014 :)

* * *

I've attached a now-working example below; it also runs in plain LuaTeX but for some reason the colours don't show up in maths.

Hope this helps,
Will

David Carlisle

2014-05-20 13:50:06 UTC

One thing I meant to say: if \mathsf{abc} is implemented by
switching to a sans font, what should we do about

$ U+1D5BA U+1D5BB U+1D5BC $

that is

MATHEMATICAL SANS-SERIF SMALL [ABC]

The choices (either justifiable) would be to let them go through to the
sans math alphabet in the base font, or to give them mathcodes with ASCII
codepoints and the \fam of the sans serif font, so that they typeset as
\mathsf{abc}.

Unlike the switching in the other direction, this would not need a
mathcode switching loop in every instance; you could just set the
mathcodes once, at the point the mathsf family was allocated.
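A plain XeTeX sketch of the second choice, where the mathcodes are set once when the family is allocated. The font name "Linux Biolinum O" is an assumption (any text font with a to c will do), and \sfmathfam and \mathsf are defined here only for illustration.

```tex
% Plain XeTeX sketch. "Linux Biolinum O" is an assumed sans font;
% any text font with a-c will do.
\font\sf = "Linux Biolinum O" at 10pt\relax
\newfam\sfmathfam
\textfont\sfmathfam=\sf
\scriptfont\sfmathfam=\sf        % one size everywhere; enough for a sketch
\scriptscriptfont\sfmathfam=\sf

% Done once, when the family is allocated: MATHEMATICAL SANS-SERIF SMALL
% A-C (U+1D5BA..U+1D5BC) become ordinary (class 0) ASCII a-c taken from
% the sans text font.
\Umathcode"1D5BA = 0 \sfmathfam `\a\relax
\Umathcode"1D5BB = 0 \sfmathfam `\b\relax
\Umathcode"1D5BC = 0 \sfmathfam `\c\relax

% \mathsf only has to switch family; class 7 letters follow \fam,
% so no per-use mathcode loop is needed.
\def\mathsf#1{{\fam\sfmathfam #1}}

$$ 𝖺𝖻𝖼 \quad \mathsf{abc} $$
\bye
```

With this setup the Plane 1 input and the \mathsf input should typeset from the same sans glyphs; the other choice would simply leave U+1D5BA and friends pointing at the base math font.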

Switching to use a text font is I think closest to the intended spirit
of these characters and their use in MathML.

<mi mathvariant="sans-serif">abc</mi>

is defined to be equivalent to

<mi>𝖺𝖻𝖼</mi>

and distinct from

<mi>𝖺</mi><mi>𝖻</mi><mi>𝖼</mi>

That is, it's defined to be distinct, although whether or not it is
visually distinct naturally depends on the fonts in use.

Thus MathML (at least) expects to be able to use runs of plane 1
characters as multi-letter identifiers and have them typeset as such; it
doesn't mandate (or disallow) that the characters go straight through to
font slots with that index.

David

Bruce Miller

2014-05-21 20:21:01 UTC

Post by David Carlisle
One thing I meant to say, is if \mathsf{abc} is implemented by
switching to a sans font.
What to do about
$ U+1D5BA U+1D5BB U+1D5BC $
that is
MATHEMATICAL SANS-SERIF SMALL [ABC]

I've been eavesdropping, and what I'm saying is
essentially what others have said in various ways,
but seems worth reiterating.

To _me_ it seems rather fundamental that the various
\mathsf, etc, not only request a family/shape/slant/whatever,
but represent a single mathematical token.
The spacing around it would be appropriate for ORD (normally, I guess),
but within, the contents are essentially text (kerned, ligatured, etc.),
not treated as a sequence of math symbols, independently of whether the
eventual font turns out to be a "math" font.

So, while David's 2 examples might very well choose the same
font, I would expect the spacing around & between to be completely
different.

bruce

David Carlisle

2014-05-20 10:55:10 UTC

Post by Will Robertson
Dear all,
I'm writing to continue some recent discussions in other forums on
unicode math and how: * unicode-math.sty currently (incorrectly) deals
with it, and * how we envisage official support for unicode math in
LaTeX into the future.
* * *
In short, here's the issue as I see it.
LaTeX has a default maths font plus families such as \mathrm,
\mathit, and \mathbf to choose new alphabets. These maths fonts are
expressly and strictly only allowed to be text fonts, contrary to
their name. This allows people to write things like \mathit{Re} for
Reynolds number and \mathbf{Set} in category theory (thanks David C
for the example).

I think the "contrary to their name" requires some comment (since it
relates to what these commands do and whether they should be changed).

I think \mathbf selecting a text font looks odd to _you_ because you are
approaching it from the implementation point of view and looking at the
math font tables etc.

At the use level the command names are more to do with use than
implementation: \mathbf selects a bold font in math mode, hence its name.
The exact font it selects is something of an implementation detail, and
the fact that it is technically a text font is no more strange than the
fact that in the default setup, in
$ 1 + a $
the 1 comes from a roman text font and the a comes from a math italic font.

If you look at the plain tex definitions of \bf and \it they are two
definitions packed into one

\def\it{\fam\itfam\tenit}

with \fam\itfam just working in math mode and \tenit just working in
text mode.

\mathit is just the math part of \it
\textit is just the text part of \it (with an extra \hbox in math mode
to ensure you are in text mode.)
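To make the "two definitions packed into one" point concrete, here is a plain TeX sketch (tex, pdftex, or xetex with the plain format) that pulls plain's \it apart into its two halves. The names \mathitpart and \textitpart are hypothetical; they are not the LaTeX definitions of \mathit and \textit, only the same idea reduced to plain TeX.

```tex
% Plain TeX sketch; \mathitpart and \textitpart are hypothetical names.
% Plain's own definition is  \def\it{\fam\itfam\tenit}
\def\mathitpart#1{{\fam\itfam #1}}  % math half: select family \itfam
\def\textitpart#1{%                 % text half: select font \tenit,
  \ifmmode \hbox{\tenit #1}\else {\tenit #1}\fi} % boxed when in math

$\mathitpart{Re} = \textitpart{Re}$ \quad {\it Re}

\bye
```

All three instances of Re come from cmti10: the first as a math list in family \itfam, the other two as ordinary text.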

So I think the \mathxx names have caused more confusion amongst font
package authors than they have for users...

Post by Will Robertson
The \mathbf command in particular has been abused in physics to
denote vectors and matrices, such as \mathbf{B} for magnetic field. I
suspect the situation is similar for sans math, with tensors using
sans on occasion but no doubt in other contexts used for multi-letter
identifiers. (Examples more than welcome; in fact, requested.)

I don't think that is abuse, it was the intended use.

Multi-letter identifiers are common in category theory (often with
different fonts for different categories). Other areas where they are
common are uses of math mode in other fields, notably computer science,
where variable names in pseudo code often match the variable names used in
the real code, where single letters are frowned on.
grepping the tl2014 tree shows things such as

pbsheet/pbsheet.cls:\newcommand{\covf}[1]{\mathbf{Cov}_{#1}}
zed-csp/zed-csp.sty:\def \ELSE {\mathrel{\mathbf{else}}}

Post by Will Robertson
In contrast, Unicode math defines a number of alphabets in a single
Unicode font, including mathematical italic and bold mathematical
italic and many more variations. In OpenType maths fonts to date,
these symbols are all designed as single-letter identifiers and not
to be used for strings of characters such as "Re" in italic or "Set"
in bold.
So originally unicode-math simply mapped the unicode alphabets onto
the LaTeX commands (with nice options and so on for choosing your
style of bold and ensuring greek "just works"), and while all the
style options and normalisation were nice and work (IMO) well, the
choice of overwriting \mathbf and so on has led to obvious problems.

These work for the somewhat idiosyncratic character ranges in the math
alphabet block, but give you bold Greek but not bold Cyrillic or a bold
aleph, and give bold digits 0-9 but not italic digits, etc.
Also there is a bold roman but not a standard roman typeface, so you
still need the traditional switch for \mathrm.

Post by Will Robertson
* * *
I'd first like to apologise for the inconvenience that unicode-math
has caused up until this point -- I do hope to "fix" it soon,
whatever "fix" means now. The rest of this email is largely a summary
of approach taken by unicode-math and how it can be fixed, expressed
in plain (Unicode-)TeX (see attached and run it through XeTeX; you'll
need Linux Libertine installed but feel free to replace it with any
font with both roman and greek text glyphs).
0. Colours are used to ensure what you're seeing is correct and not a
side-effect of the underlying Plain math machinery.
1. \mathbf and friends go back to simply selecting a text font. Note
that they still need to remap \mathcode{}s in this case because
normal unicode math glyphs exist all the way up in Plane 1 where text
fonts daren't tread.
2. Greek input into \mathbf and friends does "work" -- if what you
were after were Greek letters from a text font.
3. To get proper bold symbols, including Greek, we'll need a whole
new set of commands. These will need sensible names of some sort.
Below I've chosen \symbf, etc., which doesn't look too bad to me.

Or, for font families that have a bold font (as required for \boldmath),
you could support a much larger collection of bold characters with proper
math sidebearings by switching to that family (rather as the bm package
does, although bm has to take care not to run out of fams by simply
loading normal and bold weights of everything). Here you need fewer fams
(as each holds a lot more symbols) and you have access to a lot more of
them, as you get 255 rather than 16 (I think :-)

Post by Will Robertson
4. If how I've implemented unicode-math looks wrong to you, I'd
appreciate suggestions :) Switching \mathcode{}s like this isn't
super fun but * doesn't seem too slow, and * I'm not aware of
anything else flexible and cross-platform enough that will work
instead.

I think it is a useful technique to access the math alphabet block if
there is no alternative, but I think it is required a lot less than your
message indicates.

One thing I note is that if you just change mathcodes then explicit
character tokens are affected, but not \mathchardef tokens or characters
accessed by \mathchar.

I note you defined \alpha as

\def\alpha{α}

rather than as

\mathchardef\alpha="010B

for this reason. Perhaps that's no bad thing and we should deprecate
\mathchardef
as an optimisation not needed this century, it also has the benefit that
\alpha works like Î±
and can be used in text as well as math.
But it is quite a big change (and would have a knock on effect on
packages that set up
new math fonts which would need clear guidelines on which commands to
use....
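To make the difference concrete, here are the two alternative definitions in Unicode-engine terms. This assumes XeTeX or LuaTeX with an OpenType math font in family 1 (an assumption); \Umathchardef is the Unicode-engine counterpart of \mathchardef. Use one definition or the other, not both.

```latex
% (a) Character-token definition: \alpha expands to the explicit character α,
%     so it follows whatever \Umathcode α has at that moment. A mathcode
%     remapping loop therefore reaches it, and it also works in text.
\def\alpha{α}

% (b) Fixed math character: class 0, family 1, slot U+03B1. A later change to
%     the \Umathcode of α does not affect this \alpha, and it is math-only.
\Umathchardef\alpha = "0 "1 "03B1
```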

It is possible to make \chardef etc. work, as in \bm, by inspecting each
token and modifying the codes in place, but that requires an expansion
pass over the whole expression to expose the primitive \chardef values,
which is slow and fragile compared with just looping through the mathcodes
at the start. But people do use bm quite a bit and expect
\bm{arbitrary-stuff} to make stretchy delimiters and symbols etc. bold,
which is a kind of bold that cannot be achieved either by pointing \mathbf
at a bold text font or by mapping to the bold math alphabet range.
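For comparison, a minimal sketch of typical \bm usage in a classic pdfLaTeX document, assuming only the standard bm package. For symbols with no bold font available, bm falls back to "poor man's bold" (overprinting).

```latex
\documentclass{article}
\usepackage{bm}
\begin{document}
\[
  \mathbf{B}\ \mathbf{\alpha}   % \mathbf changes the Latin B; \alpha keeps its normal weight
  \qquad
  \bm{B}\ \bm{\alpha}           % \bm emboldens both
  \qquad
  \bm{\int f \le \sum x}        % operators and relations are emboldened too
\]
\end{document}
```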

I think that when people do not want \mathbf and ask for a bold symbol,
they often want something more like \bm, which makes all symbols bold.
For various reasons bm doesn't work at all in Unicode engines at the
moment: it doesn't know about the extended mathchar possibilities (easily
fixable) and it is incompatible with unicode-math (I was waiting to see
what you'd do :-).

David

Post by Will Robertson
5. I'm largely happy to make a breaking change to unicode-math.sty
itself, but comments on whether an "overwrite" mode which functions
as the package currently does would be sensible as a long term thing
would be appreciated. IMO, it doesn't make much sense to have a
separate text font that is only used for bold math identifiers that
aren't real single-letter symbols -- in such cases it would surely be
sensible to use (perhaps a variation on) \textbf.
Best regards,
Will

Ulrike Fischer

2014-05-20 14:04:06 UTC

Post by Will Robertson
3. To get proper bold symbols, including Greek, we'll need a whole new set
of commands. These will need sensible names of some sort. Below I've chosen
\symbf, etc., which doesn't look too bad to me.
[...] used for bold math identifiers that aren't real single-letter
symbols -- in such cases it would surely be sensible to use (perhaps a
variation on) \textbf.

IMHO math fonts should be fixed fonts: unlike \textbf, they should not
switch one aspect of the "current" font, but should always select a
fixed, well-defined font.

Also, characters should not only look OK but, in the days of Unicode,
also have the code point with the correct meaning. That means that a
bold "T" in math should, if possible, be the one from the math plane in
Unicode and not a bold "T" from some text font -- even if they look
the same.

So in my opinion the current \mathbf-etc setup in unicode-math
actually did the right thing and improved the standard
\math-commands. I wouldn't like to lose this completely. If \mathbf
pointed to a textfont then everyone who wants the real math symbols
would have to replace \mathbf in their code by \symbf. And back
again if he wants to use a text font.

Wouldn't it be possible to have a "\usetextfontasmathbf..." command
which disables the mapping to the math plane? So that one doesn't
have to switch between \symbf and \mathbf depending on the font
setup of a document?

--
Ulrike Fischer
http://www.troubleshooting-tex.de/

David Carlisle

2014-05-20 14:51:58 UTC

Post by Ulrike Fischer

That would be good to have but may be hard with currently available fonts.

If you want \mathbf{Var} to "look OK" then you need to kern the V and
a, and that means (as far as I understand the current situation) either
using a bold text font or having tables somewhere in the macro layer and
inserting kerns "by hand", that is, making the macros iterate over every
character, inserting kerns where needed. (But maybe LuaTeX and/or XeTeX
can modify the font metrics at font-loading time?)
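A small illustration of the options, assuming unicode-math with its default math font and a version that provides \symbf (an assumption: the command is only being proposed in this thread). The -2mu kern is an arbitrary guess, which is exactly the weakness of kerning "by hand".

```latex
\documentclass{article}
\usepackage{unicode-math}
\begin{document}
\[
  \symbf{Var}                            % Plane 1 glyphs: no V-a kern expected
  \quad
  \symbf{V}\mkern-2mu\symbf{a}\symbf{r}  % kern inserted by hand; the amount is a guess
  \quad
  \textnormal{\bfseries Var}             % bold text font: kerning comes from the font
\]
\end{document}
```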

Post by Ulrike Fischer
So in my opinion the current \mathbf-etc setup in unicode-math
actually did the right thing and improved the standard
\math-commands. I wouldn't like to lose this completely. If \mathbf
pointed to a textfont then everyone who wants the real math symbols
would have to replace \mathbf in their code by \symbf. And back
again if he wants to use a text font.
Wouldn't it be possible to have a "\usetextfontasmathbf..." command
which disables the mapping to the math plane? So that one doesn't
have to switch between \symbf and \mathbf depending on the font
setup of a document?

Yes, as I said in my reply to Will, I can't see anyone wanting both
\mathbf and \symbf in the same document, so I would expect the document
markup always to be \mathxx, with that being one definition or the other
depending on document-wide settings: either package options or a package
for a particular font family just making a choice of definition to get
the best coverage for that family.
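A sketch of that single-markup approach at document level. The switch \ifbfastext and the command \mbf are hypothetical names, and \symbfup assumes a unicode-math version that provides it. \textnormal keeps the bold text font fixed regardless of the surrounding text style.

```latex
\documentclass{article}
\usepackage{unicode-math}
\newif\ifbfastext   % hypothetical document-wide switch
\bfastexttrue       % change to \bfastextfalse to use Plane 1 bold symbols
\ifbfastext
  % Fixed bold text font: kerned multi-letter names, but Basic Latin code points
  \newcommand\mbf[1]{\textnormal{\bfseries #1}}
\else
  % Bold glyphs from Plane 1 of the math font: math code points, no kerning
  \newcommand\mbf[1]{\symbfup{#1}}
\fi
\begin{document}
\[ \mbf{B} = \nabla \times \mbf{A}, \qquad X \in \mbf{Set} \]
\end{document}
```

The document body only ever uses \mbf; changing one line in the preamble switches the font source.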

David

Barbara Beeton

2014-05-20 15:50:10 UTC

On Tue, 20 May 2014, David Carlisle wrote:

[...]

If you want \mathbf{Var} to "look OK" then you need to kern the V and
a, and that means (as far as I understand the current situation) either
using a bold text font or having tables somewhere in the macro layer and
inserting kerns "by hand", that is, making the macros iterate over every
character, inserting kerns where needed. (But maybe LuaTeX and/or XeTeX
can modify the font metrics at font-loading time?)

i think maybe the "proper" thing
to do is recommend declaring these
multi-letter non-\mathrm strings
as a special kind of operator name.
may require a new command ...

i agree with david that the text
kerning is definitely wanted here.
-- bb

Chris Rowley

2014-05-20 21:34:04 UTC

Post by Barbara Beeton
i think maybe the "proper" thing
to do is recommend declaring these
multi-letter non-\mathrm strings
as a special kind of operator name.

My personal logic as a user of such operators agrees largely with this.

But more simply, I think of them as something like:

\mathoperator {\mathtext {\textbf {Var}}}

Giving the new command:

\mathtextoperator

And the declaration:

\declaremathtextoperatorname
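One way to approximate the proposed \mathtextoperator with standard LaTeX. The name comes from the proposal above, but this implementation is only an assumption: \mathop gives operator spacing, \nolimits keeps sub- and superscripts to the side, and \textnormal fixes the font so the surrounding text style does not leak in.

```latex
\documentclass{article}
% Hypothetical implementation of the proposed command.
\newcommand\mathtextoperator[1]{%
  \mathop{\textnormal{\bfseries #1}}\nolimits}
\begin{document}
\[ \mathtextoperator{Var}(X) = \mathtextoperator{E}\bigl[(X-\mu)^2\bigr] \]
\end{document}
```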

But we may need to distinguish operators from variables. The variable case would include the case of 'variable names' that include spaces (I find it difficult to imagine an operator name with spaces but who knows?)

The rm case is then, as ever, special, as \mathtext etc. is not needed unless spaces are used, but it does no harm. But is there not also a need for 'unkerned regular roman text'?

More generally, there may well be some 'multiple-character' math objects that should not look like words (eg no kerning and certainly no ligatures!).

Chris

PS: the term 'identifier' is not, at least not until quite recently, ever used by math folk.

William F Hammond

2014-05-21 05:00:39 UTC

Post by Chris Rowley
\mathoperator {\mathtext {\textbf {Var}}}

Yes. Also when Var is 'R', I often prefer this to \mathbb{R} as an
'identifier' for the field of real numbers. Yes, the word 'identifier'
seems to have first appeared chez MathML or perhaps chez OpenMath [it
can be hard to tell the difference :-) ]

I think the MathML distinction between 'identifier' and 'operator' has no
precise analogue in LaTeX. Mathematically, in light of what the category
theorists have shown us, particularly as applied, for example, to what is
called the "functor of points" in algebraic geometry, one can think of
almost everything as a categorical "arrow". To make this mundane: if x is
in n-dimensional space and f is a map defined there, the notation fx (or
f(x)) can be regarded as a composition of arrows, whether x is static or
depends on some parameter t, since a static x, i.e., a point, may be
canonically identified with the map from the one-point space to
n-dimensional space taking x as its value, and likewise for f(x). (To get
mathematical semantics out of this one just needs to understand the "type"
of each symbol.)
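Written out, the identification above reads as follows. Y (the codomain of f) and T (the parameter space for t) are hypothetical names, since the message leaves them unspecified.

```latex
\[
  x\colon \{\ast\} \to \mathbb{R}^n, \qquad
  f\colon \mathbb{R}^n \to Y, \qquad
  f(x) = f \circ x \colon \{\ast\} \to Y,
\]
\[
  \text{and for a moving point } x\colon T \to \mathbb{R}^n, \qquad
  f(x) = f \circ x \colon T \to Y .
\]
```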

-- Bill

--
William F Hammond
Email: gellmu-***@public.gmane.org
https://www.facebook.com/william.f.hammond
http://www.albany.edu/~hammond/

Barbara Beeton

2014-05-21 12:29:29 UTC

On Tue, 20 May 2014, Chris Rowley wrote:

[...]

But we may need to distinguish operators from variables. The variable case would include the case of 'variable names' that include spaces (I find it difficult to imagine an operator name with spaces but who knows?)

counter-examples from the texbook:
\liminf (lim\,inf) and \limsup (lim\,sup).
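For reference, plain TeX builds these operator names in plain.tex with a thin space \, inside a roman operator. With amsmath, a new name of the same kind would use the starred \DeclareMathOperator*, shown here with \esssup as an illustrative name that amsmath does not define.

```latex
% From plain.tex
\def\limsup{\mathop{\rm lim\,sup}}
\def\liminf{\mathop{\rm lim\,inf}}

% LaTeX preamble with amsmath loaded: the starred form puts limits
% below the name in display style (\esssup is illustrative only)
\DeclareMathOperator*{\esssup}{ess\,sup}
```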

[...]

PS: the term 'identifier' is not, at least not until quite recently, ever used by math folk.

not until some of us had to get into
the font business. sorry 'bout that.
-- bb

Christopher Rowley

2014-05-24 12:27:38 UTC

Post by Chris Rowley
[...]
But we may need to distinguish operators from variables. The variable case would include the case of 'variable names' that include spaces (I find it difficult to imagine an operator name with spaces but who knows?)
\liminf (lim\,inf) and \limsup (lim\,sup).

Ah yes (not that I ever wrote them with a space way back in the days when I wrote (yes, on paper) such things). But are those 'word spaces'??

Post by Chris Rowley
[...]
PS: the term 'identifier' is not, at least not until quite recently, ever used by math folk.
not until some of us had to get into
the font business. sorry 'bout that.

Nothing for you to apologise for, blame the takeover by computing anoraks:-).

Chris

Will Robertson

2014-05-21 10:49:37 UTC

Hi Ulrike et al,

Post by Ulrike Fischer
So in my opinion the current \mathbf-etc setup in unicode-math
actually did the right thing and improved the standard
\math-commands.

I’m replying out of order, but I’m still inclined to agree with you here :)
The big problem was not handling \mathit properly.

Post by Ulrike Fischer
I wouldn't like to lose this completely. If \mathbf
pointed to a textfont then everyone who wants the real math symbols
would have to replace \mathbf in their code by \symbf. And back
again if he wants to use a text font.
Wouldn't it be possible to have a "\usetextfontasmathbf..." command
which disables the mapping to the math plane? So that one doesn't
have to switch between \symbf and \mathbf depending on the font
setup of a document?

It has been possible for a long time to select a text font for a math alphabet in unicode-math, but this feature was probably not documented very well.
If you try to select a particular unicode range such as \mathbfup and a font simply doesn’t have it (well, it only checks “A” I think), the remapping doesn’t occur and you get the ascii-range glyphs:

\documentclass{article}
\usepackage{fontspec}
\usepackage{unicode-math}
\setmainfont{texgyretermes-regular.otf}
\setmathfont[range=\mathbfup]{texgyreheros-bold.otf}
\begin{document}
text \textbf{bold}
\[ m+a+t+h \quad \mathbf{b+o+l+d} \]
\end{document}

BUT this doesn’t work properly with \mathit, because unicode-math hasn’t distinguished “math alphabetic symbols” from “math text font”.

Cheers,
Will

David Carlisle

2014-05-21 11:05:32 UTC

Post by Will Robertson
Hi Ulrike et al,

Post by Ulrike Fischer
So in my opinion the current \mathbf-etc setup in unicode-math
actually did the right thing and improved the standard
\math-commands.

I’m replying out of order, but I’m still inclined to agree with you here :)
The big problem was not handling \mathit properly.

or at least the problem is more apparent for italic, as the differences
between math and text setting are more glaring in that case :-)

Post by Will Robertson

It has been possible for a long time to select a text font for a math alphabet in unicode-math, but this feature was probably not documented very well.

Yes, although I think what's needed is an explicit way to do this rather
than relying on heuristics. For example, while answering a tex.sx question
I wanted to use rsfs (or Euler) for script, in addition to or instead of
the script in STIX, just because that's what the question asked for, and
it seemed that the easiest way currently is to use \mathup{\euler{A B C}}
to disable the mathcode mapping, which works but looks a bit odd.

David

Will Robertson

2014-05-21 12:41:32 UTC

On 20 May 2014, at 11:34 pm, Ulrike Fischer

Post by Ulrike Fischer
So in my opinion the current \mathbf-etc setup in unicode-math
actually did the right thing and improved the standard
\math-commands.

I’m replying out of order, but I’m still inclined to agree with you here :)
The big problem was not handling \mathit properly.

or at least the problem is more apparent for italic as the differences between
math and text setting are more glaring in that case:-)

Well, I think from the foregoing discussion (correct me if I’m wrong on this!) that we all roughly agree that a suitable OpenType math font with bold glyphs in Plane 1 will still be a sensible default for \mathbf, albeit with an obvious override possible (both as a package option and at math-font-load time) when a “text” bold font is desired instead for multi-letter identifiers.

\mathit is just wrong as it currently stands.

It has been possible for a long time to select a text font for a math alphabet in unicode-math, but this feature was probably not documented very well.

Yes, although I think what's needed is an explicit way to do this rather than relying on heuristics

Agreed for sure.

for example while answering a tex.sx question I wanted to use rsfs (or euler) for script in addition or instead of
the script in stix because well just because that's what the question asked for, and it seemed that the easiest way
currently is to use \mathup{\euler{A B C}} to disable the mathcode mapping, which works but looks a bit odd.

Most certainly, this is not what we should be asking users to do.
This (arbitrary \mathXYZ alphabet support) was always in the works but time got away from me.

Cheers,
Will

David Carlisle

2014-05-21 13:05:50 UTC

Post by Will Robertson

On 20 May 2014, at 11:34 pm, Ulrike Fischer

Post by Ulrike Fischer
So in my opinion the current \mathbf-etc setup in unicode-math
actually did the right thing and improved the standard
\math-commands.

I’m replying out of order, but I’m still inclined to agree with you here :)
The big problem was not handling \mathit properly.

or at least the problem is more apparent for italic as the differences between
math and text setting are more glaring in that case:-)

I think it should be available as a possibility, but not as a default
(and not required very often).

You need to use a text font for the multi-letter identifier case, and it
would look odd if single- and two-letter identifiers came from separate
fonts, so most of the time you need to use the same font for the
single-letter case.

If a package is setting up a font family for which there is no natural
bold font, and for which the base font has bold glyphs in the Unicode
slots, then obviously it makes sense to use that as the default for that
font family.

Or, if an end user knows that in a particular document \mathbf is only
used with single-letter arguments and wants to use the base font, there
should be an easy way to select that; but this has to be a user option,
given the usage in a particular document, so not a default.

Post by Will Robertson
\mathit is just wrong as it currently stands.

It has been possible for a long time to select a text font for a math alphabet in unicode-math, but this feature was probably not documented very well.

Yes, although I think what's needed is an explicit way to do this rather than relying on heuristics

Agreed for sure.

Most certainly, this is not what we should be asking users to do.
This (arbitrary \mathXYZ alphabet support) was always in the works but time got away from me.
Cheers,
Will

Yes, I didn't mean that that was ideal markup (and I may have missed
something better in the existing code), but for a tex.sx answer where I
didn't want to be redefining internals, it seemed at least an approach
whose workings I could explain.

I just mentioned it as it seemed relevant, in that the functionality
was requested and is, as you say, basically already available in
unicode-math.

http://tex.stackexchange.com/questions/131866/is-it-possible-to-add-new-alphabets-to-unicode-math/169630#169630

hope what I said there is true:-)

I feel I should say, as I seem to be doing more than my fair share of
the "complaints", that unicode-math does a pretty impressive job of
taming OTF math fonts and bringing them into TeX; it's only the surface
syntax that we are arguing about really :-)

David

Ulrike Fischer

2014-05-21 11:24:05 UTC

Post by Ulrike Fischer
So in my opinion the current \mathbf-etc setup in unicode-math
actually did the right thing and improved the standard
\math-commands.

Frank Mittelbach

2014-05-21 09:31:59 UTC

In my opinion the Unicode consortium has not screwed up (backspace
backspace backspace ...) has not found the best possible approach for
math, and there is no way to *properly* reconcile the two worlds.

Unicode started out as an attempt to codify the plain-text letters of all
languages. One of the most important axioms in that respect was the idea
that a "letter" is an abstract entity, e.g., Latin-small-a, and that
different glyphs in fonts all represent that single entity "a"
regardless of the shape or form it takes. So attributes like bold or
serif/sans etc. are all outside the scope of Unicode encoding.

That makes sense if you try to convey textual meaning, as a "word" has a
meaning regardless of being in italics or bold or both. (Of course such
attributes extend the semantics, e.g., bold may indicate a heading or
italic some emphasis, but underlying that, the "word" still has a meaning
of its own in a language.)

The problem with math, though, is that symbols in math are traditionally
not defined just by an abstract shape: the mathematical community early
on used additional attributes of glyphs to convey semantics. So
bold-lowercase-latin-letters may denote vectors, and in one formula an
integral symbol and a bold integral may have totally different semantics.
On top of that, the semantics may change from field to field or even from
paper to paper (so other than calling it a bold integral there is no way
to describe such symbols semantically).

The problem with this is that mathematicians have come to use
effectively any kind of symbol/letter to denote specific semantics, and
long ago started to use all kinds of attributes (that Unicode, at the
level of plain text, regards as irrelevant) to indicate semantics too.
The main point here is that the moment that happens, the attributes
become frozen and symbol+attribute combinations become relevant symbols
in their own right.

As a result, to express the language of mathematics Unicode would have
needed to codify all kinds of letter/symbol+attribute(s) combinations as
individual code points, which is a difficult if not impossible task.

Nevertheless, they went for this approach to some extent by codifying
mathematical alphabets (mainly digits + a-z + A-Z plus some Greek) and,
of course, a large number of symbols.

In the unicode book it says:

The alphabets in this block encode only semantic distinction, but not
which font will be used to supply the actual plain, script, Fraktur
[...] Characters from the Mathematical Alphanumeric Symbol block are not
to be used for nonmathematical styled text.

All mathematical alphanumeric symbols have compatibility decompositions
to the base Latin and Greek letters. This does not imply that the use of
these characters (I guess the base ones - Frank) is discouraged for
mathematical use. Folding away such distinctions [..] is usually not
desirable, however, as it loses the semantic distinction for which these
characters are encoded.

That is all true and sensible, and to explicitly encode that something is
a math-calligraphic S and not just a Latin S (that happens to be in some
calligraphic font) is desirable when passing data from one application to
the next, as the font information, and thus the semantics, is likely to
be lost.
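To make the code-point distinction concrete, a sketch assuming unicode-math with a math font covering these Plane 1 ranges and a version providing the \sym... alphabet commands. The code points in the comments are from the Mathematical Alphanumeric Symbols block.

```latex
\documentclass{article}
\usepackage{unicode-math}
\begin{document}
% Four distinct S's, each with its own code point. All four have
% compatibility decompositions to U+0053 "S", so NFKC normalisation
% folds them together and the distinction is lost.
\[
  S                  % U+1D446 MATHEMATICAL ITALIC CAPITAL S (default mapping)
  \quad \symbf{S}    % U+1D412 MATHEMATICAL BOLD CAPITAL S
  \quad \symscr{S}   % U+1D4AE MATHEMATICAL SCRIPT CAPITAL S
  \quad \symfrak{S}  % U+1D516 MATHEMATICAL FRAKTUR CAPITAL S
\]
\end{document}
```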

However, it by no means offers a full codification of mathematical
semantics, so at the end of the day you may end up with a mixture of
"properly" encoded material plus stuff that has lost the semantic
distinction.

The good part is that it covers a lot, but it is not comprehensive by
any means and can't be, due to the approach chosen.

It reminds me a bit of a talk I heard recently where somebody was
advocating the use of Unicode subscript and superscript digits to avoid
having to type _2 or ^3, arguing that this is easier, nicer and more
readable. Well, to me it isn't, the moment you get to real math, because
then it gets inconsistent and you end up with mixed syntax.
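A small illustration of the mixed-syntax problem, assuming a setup that accepts Unicode superscript digits as math input (unicode-math documents such support; treat this as an assumption for any particular version). Simple cases read well, but a nested superscript has no plain-text Unicode spelling, so the ^ syntax returns.

```latex
\[ x² + y² = r² \]   % Unicode SUPERSCRIPT TWO (U+00B2) used as input
\[ e^{-x²} \]        % nested: the outer level still needs ^{...}
```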

For the same reason I believe that it would have been better to approach
math alphabets differently in Unicode: instead of codifying a few (with
limited letter sets), acknowledge the fact that this "language" has a
meta level where symbol+attribute encode semantics, and not just the
symbol as such.

Anyway, this is neither here nor there, as this is what Unicode offers nowadays.

So where does it fail?

- in the case of attributed mathematical symbols, most prominently using
bold as offered by the bm package, resulting in new symbols as far as
the semantics are concerned

- in the case of multi-letter symbols (that require a fixed font, i.e.,
frozen attributes, but with kerning for aesthetic reasons)

- in the case of alphabets that have not been considered (like two
distinct calligraphic alphabets in parallel, or old German \neq
Fraktur (as my Algebra prof did), or Cyrillic, or ...)

- in not supporting diacritics for those alphabets (a minor case,
though)
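The cases listed above can all be written in classic pdfLaTeX. A minimal sketch, assuming the standard bm and mathrsfs packages, with each case in one formula:

```latex
\documentclass{article}
\usepackage{bm}        % bold versions of arbitrary symbols
\usepackage{mathrsfs}  % \mathscr: a second calligraphic alphabet
\begin{document}
\[
  \bm{\int} f                          % bold integral: no code point of its own
  \qquad \mathbf{Set}                  % multi-letter symbol: fixed font plus kerning
  \qquad \mathcal{A} \neq \mathscr{A}  % two calligraphic alphabets side by side
  \qquad \hat{\mathbf{a}}              % accented letter: no precomposed character
\]
\end{document}
```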

LaTeX2e's math support codified most of the needs of the mathematics
language, albeit only within its domain (that is, within the LaTeX
syntax); i.e., it wasn't supporting any Unicode code points for math (as
they didn't exist). So something like \mathbf was defining individual
bold math letters (for which Unicode now has its own code points, as long
as they are basic Latin), but it was also offering this for word-like
symbols such as \mathbf{Set}.

So if one now maps that to a full-fledged text font that supports
kerning, you lose the code-point semantic distinction outside LaTeX, and
if you map it to the Unicode plane then you have to deal manually with
kerning for multi-letter sequences (which is non-trivial and can't be
perfect) or live with horrible spacing.

Or you need to change the interface in LaTeX and offer different
commands or you change internals and distinguish between single letter
and multi-letter arguments. Or ...
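The last option, distinguishing single-letter from multi-letter arguments, could look roughly like this. It is a sketch only: \autobf is a hypothetical name, it assumes \NewDocumentCommand is available (LaTeX 2020-10 or later, or xparse loaded) and a unicode-math version providing \symbf, and it counts tokens with expl3's \tl_count:n, which ignores spaces.

```latex
\documentclass{article}
\usepackage{unicode-math}
\ExplSyntaxOn
% Hypothetical command: choose the font source by the length of the argument.
\NewDocumentCommand \autobf { m }
  {
    \int_compare:nNnTF { \tl_count:n {#1} } = { 1 }
      { \symbf {#1} }                   % single token: Plane 1 bold symbol
      { \textnormal { \bfseries #1 } }  % several tokens: bold text font, kerned
  }
\ExplSyntaxOff
\begin{document}
\[ \autobf{B} = \nabla \times \autobf{A}, \qquad X \in \autobf{Set} \]
\end{document}
```

A braced group such as \autobf{{Set}} counts as one token, so this heuristic is easy to defeat; that fragility is part of the trade-off.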

frank