Bug Report: pdfLaTeX does not accept UTF-8 checkmark character as command but claims Invalid UTF-8 byte "9C"

David Carlisle d.p.carlisle at gmail.com
Sat Jan 20 09:10:03 CET 2024


This is user error, not a bug. LaTeX decodes UTF-8 while typesetting, but
pdfTeX is fundamentally an 8-bit engine.

A command name has to be either a sequence of letters or a single
non-letter character. pdfTeX therefore sees three tokens here: the command
named by the single byte e2, followed by the two bytes 9c and 93. The UTF-8
sequence gets split into two invalid parts, hence the error. The leading
quote in "9C is TeX syntax for a hexadecimal number.
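
A minimal sketch of one way around this, assuming the goal is only to type
the character in the source and get a check mark in pdfLaTeX output: map the
Unicode character itself instead of defining a command named after it. The
use of amssymb as the source of \checkmark is an assumption, since the
report does not say which package supplies it.

```latex
\documentclass{article}
\usepackage{amssymb} % assumed source of \checkmark

% Under pdfLaTeX, \✓ is read as the one-byte command \^^e2 followed by
% the stray bytes ^^9c ^^93, so it cannot serve as a command name.
% Mapping the code point U+2713 lets the UTF-8 decoder see all three bytes.
\DeclareUnicodeCharacter{2713}{\checkmark}

\begin{document}
Done: ✓
\end{document}
```

With this mapping the source uses a bare ✓ rather than \✓, so existing uses
of the command would need the backslash removed.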

David

On Sat, 20 Jan 2024, 02:03 Dirk Herrmann via tex-live, <tex-live at tug.org>
wrote:

> Dear maintainers,
>
> I would like to report a (likely) bug with pdfTeX
> 3.141592653-2.6-1.40.22 (TeX Live 2022/dev/Debian).
>
> I have the following command definition:
>
>      \newcommand{\✓}{\checkmark}
>
> This has worked flawlessly with XeLaTeX so far.  As I now realized that
> pdfLaTeX by default uses UTF-8 as input format since 2018, I am giving
> this a try (pdfTeX 3.141592653-2.6-1.40.22 (TeX Live 2022/dev/Debian)).
> For that command, however, I get the following error message:
>
>      ! LaTeX Error: Invalid UTF-8 byte "9C.
>
> (Yes, the quote after "9c is missing in the output).
>
> To be sure that this is not actually a badly encoded file (unlikely as
> XeLaTeX has accepted it and Emacs shows it properly), I created a hexdump:
>
>      00004ab0  77 63 6f 6d 6d 61 6e 64  7b 5c e2 9c 93 7d 7b 5c  |wcommand{\...}{\|
>
> The checkmark sequence is 'e2 9c 93' as I understand it to be correct,
> according to https://www.compart.com/en/unicode/U+2713.
>
> Is this a bug in pdfLaTeX, or is it not intended to create such commands
> containing UTF-8 characters with pdfLaTeX anyway?
>
> Thanks a lot for all your work on the various TeX components, and kind
> regards,
> Dirk Herrmann
>
>