[tex-k] Minor problems in kpathsea's fontmap.c

Doug McKenna doug at mathemaesthetics.com
Mon Nov 11 21:58:28 CET 2019


I've been single-stepping through kpathsea doing its thing in my XCode debugger.  There are some issues in fontmap.c that can be fixed/improved.

At line 41, the comment concerning its accompanying token() subroutine states:

   /* Return next whitespace-delimited token in STR or NULL if none.  */

This is incorrect and later code clearly depends on what it says, not on what the code in token() actually does.  token() always delivers an allocated string.  Indeed, it calls xmalloc() which never returns if memory fails.  Thus, if there is no token in its argument string, which is always true when the string is the empty string, token() delivers an allocated empty string.

But the code in the next routine, map_file_parse(), tests for token() returning NULL as a signal of no token found in the string.  So of course none of those tests ever fails.

When reading the font map file, map_file_parse() converts any line starting with a comment character (and not containing a second comment character) to an empty (or white-space-only) line.  When token() is called on this empty line, it returns a newly allocated empty string, and it does so twice, one for the filename variable and one for the alias variable.  All of which is unnecessary.

Worse, the test for a % comment on the line is done with strrchr(), which delivers the last instance of a %, not the first.  This means
that line 17 (or thereabouts) in the texfonts.map file:

% Comments (obviously) start with %.

is not only false (and not obviously), but also entered into the font map aliases hash table as filename=% and alias=Comments, which is obviously wrong, though in practice inconsequential.  I would argue that font file names in the TDS starting with a % should always be illegal, so that the first % on a line, rather than the last, in this mapping file represents an expected comment initiator.  But perhaps there are such files in the TDS that might preclude for compatibility reasons adopting that rule.

But it gets worser for comment lines converted to blank lines.  Both empty strings are then recorded into the fontmap hash table, which always maps the empty key to bucket index 0, and then does a linear scan down that bucket's linked list, so as to append a newly allocated hash table element to take ownership of the two meaningless empty strings.

The texfonts.map file on my system (same as the one here: <http://tug.ctan.org/info/fontname/texfonts.map>) that this routine is parsing contains about 80 blank or commented lines.  Every one of them but one is being converted to a pair of allocated empty strings and appended to the linked list at bucket[0] in the hash table.  So that's an O(80^2) (well, half of that) scan operation during startup, all totally unnecessary.

An improved version of map_file_parse() in fontmap.c might be:

static void
map_file_parse (kpathsea kpse, const_string map_filename)
{
    char *orig_l;
    unsigned map_lineno = 0;
    FILE *f = xfopen (map_filename, FOPEN_R_MODE);
    /* xfopen() always returns non-NULL f */

    if (kpse->record_input)
        kpse->record_input (map_filename);

    while ((orig_l = read_line (f)) != NULL) {
        map_lineno++;

        /* Skip leading whitespace so we can use strlen() below.  Can't use
           strtok() because this routine is recursive.  */
        string l = orig_l;                   /* save for free() */
        while (*l && ISSPACE (*l)) l++;

        /* Ignore line entirely if empty or first non-blank character is %. */
        if (*l && *l!='%') {

            /* Something substantive there: truncate at first % or @c  */
            string comment_loc = strchr (l, '%');
            if (!comment_loc)
                comment_loc = strstr (l, "@c");
            if (comment_loc)
                *comment_loc = 0;

            /* There's something uncommented on the line:
               get filename and then alias following it.  */
            string filename = token (l);
            string alias = token (l + strlen (filename));
            /* token() always delivers allocated string.
               An empty alias string is an error, though.  */

            boolean isInclude = STREQ (filename, "include");

            if (*alias == 0) {
                if (isInclude)
                    WARNING2 ("kpathsea: %s:%u: Filename argument for include directive missing",
                                    map_filename, map_lineno);
                else
                    WARNING3 ("kpathsea: %s:%u: Fontname alias missing for filename `%s'",
                                    map_filename, map_lineno, filename);
                free (alias);
                free (filename);

            } else {
                /* Successful parse of first two non-empty tokens */
                if (isInclude) {
                    string include_fname = kpathsea_path_search (kpse,
                                                kpse->map_path, alias, false);
                    if (include_fname) {
                        map_file_parse (kpse, include_fname);
                        if (include_fname != alias)
                            free (include_fname);
                    } else {
                        WARNING3 ("kpathsea: %s:%u: Can't find fontname include file `%s'",
                                    map_filename, map_lineno, alias);
                    }
                    free (alias);
                    free (filename);

                } else {
                    /* We've got everything.  Insert the new entry.  They were
                       already dynamically allocated by token(), so don't bother
                       with xstrdup().  */
                    hash_insert_normalized (&(kpse->map), alias, filename);
                }
            }
        }
        free (orig_l);
    }

    xfclose (f, map_filename);
}

The only substantive change in the above code's operation would be how it would parse a line like

    %filename aliasname  % trailing comment

due to using strchr() rather than strrchr().  The current kpathsea code would enter the (key:value) pair (%filename:aliasname) into the hash table.  The above code would ignore the line, which is (I would argue) the expected behavior (and makes the internal comment in texfonts.map correct).


FWIW,

Doug McKenna


More information about the tex-k mailing list