OpenBSD Journal

strlcpy(3) Use in 3rd Party Software

Contributed by tbert from the decoded-symbols dept.

Theo de Raadt (deraadt@) penned a missive titled "On the matter of strlcpy/strlcat acceptance by industry":

From time to time, there are people who say that strlcpy and strlcat
are stupid.

This is a little frustrating because we just want developers to have
an easier time writing/auditing string code to avoid overflows and
truncations, especially considering so many standard C APIs require
fixed-length strings or have other limits, and will for the foreseeable
future.

You probably all know about the mainstream users of these functions,
like the Linux kernel, or MacOS, or the other BSDs, and Solaris.  But
there are many, many more, and it is time to show the global
strlcpy'ing deniers the reality.

I've collected some statistics to see how much upstream software uses
these functions.

The rest of the message follows, with the longer lists elided; the full lists of software can be found in the mailing list archive.

I asked Stuart Henderson to collect a "recursive nm .o" for every
piece of software built in our ports tree.  It's roughly 2GB of
text output.

For those who don't know, that ports tree is basically a repository of
all the application software we supply as an add-on on top of the base
operating system.  Each of those becomes a package, so that is what we
are looking at.  They are pretty much the bulk of the commonly-used
Unix applications found on all systems.

These packages do not generally include things like openssh, perl,
X11, sqlite, or a number of other small things directly integrated
into the OpenBSD base.  But that's OK, because those I just mentioned
do use strlcpy and strlcat in their upstream repositories.

So 3535 packages contain .o files, and now we can grep to see what
they define or use.
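
The grep step can be pictured with a small sketch.  Assuming nm-style
lines in which a one-letter type precedes the symbol name (T for a
definition in the text section, U for an undefined reference), the
hypothetical helper classify_nm_line below reports whether a line
defines or merely uses a given symbol.  The enum and function names are
illustrative only; this is not the tooling that produced the numbers
in this message.

```c
#include <stdio.h>
#include <string.h>

/* What one line of nm output says about a symbol (illustrative names). */
enum sym_use { SYM_OTHER, SYM_DEFINED, SYM_USED };

/*
 * Classify one nm-style line for the given symbol.  Accepted shapes
 * (an assumption about the input, not a full nm parser):
 *   "<address-or-prefix> <type> <name>"   e.g. "00000000 T strlcpy"
 *   "<type> <name>"                       e.g. "         U strlcpy"
 */
enum sym_use
classify_nm_line(const char *line, const char *symbol)
{
	char a[256], b[256], c[256];
	const char *type, *name;
	int n;

	n = sscanf(line, "%255s %255s %255s", a, b, c);
	if (n == 3) {			/* leading address or package prefix */
		type = b;
		name = c;
	} else if (n == 2) {		/* no address column */
		type = a;
		name = b;
	} else
		return SYM_OTHER;

	/* The type must be a single letter and the name must match exactly. */
	if (strlen(type) != 1 || strcmp(name, symbol) != 0)
		return SYM_OTHER;
	if (*type == 'T')
		return SYM_DEFINED;	/* the object carries its own copy */
	if (*type == 'U')
		return SYM_USED;	/* expects the symbol from a library */
	return SYM_OTHER;
}
```

Because the name must match exactly, a renamed copy such as g_strlcpy
is not counted as strlcpy; finding those requires a separate pattern
search.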

In essence, a piece of software will likely fall into one of these
categories:

    (0) It does not use the functions at all.
    (1) It assumes that the system has the functions in libc.
    (2) It has a configure-style "feature-test" which tests if libc
        contains the functions, and thus turns on a cpp symbol such as
        HAS_STRLCPY, then uses the libc version.  Otherwise it
        avoids using them...
    (3) More commonly, if the feature-test fails, it substitutes
        copies from its own tree.  Essentially to cope with glibc.
    (4) It contains its own version, typically copied
        from us, but renamed.  There are many of these.
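
As a rough illustration of categories (2) through (4), a source tree
might carry something like the sketch below.  HAS_STRLCPY is the
feature-test symbol mentioned above; the wrapper name compat_strlcpy
and the fallback body are assumptions made for this example, written
from the documented behaviour rather than copied from any project.

```c
#include <stddef.h>
#include <string.h>

#ifdef HAS_STRLCPY
/* The feature test found strlcpy in libc: just use it. */
#define compat_strlcpy strlcpy
#else
/*
 * Fallback copy for systems whose libc lacks strlcpy.  It copies at
 * most dstsize - 1 bytes, always NUL-terminates when dstsize > 0,
 * and returns the length of src.
 */
size_t
compat_strlcpy(char *dst, const char *src, size_t dstsize)
{
	size_t srclen = strlen(src);

	if (dstsize != 0) {
		size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;	/* a result >= dstsize means truncation */
}
#endif
```

Giving the fallback a private name avoids clashing with a libc
declaration, which is one plausible reason so many renamed copies show
up in the symbol tables.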

Let's look at these cases backwards, for reasons that become obvious
as we move ahead.

(4) Who is defining their own versions of the functions, with slightly
    different names?  The obvious names we find are:

	SDL_strlcpy		SDL_utf8strlcpy		_iodbcdm_strlcpy
	_strlcpy		ascii_safe_strlcpy	av_strlcpy
	cli_strlcpy		dt_utf8_strlcpy		fc_strlcpy
	fl_strlcpy		flac__strlcpy		fz_strlcpy
	g_strlcpy		hd_strlcpy		isc_string_strlcpy
	lg_strlcpy		llvm_strlcpy		loud_strlcpy
	mcs_strlcpy		mg_strlcpy		monoeg_g_strlcpy
	mowgli_strlcpy		my_strlcpy		mystrlcpy
	os_strlcpy		pa_strlcpy		rb_strlcpy
	sg_strlcpy		sl_strlcpy		sm_strlcpy
	test_evutil_strlcpy	test_strlcpy		tr_strlcpy
	ut_strlcpy		utf8_strlcpy		uv_strlcpy
	vi_strlcpy		xstrlcpy		zbx_strlcpy

	SDL_strlcat		SDL_strlcpy		_iodbcdm_strlcat
	av_strlcat		fc_strlcat		fl_strlcat
	flac__strlcat		fz_strlcat		g_strlcat
	hd_strlcat		isc_string_strlcat	ixp_strlcat
	mcs_strlcat		mowgli_strlcat		mystrlcat
	rb_strlcat		sg_strlcat		sl_strlcat
	sm_strlcat		ssh_strlcat		uv_strlcat
	vi_strlcat		wmii_strlcat		xstrlcat
	zbx_strlcat

    Replacement copies seem to be quite popular.  Some of the names
    hint at who is doing this, but we can search by these functions to
    see which packages are defining them:

	bogofilter bro clamav cntlm cups-filters darktable dkim-milter
	ffmpeg flac fltk freeciv fte glib2 gtk-gnutella htmldoc iodbc
	ircd-ratbox isc-bind isc-dhcp ksh93 leafnode libixp libstatgrab
	link-grammar linkchecker llvm mathomatic mcs mono mowgli mupdf
	mysql node pmacct postgresql pulseaudio rlwrap samhain sdl2
	tcpreplay transmission visitors wmii wpa_supplicant xfe xpilot
	zabbix

    So 73 (2% of 3535) packages define either of these for themselves
    under a new name.  This may seem like a small list, but look: it
    contains monsters like glib2, postgresql, and mysql.  In particular,
    those monsters contain libraries; this will become more obvious a
    bit further on.

(3) What about software which substitutes its own, when it doesn't
    find ours?  This is harder to determine in the OpenBSD ports tree
    because our libc functions will always be found.  However, we can
    see if any ports sloppily compile their own versions, even though
    we have them...

	databases/pgpool: T strlcpy
	devel/p5-File-RsyncP: T strlcpy
	devel/py-setproctitle: T strlcpy
	editors/fte: T strlcpy
	games/oolite: T strlcpy
	games/stone-soup: T strlcpy
	games/xpilot: T strlcpy
	mail/akpop3d: T strlcpy
	net/bro: T strlcpy
	net/tcpreplay: T strlcpy
	shells/ksh93: T strlcpy
	www/cntlm: T strlcpy
	www/linkchecker: T strlcpy
	x11/xfe: T strlcpy

	editors/fte: T strlcat
	games/xpilot: T strlcat
	net/bro: T strlcat
	net/pmacct: T strlcat
	net/tcpreplay: T strlcat
	shells/ksh93: T strlcat
	www/cntlm: T strlcat
	www/linkchecker: T strlcat
	x11/xfe: T strlcat

    This was rather unexpected.   These software teams have decided to
    simply use the same name, for (hopefully) the same functionality.

(2) What about code which uses a feature test to find out if the
    functions exist and, having not found them, avoids them?
    We cannot test this using the "symbol table" method.  A test would
    need to be run on a system without the functions in libc.  That test
    cannot be run on a BSD, MacOS, or Solaris...

(1) The question of which ports use the functions in libc should really
    be split into two questions.   How many use our functions
    (strlcpy and strlcat)?  How many use the renamed functions
    (for instance, g_strlcpy from glib, isc_string_strlcpy, etc).

    The following 254 (7% of 3535) packages use our strlcpy:

        [list of software elided]

    The following 158 (4% of 3535) packages use our strlcat:

        [list of software elided]

    The following 326 (9% of 3535) packages use another library's
    private *strlcpy function:

        [list of software elided]

    The following 35 (1% of 3535) packages use another library's private
    *strlcat function:

	bitlbee chromium darktable dkim-milter eboard ffmpeg flac freeciv
	gcompris gecko-mediaplayer gmtk gnome-mplayer gtk-gnutella gtkpod
	htmldoc inkscape iodbc ircd-ratbox jnettop libstatgrab mcs mplayer
	mupdf ncmpc osmo pidgin qemu rlwrap samhain scmpc ufraw uim wmii xmms2
	zabbix

(0) Finally, we should answer the question about who is not using these
    functions or variants.  Let us keep the answer really simple.

    The following 1808 (51% of 3535) packages use strcpy:

        [list of software elided]

    I'm not going to bother including the data for strcat.

    So roughly half of these packages still call strcpy.  There is no way they have
    all been audited to avoid overflow.

Following this, a few more observations are in order:

(1) Remarkably, four pieces of software still use gets(3):

	chipmunk Wnn alpine metamail

(2) sprintf is still pretty popular.  1810 (51% of 3535) packages use it.

        [list of software elided]
	
    Quite worrying.  The odds of overflow or truncation are very high.

(3) The above sprintf numbers are quite worrying.  On the bright side,
    snprintf utilization is probably better than a few years ago.
    1810 (38% of 3535) packages use it.

        [list of software elided]
	
Finally, I would like to take this opportunity to remind everyone of
this piece from the strlcpy(3) manual page found at

    http://www.openbsd.org/cgi-bin/man.cgi?query=strlcpy

[...]
RETURN VALUES
     Besides quibbles over the return type (size_t versus int) and signal
     handler safety (snprintf(3) is not entirely safe on some systems), the
     following two are equivalent:

           n = strlcpy(dst, src, len);
           n = snprintf(dst, len, "%s", src);

     Like snprintf(3), the strlcpy() and strlcat() functions return the total
     length of the string they tried to create.  For strlcpy() that means the
     length of src.  For strlcat() that means the initial length of dst plus
     the length of src.
[...]

snprintf, strlcpy, and strlcat are used in exactly the same way.
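
To make the return-value convention concrete, here is a minimal sketch
of checking for truncation while building a path.  ex_strlcat is a
hypothetical stand-in written from the manual page's description so
that the example compiles on systems without strlcat(3); it is not the
OpenBSD source.  join_path is likewise an illustrative helper.

```c
#include <stddef.h>
#include <string.h>

/*
 * Append src to dst within dstsize bytes.  Returns the initial length
 * of dst plus the length of src, i.e. the length of the string it
 * tried to create.
 */
size_t
ex_strlcat(char *dst, const char *src, size_t dstsize)
{
	size_t dlen = 0, slen = strlen(src);

	/* Length of dst, but never look beyond dstsize bytes. */
	while (dlen < dstsize && dst[dlen] != '\0')
		dlen++;

	if (dlen < dstsize) {		/* room for at least the NUL */
		size_t room = dstsize - dlen - 1;
		size_t n = slen < room ? slen : room;

		memcpy(dst + dlen, src, n);
		dst[dlen + n] = '\0';
	}
	return dlen + slen;
}

/*
 * Build "dir/file" in buf.  Returns 0 on success, or -1 if the result
 * did not fit; buf then holds a truncated, NUL-terminated prefix.
 */
int
join_path(char *buf, size_t bufsize, const char *dir, const char *file)
{
	if (bufsize == 0)
		return -1;
	buf[0] = '\0';
	if (ex_strlcat(buf, dir, bufsize) >= bufsize ||
	    ex_strlcat(buf, "/", bufsize) >= bufsize ||
	    ex_strlcat(buf, file, bufsize) >= bufsize)
		return -1;
	return 0;
}
```

The comparison against the buffer size is the same test one would apply
to the result of snprintf; what to do on failure is left to the caller.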

Using .o file symbols like above does not prove to us whether people
are using the APIs in the most careful way -- that would require a
source code inspection.  But to provide an example, bind9 contains 114
uses of snprintf which don't check the return value to spot
truncation, with code like the following:

        char buf[DNS_NAME_FORMATSIZE + sizeof(": TSIG ''")];
        [...]
                char namebuf[DNS_NAME_FORMATSIZE];
                dns_name_format(&zone->tsigkey->name, namebuf,
                                sizeof(namebuf));
                snprintf(buf, sizeof(buf), ": TSIG '%s'",
                         namebuf);

Fine, maybe it is safe, in the "it has been audited, and next time
someone is here, they will audit it again" sense.  I also don't have
time to verify this or the 113 other cases, nor is it my job.
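
For comparison, a checked version of that kind of call could look
roughly like the sketch below.  format_tsig_note is a hypothetical
helper invented for this illustration, not a bind9 function, and
returning -1 is only one of several reasonable ways to report the
condition.

```c
#include <stdio.h>

/*
 * Format ": TSIG '<keyname>'" into buf and report truncation.
 * Returns 0 on success, -1 on an output error or if the result did
 * not fit in bufsize bytes.
 */
int
format_tsig_note(char *buf, size_t bufsize, const char *keyname)
{
	int n = snprintf(buf, bufsize, ": TSIG '%s'", keyname);

	/* snprintf returns the length it wanted to write, or < 0. */
	if (n < 0 || (size_t)n >= bufsize)
		return -1;	/* the caller decides how to handle it */
	return 0;
}
```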

I bring this up to ask why strlcpy/strlcat are being held to some
arbitrary standard that they should handle truncation better, when
they handle it JUST LIKE the commonplace
snprintf API.  Right here in mainstream code, we see that snprintf's
return is not being handled, against best practice taught everywhere.
Should snprintf call abort?  That's ridiculous.  Should it crash?
What should it do?  The fact that no other function of that sort has
ever made it into the mainstream perhaps shows the arguments are weak.
If something is better, take some real software and fix it.

To upstream authors of software who are using the functions: please
continue incorporating more of them into your software, because it is
good for the users of your software.  Please check the return values
to spot truncation as described in the manual page, and properly handle
that condition in the best way you can based on the location of the
call.  Thanks!



Comments
  1. By sneaker (sneaker) sneaker@noahpugsley.net

    Bloody hell. There might be a big lock still but posts like this show that long term, this project is king of the world. It's not about the features, it's about the process.

