Contributed by tbert from the decoded-symbols dept.
Theo de Raadt (deraadt@) penned a missive titled "On the matter of strlcpy/strlcat acceptance by industry":
From time to time, there are people who say that strlcpy and strlcat are stupid. This is a little frustrating because we just want developers to have an easier time writing/auditing string code to avoid overflows and truncations, especially considering so many standard C APIs require fixed length strings or have other limits, and will in the foreseeable future. You probably all know about the mainstream users of these functions, like the Linux kernel, or MacOS, or the other BSDs, and Solaris. But there are many, many more, and it is time to show the global strlcpy'ing deniers the reality. I've collected some statistics to see how much upstream software uses these functions.
The (elided) rest of the message is below the fold; the full lists of software can be found at the link to the mailing list archive.
I asked Stuart Henderson to collect a "recursive nm .o" for every
piece of software built in our ports tree. It's roughly 2GB of
text output.
For those who don't know, that ports tree is basically a repository of
all the application software we supply as an add-on on top of the base
operating system. Each of those becomes a package, so that is what we
are looking at. They are pretty much the bulk of the commonly-used
Unix applications found on all systems.
These packages do not generally include things like openssh, perl, or
X11, sqlite, or a number of other small things directly integrated
into the OpenBSD base. But that's OK, because those I just mentioned
do use strlcpy and strlcat in their upstream repositories.
So 3535 packages contain .o files, and now we can grep to see what
they define or use.
In essence, a piece of software will likely fall into one of these
categories:
(0) Not use the functions at all.
(1) Will assume that the system has the functions in libc.
(2) Will have a configure-style "feature-test" which tests if libc
contains the functions, and thus turn on a cpp symbol such as
HAS_STRLCPY, then use the libc version. Otherwise it will
avoid using them...
(3) More commonly, if the feature-test fails, it will substitute
copies from its own tree. Essentially to cope with glibc.
(4) Some software contains its own version, typically copied
from us, but renamed. There are many of these.
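Categories (2) through (4) can be illustrated with a minimal sketch. The names here are assumptions for illustration only: HAVE_STRLCPY stands in for whatever cpp symbol a project's feature test defines, and compat_strlcpy is a hypothetical renamed copy, not taken from any of the packages listed below.

```c
#include <stddef.h>
#include <string.h>

/*
 * HAVE_STRLCPY is a hypothetical symbol that a configure-style
 * feature test would define when libc provides strlcpy.
 */
#ifdef HAVE_STRLCPY
#define compat_strlcpy strlcpy
#else
/*
 * Hypothetical renamed fallback with strlcpy semantics: copy at most
 * dstsize - 1 bytes, always NUL-terminate when dstsize is nonzero,
 * and return the length of src so the caller can detect truncation.
 */
size_t
compat_strlcpy(char *dst, const char *src, size_t dstsize)
{
	size_t srclen = strlen(src);

	if (dstsize != 0) {
		size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;

		memcpy(dst, src, n);
		dst[n] = '\0';
	}
	return srclen;
}
#endif
```

A caller would treat a return value greater than or equal to the buffer size as truncation, whichever branch of the #ifdef was compiled.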
Let's look at these cases backwards, for reasons that become obvious
as we move ahead.
(4) Who is defining their own versions of the functions, with slightly
different names? The obvious names we find are:
SDL_strlcpy SDL_utf8strlcpy _iodbcdm_strlcpy
_strlcpy ascii_safe_strlcpy av_strlcpy
cli_strlcpy dt_utf8_strlcpy fc_strlcpy
fl_strlcpy flac__strlcpy fz_strlcpy
g_strlcpy hd_strlcpy isc_string_strlcpy
lg_strlcpy llvm_strlcpy loud_strlcpy
mcs_strlcpy mg_strlcpy monoeg_g_strlcpy
mowgli_strlcpy my_strlcpy mystrlcpy
os_strlcpy pa_strlcpy rb_strlcpy
sg_strlcpy sl_strlcpy sm_strlcpy
test_evutil_strlcpy test_strlcpy tr_strlcpy
ut_strlcpy utf8_strlcpy uv_strlcpy
vi_strlcpy xstrlcpy zbx_strlcpy
SDL_strlcat SDL_strlcpy _iodbcdm_strlcat
av_strlcat fc_strlcat fl_strlcat
flac__strlcat fz_strlcat g_strlcat
hd_strlcat isc_string_strlcat ixp_strlcat
mcs_strlcat mowgli_strlcat mystrlcat
rb_strlcat sg_strlcat sl_strlcat
sm_strlcat ssh_strlcat uv_strlcat
vi_strlcat wmii_strlcat xstrlcat
zbx_strlcat
Replacement copies seem to be quite popular. Some of the names
hint at who is doing this, but we can search by these functions to
see which packages are defining them:
bogofilter bro clamav cntlm cups-filters darktable dkim-milter
ffmpeg flac fltk freeciv fte glib2 gtk-gnutella htmldoc iodbc
ircd-ratbox isc-bind isc-dhcp ksh93 leafnode libixp libstatgrab
link-grammar linkchecker llvm mathomatic mcs mono mowgli mupdf
mysql node pmacct postgresql pulseaudio rlwrap samhain sdl2
tcpreplay transmission visitors wmii wpa_supplicant xfe xpilot
zabbix
So 73 (2% of 3535) packages define either of these for themselves
under a new name. This may seem like a small list, but look: it
contains monsters like glib2, postgresql, and mysql. In particular,
those monsters contain libraries; this will become more obvious a
bit further on.
(3) What about software which substitutes their own, when they don't
find ours? This is harder to determine in the OpenBSD ports tree
because our libc functions will always be found. However, we can
see if any ports sloppily compile their own versions, even though
we have it...
databases/pgpool: T strlcpy
devel/p5-File-RsyncP: T strlcpy
devel/py-setproctitle: T strlcpy
editors/fte: T strlcpy
games/oolite: T strlcpy
games/stone-soup: T strlcpy
games/xpilot: T strlcpy
mail/akpop3d: T strlcpy
net/bro: T strlcpy
net/tcpreplay: T strlcpy
shells/ksh93: T strlcpy
www/cntlm: T strlcpy
www/linkchecker: T strlcpy
x11/xfe: T strlcpy
editors/fte: T strlcat
games/xpilot: T strlcat
net/bro: T strlcat
net/pmacct: T strlcat
net/tcpreplay: T strlcat
shells/ksh93: T strlcat
www/cntlm: T strlcat
www/linkchecker: T strlcat
x11/xfe: T strlcat
This was rather unexpected. These software teams have decided to
simply use the same name, for (hopefully) the same functionality.
(2) What about code which uses a feature test to find out whether
the functions exist and, having not found them, avoids them?
We cannot test using the "symbol table" method. A test would need
to be run on a system without the functions in libc. That test
cannot be run on a BSD, MacOS, or Solaris...
(1) The question of which ports use the functions in libc should really
be split into two questions. How many use our functions
(strlcpy and strlcat)? How many use the renamed functions
(for instance, g_strlcpy from glib, isc_string_strlcpy, etc).
The following 254 (7% of 3535) packages use our strlcpy:
[list of software elided]
The following 158 (4% of 3535) packages use our strlcat:
[list of software elided]
The following 326 (9% of 3535) packages use another library's
private *strlcpy function:
[list of software elided]
The following 35 (1% of 3535) packages use another library's private
*strlcat function:
bitlbee chromium darktable dkim-milter eboard ffmpeg flac freeciv
gcompris gecko-mediaplayer gmtk gnome-mplayer gtk-gnutella gtkpod
htmldoc inkscape iodbc ircd-ratbox jnettop libstatgrab mcs mplayer
mupdf ncmpc osmo pidgin qemu rlwrap samhain scmpc ufraw uim wmii xmms2
zabbix
(0) Finally, we should answer the question about who is not using these
functions or variants. Let us keep the answer really simple.
The following 1808 (51% of 3535) packages use strcpy:
[list of software elided]
I'm not going to bother including the data for strcat.
So 50% of software still calls strcpy. There is no way they have
all been audited to avoid overflow.
Following this, a few more observations are in order:
(1) Remarkably, four pieces of software still use gets(3):
chipmunk Wnn alpine metamail
(2) sprintf is still pretty popular. 1810 (51% of 3535) packages use it.
[list of software elided]
Quite worrying. The odds of overflow or truncation are very high.
(3) The above sprintf numbers are quite worrying. On the bright side,
snprintf utilization is probably better than a few years ago.
1810 (38% of 3535) packages use it.
[list of software elided]
Finally, I would like to take this opportunity to remind everyone of
this piece from the strlcpy(3) manual page found at
http://www.openbsd.org/cgi-bin/man.cgi?query=strlcpy
[...]
RETURN VALUES
Besides quibbles over the return type (size_t versus int) and signal
handler safety (snprintf(3) is not entirely safe on some systems), the
following two are equivalent:
n = strlcpy(dst, src, len);
n = snprintf(dst, len, "%s", src);
Like snprintf(3), the strlcpy() and strlcat() functions return the total
length of the string they tried to create. For strlcpy() that means the
length of src. For strlcat() that means the initial length of dst plus
the length of src.
[...]
snprintf, strlcpy, and strlcat are used in exactly the same way.
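As a minimal sketch of the return-value convention quoted above, the following builds a path in steps and stops at the first truncation. Both function names are hypothetical: sketch_strlcat is a local stand-in written from the manual page's description so the example compiles on systems without strlcat in libc, and join_path is an illustrative caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical local stand-in for strlcat: append src to dst without
 * writing past dstsize bytes, and return the length of the string it
 * tried to create (initial length of dst plus length of src).
 */
size_t
sketch_strlcat(char *dst, const char *src, size_t dstsize)
{
	size_t dlen = 0;
	size_t slen = strlen(src);
	size_t room, n;

	/* Find the end of dst, but never look past dstsize bytes. */
	while (dlen < dstsize && dst[dlen] != '\0')
		dlen++;
	if (dlen == dstsize)
		return dstsize + slen;	/* dst was not terminated in range */

	room = dstsize - dlen - 1;
	n = slen < room ? slen : room;
	memcpy(dst + dlen, src, n);
	dst[dlen + n] = '\0';
	return dlen + slen;
}

/* Returns 0 on success, -1 if "dir/file" does not fit in out. */
int
join_path(char *out, size_t outsize, const char *dir, const char *file)
{
	if (outsize == 0)
		return -1;
	out[0] = '\0';
	/* A return value >= outsize means the result was truncated. */
	if (sketch_strlcat(out, dir, outsize) >= outsize ||
	    sketch_strlcat(out, "/", outsize) >= outsize ||
	    sketch_strlcat(out, file, outsize) >= outsize)
		return -1;
	return 0;
}
```

The same `>= size` comparison works for strlcpy and snprintf return values, which is the point of the manual page's comparison.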
Using .o file symbols like above does not prove to us whether people
are using the APIs in the most careful way -- that would require a
source code inspection. But to provide an example, bind9 contains 114
uses of snprintf which don't check the return value to spot
truncation, with code like the following:
char buf[DNS_NAME_FORMATSIZE + sizeof(": TSIG ''")];
[...]
char namebuf[DNS_NAME_FORMATSIZE];
dns_name_format(&zone->tsigkey->name, namebuf,
sizeof(namebuf));
snprintf(buf, sizeof(buf), ": TSIG '%s'",
namebuf);
Fine, maybe it is safe, in the "it has been audited, and next time
someone is here, they will audit it again" sense. I also don't have
time to verify this or the 113 other cases, nor is it my job.
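For comparison, here is a minimal sketch of what a checked version of such a call could look like. The function name format_tsig and its error convention are hypothetical, chosen for illustration; this is not code from bind9.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical helper: format ": TSIG '<name>'" into buf.
 * Returns 0 on success, -1 on an output error or if the result
 * did not fit in buflen bytes (that is, it was truncated).
 */
int
format_tsig(char *buf, size_t buflen, const char *namebuf)
{
	int n = snprintf(buf, buflen, ": TSIG '%s'", namebuf);

	/* snprintf returns the length it tried to create, like strlcpy. */
	if (n < 0 || (size_t)n >= buflen)
		return -1;
	return 0;
}
```

What the caller does on -1 (log, fail the operation, fall back to a shorter message) depends on the location of the call.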
I bring this up to ask why strlcpy/strlcat are being held to some
arbitrary standard that they should handle truncation better, when
they handle it JUST LIKE the commonplace snprintf API. Right here
in mainstream code, we see that snprintf's return is not being
handled, against best practice taught everywhere.
Should snprintf call abort? That's ridiculous. Should it crash?
What should it do? The fact that no other function of that sort has
ever made it into the mainstream perhaps shows the arguments are weak.
If something is better, take some real software and fix it.
To upstream authors of software who are using the functions: please
continue incorporating more of them into your software, because it is
good for the users of your software. Please check the return values
to spot truncation as described in the manual page, and properly handle
that condition in the best way you can based on the location of the
call. Thanks!