OpenBSD Journal

Developer blog: cloder

Contributed by cloder on from the toolchain improvements dept.

After months of slacking since C2K5, I have been hacking a little bit. I have started to focus on lint, the C static analysis tool. Lint has a reputation for being very spammy, but we are going to improve it so that it can help us to find and fix serious bugs, some of which will be exploitable.

At CanSecWest 2005 and following at the 2005 OpenBSD hackathon, I had some conversations with Theo about the kinds of bugs that don't get enough attention. Theo mentioned how integer conversion bugs (signed/unsigned, pointer/integer, etc.) are not given enough attention by programmers. People think these bugs are merely portability issues, but actually they are dangerous and potentially exploitable. Over the last couple years, we have seen how integer overflows and signed/unsigned conversions are at least as dangerous as straightforward buffer overflows (the Sun RPC/XDR integer overflow bugs from 2002 and the OpenSSL integer overflow bug from 2003 are just two examples). We are going to be seeing more of these.

I agreed with Theo that while lint has a well-deserved reputation for producing tons of spam, it could be very useful for finding certain classes of bugs that haven't been getting enough attention. In particular, lint can be very good at pointing out type conversion mistakes that sometimes lead to exploitable bugs. Lint was initially written to check for non-portable things, which makes it ideal for the kinds of bugs we want to find. More on that later.

While many people are aware of OpenBSD advances in the areas of run-time safety and resistance to attack (ProPolice, stackgap, W^X, randomized mmap, randomized malloc, privsep, etc.), I think fewer are aware of the improvements OpenBSD has made on the toolchain side. These improvements include compiler features (static bounds checking in gcc with -Wbounded) and the replacement of frequently-misused, confusing API's (replacing strn* with strl*, replacing atoi/atol with strtonum, etc.)

My recent hacking on lint is a modest effort along the same lines. I want to get lint to the point where people feel like it's worth using. My first goal is to get lint to parse our tree on all platforms without syntax errors (which involves understanding C99 syntax and gcc-isms). This isn't that hard, because lint uses gcc's preprocessor before parsing, which means we can hide some of the more disgusting things from lint.

My second goal is to get lint to shut up about things that don't help us find bugs. This is a bit harder, but it's crucial. Lint's verbosity is the reason why virtually nobody uses lint. Things are improving on this front.

Once lint's signal:noise ratio reaches an acceptable level, we will hopefully see more people using it, and people will start realizing that lint is very good at certain things.

As I mentioned above, lint is great at finding type conversion bugs (e.g. signed/unsigned, pointer/integral, and pointer/pointer). The reason lint is very good at pointing these bugs out is because lint is very stupid. Lint does all of its checking during parsing, during the construction of the AST. It does not need much context to perform its checks -- it just needs to know the types of things. There is no attempt at value tracking, data flow analysis, or anything like that. We leave that stuff to other tools like splint (and this makes splint totally useless for non-annotated code, IMHO).

Lint simply encodes the rules for C type conversion. The ISO C type conversion rules are surprising (and dismaying) to even experienced C programmers. Unsigned operands often get "promoted" to signed types, regardless of types of the other operands. Sign extension happens in non-obvious places.

Some other notes, briefly:

  • The goal is not to make our code lint-free. Not only is it impossible, it's also a waste of time.
  • Sprinkling casts throughout the code usually makes things worse. Other open source projects (which I will not name) have gone this route. All you end up doing is hiding bad code from lint.
  • Many people think that "gcc -Wall -Wsign-compare" will tell them about these things. Nothing could be further from the truth. Run gcc -Wall on some of the lint regression tests in regress/usr.bin/xlint and you might be surprised at how much bad code gcc will silently eat.
  • While lint is getting better in userland, it still has a hard time in certain libs and an even harder time in the kernel. krw has started to use lint a little bit in the kernel and has been fixing some real issues in some drivers, but he's having to wade through hundreds of pages of warnings.
Hopefully over the next few weeks we can get lint to shut up and grok all our code, and then we will start making it more noisy about important things, such as the misuse of API's:
   size_t n;
   n = snprintf(...);
or:
   n = read(...);
   write(..., n);
Again, these are modest improvements. It's nothing revolutionary...the approach is to find and fix lots of instances of the same bugs over and over again. We fix things and we move on to the next thing.

(Comments are closed)


Comments
  1. By Anonymous Coward (195.224.109.30) on

    http://www.theepochtimes.com/news/5-7-5/30084.html

  2. By Anonymous Coward (84.9.40.34) on

    cheers for the explanation of the some of the commits i have seen going in recently... always good to know whats going on from the developers side ;)

  3. By Anonymous Coward (68.124.58.147) on

    Why wasn't the "Developer blog" properly introduced?

    Comments
    1. By Marco Peereboom (67.64.89.177) slash@peereboom.us on http://www.peereboom.us

      What's wrong with this introduction?

      Comments
      1. By He who was not certain of the first (68.124.58.147) on

        From WordNet (r) 2.0:

        introduction
        n 1: the first section of a communication

        This entry does not explicitly tell me it is the first; the reason for being "wrong". So, nothing has been introduced.

        It is an uncomfortable feeling to think that there may be a first blog entry that I may be missing out on, all because the first was not pointed out.

        Thankfully, there's a link on every entry that takes me to all of them; the link being implicit, and not explicit, being the difference.

  4. By Adam S (68.211.140.159) on http://www.emergentchaos.com

    Is there a way to subscribe to posts about this? I'd like to hear more about progress?

    Comments
    1. By Marco Peereboom (67.64.89.177) marco@peereboom.us on http://www.peereboom.us

      Sure, it's called cvs. Chad has been very active so just read his cvs commit messages and you should get a pretty good idea what he has been up to. Lint on openbsd is pretty darn good thanks to the hard work of many folks.

  5. By jumping in (24.217.190.176) on

    I'm a long-time Windows guy getting my feet wet with shell scripting and Perl, and the first thing I picked up was to use the interpreter's built-in syntax checking (perl -c, and ksh -n); check the script syntax first before trying to run it for real. So I figured a good start for writing a script would be to run quick, sanity-check validation checks. I threw in 'validate' and 'flawfinder' from ports/packages (just to do it). And then I installed a bunch of packages and ran the script against each package to pick up simple errors.

    For instance, in samba-docs-3.0.13p1:

    Perl: /usr/local/share/examples/samba/genlogon/genlogon.pl
    syntax error at /usr/local/share/examples/samba/genlogon/genlogon.pl line 49, near ")
    {"
    syntax error at /usr/local/share/examples/samba/genlogon/genlogon.pl line 52, near "}"
    /usr/local/share/examples/samba/genlogon/genlogon.pl had compilation errors.

    The "If" on line 48 shouldn't be capitalized.
    If ($ARGV[1] eq "ADMIN" || $ARGV[0] eq "admin")

    And there are a lot of false positives on example files, since dependencies are necessarily installed (e.g., LDAP.pm not in @INC).

    The Title on 'man Net::DNS::Nameserver' from p5-Net-DNS-0.53 I'd have just chalked up to a display issue:

    Net::DNS::NamesUserrContributed Perl DocumNet::DNS::Nameserver(3)

    troff: /usr/local/man/man3p/Net::DNS::Nameserver.3p
    /usr/local/man/man3p/Net::DNS::Nameserver.3p:32: warning: `Tr' not defined
    /usr/local/man/man3p/Net::DNS::Nameserver.3p:55: warning: number register `F' not defined
    /usr/local/man/man3p/Net::DNS::Nameserver.3p:131: warning: `IX' not defined
    /usr/local/man/man3p/Net::DNS::Nameserver.3p:132: warning: `TH' not defined
    /usr/local/man/man3p/Net::DNS::Nameserver.3p:133: warning: `SH' not defined
    /usr/local/man/man3p/Net::DNS::Nameserver.3p:137: warning: `C`' not defined
    /usr/local/man/man3p/Net::DNS::Nameserver.3p:137: warning: `C'' not defined
    /usr/local/man/man3p/Net::DNS::Nameserver.3p:9: backtrace: macro `Sh'
    /usr/local/man/man3p/Net::DNS::Nameserver.3p:144: warning: `PP' not defined

    Reading up on troff/nroff/groff/man, et al., was interesting. It took a while for me to figure out what all those ^H backspaces in man pages were for. And there are plenty of syntax errors in man pages, but most are just generated from elsewhere, apparently, and it wouldn't do any good to fix them.

    Almost no one writes valid HTML ("FakeTML"), but I ran 'validate' anyway.

    I'd post the code, if I could.... Submission form is choking.

    I just run file(1) for certain types and then match file extensions for ones that are missed, like *.pm perl modules.

    Perl scripts and modules:
    perl -c <file>

    Man page and relatives:
    groff -b -z -w w <file>

    Shell scripts:
    ksh -n <file>

    HTML:
    validate -w --verbose <file>

    C/C++ source:
    flawfinder <file>

    PHP scripts:
    php -l <file>



Credits

Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]