OpenBSD Journal

Developer Blog: ray@ - Reading Between the Lines

Contributed by Ray Lai on from the line-after-line dept.

Today we'll be talking about caveats when reading lines from a file. cloder@ recently discovered a series of fgets(3) misuses:
	char buf[1024];

	fgets(buf, sizeof(buf), fp);
	/* Nuke newline. */
	buf[strlen(buf) - 1] = '\0';

fgets(3) includes a newline character from each line, but only if one was found. This is why most code truncates the string after calling fgets(3). There are three things wrong with this example:

1. fgets(3) can fail and return NULL.
2. buf[strlen(buf) - 1] might not be a newline, truncating a valid character. This can happen if the file did not end in a newline, if the buffer was too small to store the line, or
3. buf might contain binary data and start with a NUL character, causing strlen(buf) to be zero. This, in turn, causes an out-of-bounds write.

The following example, copied directly from the man page, checks for all three errors:

	char buf[1024];

	if (fgets(buf, sizeof(buf), fp) != NULL) {
		if (buf[0] != '\0' && buf[strlen(buf) - 1] == '\n')
			buf[strlen(buf) - 1] = '\0';
	}

Note that lines containing NUL characters cannot be read reliably with fgets(3), since we are using strlen(3) to check the line length.

Some of these issues can be avoided by using fgetln(3). fgetln(3) allows for arbitrarily long lines and returns the number of characters read. Like fgets(3), newline characters must be removed manually; while this is easier to do with fgetln(3) because the length is returned, a check must still be performed.

Here is an example, copied directly from the man page:

	char *buf, *lbuf;
	size_t len;

	lbuf = NULL;
	while ((buf = fgetln(fp, &len))) {
		if (buf[len - 1] == '\n')
			buf[len - 1] = '\0';
		else {
			/* EOF without EOL, copy and add the NUL */
			if ((lbuf = malloc(len + 1)) == NULL)
				err(1, NULL);
			memcpy(lbuf, buf, len);
			lbuf[len] = '\0';
			buf = lbuf;
		}
		printf("%s\n", buf);
	}
	free(lbuf);

A few things to note here:

1. There is no strlen(3) call, so the only thing affected by NUL characters is printf(3).
2. fgetln(3) cannot return successfully with a zero-length string, so the buffer cannot be accessed with a negative index.
3. The free(3) is correctly placed, since it is only possible to call malloc(3) in the last iteration.

For a higher level of abstraction and more knobs, check out fparseln(3), which takes care of newline characters automatically and can recognize escape characters, comment characters, and continuation characters. Unfortunately, fparseln(3) has more overhead, is less portable, and each line must be freed afterwards. It is also impossible to check if the file ends in a newline; to reliably read every character in a file, use fgetln(3).

For more information, read the man pages: fgets(3), fgetln(3), and fparseln(3). Pay attention to the CAVEATS sections for fgets(3) and fgetln(3).

(Comments are closed)


Comments
  1. By Anonymous Coward (84.9.187.181) on

    ...while this is easier to do with fgets(3)...

    Should no doubt be:

    ...while this is easier to do with fgetln(3)...

    Comments
    1. By Ray Lai (85.25.4.93) ray@ on http://cyth.net/~ray/

      > ...while this is easier to do with fgets(3)...
      >
      > Should no doubt be:
      >
      > ...while this is easier to do with fgetln(3)...

      Yes, all the fget* typing confused me.

  2. By Tony (12.38.8.233) on

    So in the EOF w/o EOL case of fgetln, what happens to the memory allocated for buf originally? If fgetln is doing a malloc (and this is where I could be wrong), and buf is pointed to that, and later after the memcpy buf is re-pointed to lbuf, isn't the original memory referenced by buf now left behind w/o being freed properly? When lbuf is freed at the end, it's not referencing the memory allocated for buf in the fgetln call.

    Or do I just misunderstand what fgetln is doing behind the scenes?

    Comments
    1. By Tony (12.38.8.233) on

      > So in the EOF w/o EOL case of fgetln, what happens to the memory allocated for buf originally? If fgetln is doing a malloc (and this is where I could be wrong), and buf is pointed to that, and later after the memcpy buf is re-pointed to lbuf, isn't the original memory referenced by buf now left behind w/o being freed properly? When lbuf is freed at the end, it's not referencing the memory allocated for buf in the fgetln call.
      >
      > Or do I just misunderstand what fgetln is doing behind the scenes?


      OK, I'm an idiot, ignore me. Just re-read the man page for fgetln. For those wondering, fgetln hands you a memory location from the stream you've passed it. Closing that stream will free that memory, so no need to free buf. In fact, doing so would probably cause bad things to happen.

  3. By tedu (69.12.168.114) on

    fgetln is basically a disaster. it's not portable and not really any better over fgets, which is as portable as it gets. if you want nonportable, use fparseln.

  4. By Anonymous Coward (70.49.119.203) on

    Why does _anyone_ care about this crap anymore? Isn't it well known that the standard C library has many very suboptimal routines? Going around and patching their uses and proclaiming with great joy and self-fulfillment how one should properly use them isn't really making useful progress in my opinion.

    Why not just start using something better? Write better libraries. Even if they're not standard, so long as it's in the base system, who cares? Software with very impressive security histories (i.e. Postfix) don't expend the trouble trying to be extra careful about these shakey foundations -- they build new ones, better ones.

    If OpenBSD really cared about proper usage, why wouldn't they go about introducing abstractions and interfaces that are better. The strl* functions don't count. They were a good, very trivial first step, but there's so much more that could be done. Why not go further?

    Comments
    1. By Anonymous Coward (69.207.171.114) on

      This post makes me think. It would be nice if OpenBSD (or some similarly skilled group) wrote a small library for "the stuff that other libcs missed" that is portable. That way a port to Linux or whatever can benefit from some "sanitized" functions. It wouldn't matter so much that said calls are non-portable if portable versions were readily available for mass consumption in the form of lib{whatever}.so.

    2. By Anonymous Coward (68.104.220.48) on

      > Why not just start using something better? Write better libraries. Even if they're not standard, so long as it's in the base system, who cares? Software with very impressive security histories (i.e. Postfix) don't expend the trouble trying to be extra careful about these shakey foundations -- they build new ones, better ones.
      >
      > If OpenBSD really cared about proper usage, why wouldn't they go about introducing abstractions and interfaces that are better. The strl* functions don't count. They were a good, very trivial first step, but there's so much more that could be done. Why not go further?

      Good question. Why don't you start and submit some code?

    3. By Ray Lai (83.125.32.107) ray@ on http://cyth.net/~ray/

      > Why does _anyone_ care about this crap anymore? Isn't it well known that the standard C library has many very suboptimal routines? Going around and patching their uses and proclaiming with great joy and self-fulfillment how one should properly use them isn't really making useful progress in my opinion.

      I wrote this because people are still using these functions and doing so incorrectly, not to proclaim great joy and self-fulfillment.

      > Why not just start using something better? Write better libraries. Even if they're not standard, so long as it's in the base system, who cares? Software with very impressive security histories (i.e. Postfix) don't expend the trouble trying to be extra careful about these shakey foundations -- they build new ones, better ones.

      > If OpenBSD really cared about proper usage, why wouldn't they go about introducing abstractions and interfaces that are better. The strl* functions don't count. They were a good, very trivial first step, but there's so much more that could be done. Why not go further?

      I agree that nicer, safer interfaces would be nice. The reason they do not exist is not because I prefer functions full of caveats.

  5. By Anonymous Coward (69.207.171.114) on

    It seems to me like an obviously simple solution to this problem is:
    char buf[SIZE];
    FILE *f = whatever  ;
    
    if (fgets(buf, sizeof(buf), f))
    {
      char *p;
      if (p=strchr(buf, '\n'))
        *p = 0;
    }
    Unless there is something I'm not seeing here, this should work fine and seems pretty reasonable to me. It is shorter and IMO more intuitive than the examples given.

    Comments
    1. By Eric Eells (67.10.75.125) eric_eells@yahoo.com on

      Even more reasonable would be using strnchr instead of strchr. Just imagine writing a null byte past the end of the buffer... ouch!

      Comments
      1. By Eric Eells (67.10.75.125) on

        err... nevermind. I see fgets is null terminated if successful.

    2. By Anonymous Coward (66.219.146.142) on

      > It seems to me like an obviously simple solution to this problem is:
      >
      > char buf[SIZE];
      > FILE *f = whatever ;
      >
      > if (fgets(buf, sizeof(buf), f))
      > {
      > char *p;
      > if (p=strchr(buf, '\n'))
      > *p = 0;
      > }
      >
      > Unless there is something I'm not seeing here, this should work fine and seems pretty reasonable to me. It is shorter and IMO more intuitive than the examples given.

      strchr assumes a nil-terminated string; if there are embedded nil's, it will get fucked up.

Credits

Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]