Contributed by johan on from the new-code-kills-old-bugs dept.
As some of you know I have been working on a new implementation of malloc, the general purpose memory allocator that's used by userland programs. See this link for more details about it.
My malloc has been tested by many and has been in snapshots for a while now. Some time ago I received a bit of a puzzling report from Nikolay Sturm (sturm@) that on sparc64 compiling big C++ projects would sometimes fail with an Internal Compiler Error (ICE). For a start, it was not clear if my new malloc was involved, but I setup my sparc64 machine for testing. It soon turned out I could reproduce the problem. Switching to the in-tree malloc made the problem go away. So I was facing a malloc bug or a gcc bug, I thought.
Otto continues below...
My new malloc has some features not found in the current malloc: it moves allocations smaller than a page but bigger than half a page to the end of the page, to have a greater chance of catching buffer overflows. Even without guard pages, this works most of the time, since we have a randomized mmap(2): there's a pretty big chance that there is a hole after a page allocated. Accessing that hole leads to a segmentation fault.
So I put my malloc back in, and started investigating. The first thing I tried was to switch off the code that moves small allocations: and yes, that made the bug go away. Using an unstripped version of the compiler, I could interpret the backtrace in gdb, and it soon turned out that the compiler was dying in yyparse(). yyparse() is the main function generated by the parser generator yacc(1). After compiling obj/cp/parse.c with debug options and some fighting with wrong line numbers reported by gdb, I saw the offending statement was the second statement below;
It turned out yym was 0, so yyvsp[1] was accessed: one stack entry above the current stack pointer.yym = yylen[yyn]; yyval = yyvsp[1-yym];Yacc has an implicit action that does:
before the action of a rule is done. $$ refers to the result of the rule, and $1 refers to the first item on the right hand side of the parse rule, yym says how many right hand side values of the current rule there are on the stack. The two statement above implement the implicit action.$$ = $1;But if the actions are part of a rule that has no right hand side:
the implicit action will access via $1 a value beyond the top of stack: yym will be 0 in that case. Normally, that isn't a big problem: you're just accessing garbage, and assigning it to $$.A: /* empty */ { foo(); };But if the stack is at maximum size, this will overflow if an entry on the stack is larger than the 16 bytes leeway my malloc allows. In the case of of C++ it is 24 bytes, so a SEGV occured.
Funny thing is that I traced this back to Sixth Edition UNIX, released in 1975. A very similar problem can be spotted here. The lines:
access an item above the stack pointer. If yyr2[n] is zero, this is a potential access outside the stack. Note the archaic use of =-, we write -= these days.yypv =- yyr2[n]; yyval=yypv[1];The bug was only triggered on sparc64, since it uses 8k pages. The default yacc stack size for C++ is 24 * 200 = 4800 bytes, which is larger than a page on most platforms. In this case malloc returns a page aligned object, no moving of the allocation inside a page occurs.
I'm really happy this turned out to be a yacc bug, and not a malloc bug. It's actually a malloc feature in action. ;-)
The commit that fixes this can be found here.
-Otto
(Comments are closed)
By Anonymous Coward (169.244.143.114) on
Comments
By Anonymous Coward (91.21.74.158) on
Cool desktop features in OpenBSD? Maybe you're thinking about FreeBSD or Linux, but OpenBSD is led by security.
Comments
By landry (82.245.140.102) on
>
> Cool desktop features in OpenBSD? Maybe you're thinking about FreeBSD or Linux, but OpenBSD is led by security.
And ? Is it opposed to have cool desktop features ? Come on..
By phessler (phessler) on first undead, then not, then undead again.
>
> Cool desktop features in OpenBSD? Maybe you're thinking about FreeBSD or Linux, but OpenBSD is led by security.
>
>
the trolls are cute today.
By Anonymous Coward (128.171.90.200) on
By Cleber (189.23.20.10) cleber@bsd.com.br on
>
> Cool desktop features in OpenBSD? Maybe you're thinking about FreeBSD or Linux, but OpenBSD is led by security.
>
>
My desktop is OpenBSD, and I love it. IMHO, why not?
By Otto Moerbeek (otto) on http://www.drijf.net
I never felt any pressure like that. I just work on things that I find interesting or things that I need myself.
By yacc newb (66.207.218.21) on
Hmm, wouldn't that lead to garbage in the syntax tree and, if so, wouldn't that cause big problems?
Comments
By Otto Moerbeek (otto) on http://www.drijf.net
It does not lead to seg faults, at least.
In a typical yacc grammar, the actions assign a value to $$, overriding the default action:
You could see something like:
The following would indeed be problematic if the non-terminal A would be used as a value in other rules: because it would assign junk to A. Since the rule has no action, only the default action is left, which is nonsensical for a rule having no right hand side. But values are not always passed up, the side effects of action could just do the work. So you are right, it could lead to problem, but it does not have to do that.By Ted Walther (70.71.224.2) on http://reactor-core.org
Comments
By Otto Moerbeek (otto) on http://www.drijf.net
>
Well, I suspect too much programs will stop working. A simple off by one is so common.
Comments
By Anonymous Coward (78.20.243.221) on
How about some sort of setting that toggles this behaviour? If you can find bugs in 30 year old codes with a reasonably large safety-net, then I'm going to go guess that other interesting bugs might come bubbling to the surface. Or am I missing something painfully obvious?
Comments
By Anonymous Coward (219.90.224.124) on
At the very least, it would be nice to be able to pick this sort of thing up in one's own code.
Comments
By Anonymous Coward (74.14.137.59) on
>
> At the very least, it would be nice to be able to pick this sort of thing up in one's own code.
Theo no like buttons, Theo hate buttons! Rar! Theo smash!
Comments
By Anonymous Tasmanian (150.101.108.45) on
Theo no like buffer overflows...
By Philipp (89.49.66.123) on
awesome :)
otto: GREAT. Like one already said: brilliant
By J.C. Roberts (jcr) on http:www.designtools.org
> >
>
> Well, I suspect too much programs will stop working. A simple off by one is so common.
Hi Otto,
It's frightening to think how many programs would neeed to be fixed, so I can see how it would result in a ton of work if we suddenly break everything with an off by one error, but I am failing to see why breaking them is a bad thing?
Is the goal to initially just get your new malloc in the tree and well tested with similar bug-for-bug compatibility to the old malloc? (i.e. while leaving the option open to change the behavior later on so it can eventually be used to catch/break all the off-by-one errors?)
Or do you think we're stuck with the 16 byte leeway (and off-by-ones) for the long term?
Kudos on the fantastic work!
Thanks,
jcr
Comments
By Otto Moerbeek (otto) on http://www.drijf.net
> Or do you think we're stuck with the 16 byte leeway (and off-by-ones) for the long term?
The problem is that if I make things too strict, so much things wil stop working that you end up with a non-functional system. A general goal is to have as much protection enabled by default. Making it too strict will just be counterproductive.
I have some more plans for the future and a resizable leeway is one of them, but for now the 16 bytes leeway is good.
By Andrés Delfino (200.80.164.16) adelfino@gmail.com on
By Charley Corday (75.138.170.36) ccorday@mailinator.com on
For finding a serious bug dating back to 1975 that is used in scores of projects world wide to this day, you earn the respect of the community as one serious bad ass developer. Kudos to you.
Comments
By Charley Corday (75.138.170.36) ccorday@mailinator.com on
By Janne Johansson (jj) on Old amiga fart.
Tedu@ did the same for pointer-sized allocations somewhere around this
commit:
http://marc.info/?l=openbsd-cvs&m=111811962102430&w=2
That (supposedly ;) catches all mallocs that mistakenly uses sizeof(*foo) instead of sizeof(foo) in the code.
"man malloc.conf" on the J option.
Comments
By Janne Johansson (jj) on .
> "man malloc.conf" on the J option.
>
Bah, I meant "P"
By Matthew Dempsky (2001:470:805a:1:21b:63ff:feca:36df) on
Other way around.
By Hudson (74.58.67.99) hudson@videotron.ca on
Note the archaic use of =-, we write -= these days.
=- is not the same thing as -=.
The first one discards the original term on the left, replacing it with the negative of the term on the right.
The second one decrements the left term by the right term.
See here.
Not enough coffee? ;-)
Comments
By Anonymous Coward (60.241.132.54) on
> =- is not the same thing as -=.
It is when you're talking about old versions of C, where op= was written as =op.
By Damien Miller (djm) on http://www.mindrot.org/~djm/
Don't argue with Otto. He sorts lists in O(1).
Comments
By J.C. Roberts (jcr) on http:www.designtools.org
>
> Don't argue with Otto. He sorts lists in O(1).
>
Please add that statement to fortune.
By Miod Vallat (miod) on
>
> Don't argue with Otto. He sorts lists in O(1).
>
And in reverse order, too.
Comments
By Anonymous Coward (128.171.90.200) on
Help each other out that'll be our motto!
Make this patch I'll give you free gelato!
Then back to my place, where I will get you blotto!
Domo Arigato, Mr. Roboto
By Anonymous Coward (24.181.247.3) on
>
> Don't argue with Otto. He sorts lists in O(1).
>
This definitely needs to go into fortune. It's a part of Unix history.
By Anonymous Coward (82.130.37.84) on
>
> Note the archaic use of =-, we write -= these days.
>
> =- is not the same thing as -=.
This change in syntax was introduced in K&R (the first edition of "The C Programming Language", you know), published 1978. Pinpointing the original public release of the compiler that introduced new syntax, or deprecated the old is harder - but it most likely occured in UNIX V6, V7 or the Progammers Workbench 1.0 released between them.
Anyway, =- *was* valid back in UNIX V6 yacc.
By Marc Espie (213.41.185.88) espie@openbsd.org on
>
> Note the archaic use of =-, we write -= these days.
>
> =- is not the same thing as -=.
>
> The first one discards the original term on the left, replacing it with the negative of the term on the right.
>
> The second one decrements the left term by the right term.
No, you're just too young to know better.
Very, very old versions of C did have =- with those archaic semantics.
By Anonymous Coward (71.255.98.13) on
>
> Note the archaic use of =-, we write -= these days.
>
> =- is not the same thing as -=.
>
Other's have already jumped on you about this, so I'll just point out that, in the old days, decrement-and-assign and add-and-assign were the only operators to require spaces on either side, since otherwise they would be interpreted as assign-the-opposite-of and assign, respectively. Enough people made the error that you are currently being chewed out for that people finally decided to do something about it. By the time of ANSI C, -= had become the standard way, and the new rules prohibited doing it the old, stupid way.
I'm sure there's a gcc option to make it be dumb again if you want to try it out. Try --dumb or --like-everything-else-gnu-does.
By Anonymous Coward (129.132.27.178) on
It is one thing to see OpenBSD improve constantly, but that would be much more boring if it would all happen "under the hood".
I appreciate the time you took to write down this story.
Comments
By asemisldkfj (71.174.176.28) on http://www.thehomerow.net
>
> It is one thing to see OpenBSD improve constantly, but that would be much more boring if it would all happen "under the hood".
>
> I appreciate the time you took to write down this story.
Ditto! I enjoy reading narratives like this.
By David Librik (99.147.4.64) librik@panix.com on
But your patch only corrects the first bug you found. Did you also have a fix for the second one, which you describe as "a very similar problem"? That would be the lines:
yypv -= yyr2[yyn];
yyval = yypv[1];
I'm looking through the parser skeleton of a version of Yacc which is still in use (Abraxas PCYACC) and there they are! So the bug didn't die with V6 UNIX.
Comments
By Miod Vallat (miod) on
>
> But your patch only corrects the first bug you found. Did you also have a fix for the second one, which you describe as "a very similar problem"? That would be the lines:
> yypv -= yyr2[yyn];
> yyval = yypv[1];
> I'm looking through the parser skeleton of a version of Yacc which is still in use (Abraxas PCYACC) and there they are! So the bug didn't die with V6 UNIX.
>
You did not understand him correctly. v6 has the bug you are quoting, then over time yacc was modified and the bug took a different form, which Otto describes earlier in his message. So whichever yacc version you have, it has (well, might have) either form of the bug, but not both.
By Otto Moerbeek (otto) on http://www.drijf.net
Fixing code in versions of yacc outsode of OpenBSD is left as an exercise to the reader.
By TylerEss (69.42.248.90) on
Start with the current amount, and wait until people stop complaining. Then, reduce the leeway by 25% and see who complains... repeat this process and after a few years you might be killing people's off-by-ones en masse...
By Anonymous Coward (24.181.247.3) on
By speedplane (117.47.226.71) mes65@cornell.edu on
Comments
By Alan Chantler (87.115.194.161) alan@chantler.net on
In addition to congratulating Otto on his detective work, and on his willingness to publish the outcome (in true UNIX style), I think it worth saying something in defense of Edition VI.
Way back, when Ken Thompson and Dennis Ritchie produced their OS, it was done from necessity rather than due to commercial pressure. I expect that you all know all this anyway but I have always been impressed by the fact that the whole UNIX universe (interesting use of words given the original cause of the need that led to the OS in the first place) is based upon sharing and peer-review.
There was once available (by Lyons but no longer available - I rather think it was suppressed by AT&T Bell Labs) an annotated version of the kernel code for Edition VI. It attempted to explain the fundamental thinking behind some of the code but I well remember that there was a line of code (written, of course, in 'C' [which itself followed 'B'}) which included the comment "/* You're not supposed to understand how this works!*/".
So let us not forget that, remarkable though Otto's discovery of this so-called 'ancient bug' may be, to my mind what is even MORE remarkable is the fact that there have, in the past nearly forty years, been so FEW such 'bugs' discovered!
Well Done Otto! Keep up the good work!
Comments
By Pierre Riteau (131.254.100.94) on Pierre Riteau
http://cm.bell-labs.com/who/dmr/odd.html
Comments
By Anonymous Coward (122.49.172.127) on
By Anonymous Coward (122.49.172.127) on
Maybe, but how much of the original code is still in the wild?
By Anonymous Coward (77.239.192.101) on
It's not a bug, it's you who've messed things up.
You've used an empty token but added action not knowing that the rule is always assumed to return a value. It's explicitly stated in docs. An empty token has no value at all, so you have to return it in the action.
The same thing happens when your write the rule
A: B {..} C {..};
Yacc actually converts the above statement internally to
TEMP: /* empty */ {..};
A: B TEMP C {..};
And yes, the action in the middle have to assign something to the $$. That's the rule.
Comments
By Otto Moerbeek (otto) on http://www.drijf.net
By Otto Moerbeek (otto) otto@drijf.net on http://www.drijf.net
A lot of posters seem to have missed the point that the bug is in the code generated by yacc. That code can end up anywhere, so the potential impact of this bug is hard to tell. I suspect it is low: it's a read buffer overflow after all.
The bug I fixed is in the Berkely version of yacc. The original 1975 bug is very similar. While Berkely yacc is a complete rewrite, it shows that some bugs really know how to survive. Contrary to some headlines I saw, I did not actually fix the 1975 bug. I fixed the Berkeley yacc bug and showed it has very old roots.
The actual problem is the read access of $1. The code adjust the stack pointer using the length ot the right hand side of the rule being reduced. After that, $1 is 1 up the stack pointer. The implict $$ = $1 rule is executed before any processing of user written actions, so if the stack is at it maximum allocted size, the out of bounds read access occurs.
The stack I'm talking about is of course the yacc parser runtime stack, and not the system stack.
In theory the bug could have been found by using memory access debuggers. But it has not. This is probably due to the fact that it is pretty rare: you have to compile a program that uses a certain grammar rule when the yacc stack is at its maximum size. IMO, this all shows the value of te various techniques we use: if possible, we enable them by default, so we have a better chance to catch rare errors.
About making malloc stricter: now that my new malloc is committed, I will start working on some more improvements. Adjustable end of allocation alignment is one of them. BTW, currently the feature is completely switched off, and you can enable it by using the P malloc option.