OpenBSD Journal

p2k23 Hackathon Report: Landry Breuil (landry@) on chasing memory corruptions

Contributed by rueda from the birds do thunder dept.

Next up in the series of p2k23 hackathon reports is this from Landry Breuil (landry@), who writes,

It's been a while since the last p2k19 in Bucharest… and this time in a new place, city, country, lovely Ireland with lovely weather at this time of the year.

As usual, i wanted to play with things that were left on the side for a while (upgrading mail/stalwart stack to the new all-bundled-in-one layout to play with JMAP… or testing matthieu@'s work on wayland) - but i was of course mostly distracted from those interesting topics by …firefox, you guessed it. Dammit, not again !

Shortly before the hackathon, i had tested the first beta of the upcoming 118 branch, which was crashing at startup. Had left it aside for the hackathon, and oh boy i was in for a treat…

Trying to analyze what was happening, ktrace wasn't helping much, with no sign of a pledge/unveil violation or a W^X violation. Once the debug-firefox package was installed, i could look at the coredumps, which always gave different traces, all seemingly pointing to a memory corruption somewhere… but where ?

First, upgraded devel/gdb to the latest version, courtesy of a diff from sthen@ which gives a much more useful gdb version than the one we have in base and ports so far… that was definitely helpful!

So something was blowing up at startup in 118, and 117 was working fine. Must be a regression right ? From which of the bazillions of commits ? Well, bisecting all commits wasn't really appealing, so i tried various things…

First, build from a clean source checkout of the mozilla-central/trunk branch. Interesting, this one runs fine ! So… an issue in the beta branch ? Nope, no dice, a build from a source checkout of the mozilla-beta branch ran fine too. Oh, and getting there is still largely possible because the upstream source tree builds fine unpatched on OpenBSD, thanks to countless hours upstreaming patches… always grateful to my past self.

So i started comparing build options between the port and my trunk builds.. many different options (with system icu ? with system nss ? build optimizations ? MOZILLA_OFFICIAL ?) which led to many builds, all unsuccessful… until i tried a build from mozilla-beta with --disable-dbus like in the port. Bam, this one blew up at startup. So… a regression from a dbus-related change in 118 ? The regression window was definitely narrowing ! Looking into the source tree for all files having #ifdef MOZ_ENABLE_DBUS codepaths that had changes in this development window, i ended up with 4 suspect commits, and after bisecting, found out that https://hg.mozilla.org/mozilla-central/rev/8508522a874d was the commit introducing the regression, only in the --disable-dbus case ? How… twisted.
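The narrowing-down described above comes down to two mechanical steps: list the files that mention the macro, then bisect an ordered list of suspect commits. Here is a minimal Python sketch of both, not the tooling actually used during the hackathon. The names files_mentioning and first_bad_commit are made up for illustration, and is_bad stands in for the expensive part (build with --disable-dbus, start the browser, see whether it crashes).

```python
import os
from typing import Callable, List, Sequence


def files_mentioning(root: str, needle: str = "MOZ_ENABLE_DBUS") -> List[str]:
    """Return the sorted paths under root whose contents mention needle."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # errors="ignore" lets us skim binary files without crashing
                with open(path, "r", errors="ignore") as fh:
                    if needle in fh.read():
                        hits.append(path)
            except OSError:
                continue  # unreadable file, skip it
    return sorted(hits)


def first_bad_commit(commits: Sequence[str],
                     is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumptions: commits are ordered oldest to newest, the last one is
    known to be bad, and every commit after the first bad one is bad too.
    """
    if not commits:
        raise ValueError("need at least one commit")
    lo, hi = 0, len(commits) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid        # regression is at mid or earlier
        else:
            lo = mid + 1    # regression is after mid
    return commits[lo]
```

With only 4 suspects and the newest one known to be bad, this loop calls is_bad at most twice, which matters when each call means a full browser build.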

After staring at the code for hours and ETOOMANY builds, i didn't find what the issue was in the commit, but all the findings were reported in an upstream bug report (read all the comments to figure out how many hypotheses were tried…) so that the developer who committed it could have a look. At least i didn't spend those 3 days chasing this bug for nothing… and of course, kudos goes to our strict malloc for detecting this write after free !
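For readers wondering how an allocator can notice a write after free at all, here is a toy Python model of one common idea: fill freed memory with a known junk byte and check that the pattern is still intact before handing the chunk out again. This is only an illustration under that assumption. It is not OpenBSD's malloc implementation, and ToyHeap and the JUNK value are made up for this sketch.

```python
JUNK = 0xDF  # junk byte chosen for this sketch only


class ToyHeap:
    """Toy allocator: freed chunks are junk-filled and checked before reuse."""

    def __init__(self):
        self._free = []  # freed chunks waiting to be reused

    def malloc(self, size):
        for i, chunk in enumerate(self._free):
            if len(chunk) == size:
                # Someone wrote through a stale pointer if the junk changed.
                if any(b != JUNK for b in chunk):
                    raise RuntimeError("write after free detected")
                del self._free[i]
                chunk[:] = bytes(size)  # hand out zeroed memory
                return chunk
        return bytearray(size)

    def free(self, chunk):
        # Overwrite the contents so later stale writes stand out.
        chunk[:] = bytes([JUNK]) * len(chunk)
        self._free.append(chunk)


def demo():
    heap = ToyHeap()
    p = heap.malloc(8)
    heap.free(p)
    p[0] = 0x41          # stale write through the old reference
    heap.malloc(8)       # raises RuntimeError in this toy model
```

Note that in this model the corruption is only noticed on a later allocation, far from the code that did the bad write. That delay would be consistent with coredumps showing a different trace each time.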

(spoiler alert: the upstream developer has proposed a patch since then, and it totally fixes the regression, so we'll get a working 118 in time for the release.)

Also had to spend a bit of time on all mozilla ports to make them aware/compatible with robert@'s work on splitting devel/llvm… but that was quite easy in the end, that had me diving in a bit of llvm, see for example bug 1852202 about -fstack-clash-protection or bug 1851301 about simd-everywhere. Now the next step will be trying to build firefox with the future (llvm 16).

Well, there goes my hackathon ? Nah, could still find some time to build matthieu@'s work, and in less than an hour i had a working Wayland session running on my t495s. All i needed to do was to build and install all the ports under the wayland/ subdirectory - i swear it ! Ok, maybe sway crashes sometimes, there are probably many missing bits, but that's a really promising start !

Of course, that wasn't an innocent try… i wanted to try firefox running natively on wayland, and not through xwayland. Long story short, it doesn't run yet, but all the boring details are figured out for adventurous ppl who want to help (patches available of course !):

  1. install wayland
  2. rebuild gtk+3 with wayland enabled
  3. rebuild firefox so that wayland support is enabled
  4. within a wayland session, try running firefox with MOZ_ENABLE_WAYLAND=1 in the env
  5. profit^Wcollect crashes
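Step 4 above can be sketched in Python as follows. This is a hedged example rather than a supported launcher: wayland_env, in_wayland_session and launch_firefox are hypothetical helper names, the binary is assumed to be called firefox and to be in PATH, and the WAYLAND_DISPLAY check relies on the compositor exporting that variable to its clients.

```python
import os
import subprocess


def wayland_env(base=None):
    """Return a copy of the environment with MOZ_ENABLE_WAYLAND=1 set."""
    env = dict(os.environ if base is None else base)
    env["MOZ_ENABLE_WAYLAND"] = "1"
    return env


def in_wayland_session(env=None):
    """Best-effort check: WAYLAND_DISPLAY is normally set for Wayland clients."""
    env = os.environ if env is None else env
    return bool(env.get("WAYLAND_DISPLAY"))


def launch_firefox(binary="firefox"):
    """Start firefox (assumed to be in PATH) with the Wayland backend requested."""
    if not in_wayland_session():
        raise RuntimeError("WAYLAND_DISPLAY is not set, not in a Wayland session ?")
    return subprocess.Popen([binary], env=wayland_env())
```

Calling launch_firefox().wait() from within a sway session would then cover steps 4 and 5 in one go.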

Just jokin', but i already have some patches to make it progress, and this is as usual being discussed upstream - more to come at the next hackathon !

All in all… quite a busy hackathon. Didn't have much stamina left to spend evenings in the pubs, but i had the occasion to visit downtown in good company ! It was also a pleasure to meet new developers face-to-face, and see others i hadn't seen in a while…

All this thanks to Tom Smyth, who took great care of organizing everything for us there - and this thanks also extends to the OpenBSD Foundation & NCSC Ireland who sponsored the event. Those hackathons are definitely what helps the project progress !

Thanks for the work and the report, Landry!

The code here should be fetchable in snapshots, and for those with a bit more patience, also available in the upcoming release.
