OpenBSD Journal

p2k17 Hackathon report: Florian Obser on network stack progress, kernel relinking and more

Contributed by Peter N. M. Hansteen from the relink don't go poof dept.

A new p2k17 hackathon report has arrived, this one from Florian Obser, who writes:
One of these days I should probably just put "lives here" into the file on cvs when it comes to hackathons in Berlin. My very first hackathon was b2k13 and I attended u2k15 as an emergency hackathon, on to number 3!

I arrived in Berlin a day early to catch up with friends over dinner and beer, and I was bearing gifts.

The next day benno and I arrived around 10 o'clock at in-berlin and set up shop in the side room with the other lepers^Wnetwork hackers.

The evening before, benno and I were musing why the ip-transparent option of nsd(8) and unbound(8) doesn't work on OpenBSD and what it would take to port the functionality. It turned out that what Linux calls IP_TRANSPARENT and FreeBSD calls IP_BINDANY, OpenBSD calls SO_BINDANY. The diff was trivial and quickly OK'ed by benno and jca. It took longer to submit it upstream...

This option is useful when nsd(8) or unbound(8) is configured with a specific IP but that IP is not yet configured on any interface when the daemon comes up. This might happen when configuring IPs with dhclient(8) or slaacd(8).
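As a rough illustration of what such a port boils down to (this is not the diff that was submitted upstream), here is a minimal sketch that picks whichever of the three options the platform headers define. The function name allow_nonlocal_bind() is made up for this example, only IPv4 sockets are covered on FreeBSD and Linux, and these options generally need elevated privileges, so expect the call to fail for an unprivileged process.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>

/*
 * Ask the kernel to allow bind(2) to an address that is not (yet)
 * configured on any interface.  Hypothetical helper, not taken from
 * nsd or unbound.  Returns 0 on success, -1 with errno set on failure.
 */
int
allow_nonlocal_bind(int fd)
{
	int on = 1;

#if defined(SO_BINDANY)			/* OpenBSD: socket-level option */
	return setsockopt(fd, SOL_SOCKET, SO_BINDANY, &on, sizeof(on));
#elif defined(IP_BINDANY)		/* FreeBSD: IPv4 sockets */
	return setsockopt(fd, IPPROTO_IP, IP_BINDANY, &on, sizeof(on));
#elif defined(IP_TRANSPARENT)		/* Linux: IPv4 sockets */
	return setsockopt(fd, IPPROTO_IP, IP_TRANSPARENT, &on, sizeof(on));
#else
	(void)fd;
	(void)on;
	errno = ENOPROTOOPT;		/* no known option on this platform */
	return -1;
#endif
}
```

A daemon would call this on the listening socket before bind(2), and treat a failure as non-fatal unless the configured address really is missing.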

For some time I had been toying with the idea of moving IPv6 link-local address generation out of the kernel into user land. There are four places where link-local addresses are generated in the kernel:
1) When IPv6 is enabled on an interface.
2) When the MAC address changes on an interface.
3) carp(4)
4) sppp(4)
I have no idea what sppp(4) is doing so I ignored that. One and two are the things I wanted to work on; that leaves carp(4). I wasn't quite sure whether carp(4) only changes the MAC address when ifconfig(8) changes parameters on the carp(4) interface or also when there are state changes. I wandered in there and quickly dragged benno after me. After some arguing, code reading, double checking the man page and generally figuring out what the thing was actually doing, we convinced ourselves that yes indeed, the MAC address only changes when ifconfig(8) re-configures the interface. So far, so good, but this was painful enough that I wanted to work on something else, so I shelved the idea for the time being.
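For readers wondering why a MAC address change matters here: the conventional way to derive a link-local address is to append a modified EUI-64 interface identifier (RFC 4291, Appendix A) to the fe80::/64 prefix. The sketch below shows that derivation in isolation. It is not the kernel's code, the function name lladdr_from_mac() is invented for the example, and a real implementation may also support other identifier schemes.

```c
#include <stdint.h>
#include <string.h>

/*
 * Fill addr with fe80::/64 plus the modified EUI-64 identifier
 * derived from a 48-bit MAC address (RFC 4291, Appendix A).
 * Illustrative helper only.
 */
void
lladdr_from_mac(const uint8_t mac[6], uint8_t addr[16])
{
	memset(addr, 0, 16);
	addr[0] = 0xfe;			/* link-local prefix fe80::/64 */
	addr[1] = 0x80;
	addr[8] = mac[0] ^ 0x02;	/* flip the universal/local bit */
	addr[9] = mac[1];
	addr[10] = mac[2];
	addr[11] = 0xff;		/* ff:fe is inserted in the middle */
	addr[12] = 0xfe;
	addr[13] = mac[3];
	addr[14] = mac[4];
	addr[15] = mac[5];
}
```

With the MAC address 00:11:22:33:44:55 this yields fe80::211:22ff:fe33:4455, which is why a new MAC address implies a new link-local address.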

mpi has this list of things to work on to make the network stack MP safe. Kinda like Hilbert's 23 problems. Only presumably simpler. I picked one of them: move PRU_DETACH out of pr_usrreq and into individual per-protocol functions. With mpi's guidance I poked at it and the whole building came crashing down on me. With a bit of code shuffling I could in the end remove sys/net/raw_cb.c completely. With that, sys/net/pfkeyv2.c and sys/net/rtsock.c were sufficiently disentangled that I could add sizes to free(9). Previously the size was unknown.

A day before the hackathon I had picked up another item from mpi's list: move the NET_LOCK down into the various pr_slowtimo and pr_fasttimo functions. It then became obvious that ip_slowtimo() which deals with fragments didn't need the NET_LOCK since it uses a mutex. I set out to convert IPv6 fragment handling to the same style but mpi pointed out that visa already had a more comprehensive diff. So I withdrew mine. However while working on it I had noticed that tedu had removed pr_drain functionality back in 2006 but left behind some functions in the network stack. A simple diff killed off the stragglers which meant less code that needs mutex protection.

I looked around for more things to delete and sys/netinet6 is the gift that keeps on giving. With the switch to slaacd(8) the functions to parse router solicitations and router advertisements in the kernel were only interested in Source Link-Layer Address Options to update the nd6 link layer cache. In that regard router advertisement and router solicitation ICMPv6 messages are very similar. The header is different but the options were parsed the same. A great opportunity to merge the two functions into one and lose half the code in the process!
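To make the "different header, same options" point concrete, here is a small user-land sketch of such a merged parser. It is not the kernel function: the name nd_find_sllao() and the constants are local to this example, and the only facts relied on are from RFC 4861 (a router solicitation has an 8-byte header, a router advertisement a 16-byte one, and options are type/length/value with the length counted in units of 8 octets).

```c
#include <stddef.h>
#include <stdint.h>

#define ND_ROUTER_SOLICIT_TYPE	133	/* ICMPv6 type, RFC 4861 */
#define ND_ROUTER_ADVERT_TYPE	134
#define ND_OPT_SRC_LLADDR	1	/* Source Link-Layer Address option */

/*
 * Return a pointer to the link-layer address in the first Source
 * Link-Layer Address option of an RS or RA message, or NULL if there
 * is none or the message is malformed.  *lenp receives the number of
 * address bytes in the option (option length minus the 2-byte header).
 */
const uint8_t *
nd_find_sllao(const uint8_t *msg, size_t len, size_t *lenp)
{
	size_t off, optlen;

	if (len < 1)
		return NULL;
	switch (msg[0]) {		/* only the header size differs */
	case ND_ROUTER_SOLICIT_TYPE:
		off = 8;
		break;
	case ND_ROUTER_ADVERT_TYPE:
		off = 16;
		break;
	default:
		return NULL;
	}

	/* From here on both message types are just a list of options. */
	while (off + 2 <= len) {
		optlen = (size_t)msg[off + 1] * 8;
		if (optlen == 0 || optlen > len - off)
			return NULL;	/* zero-length or truncated option */
		if (msg[off] == ND_OPT_SRC_LLADDR) {
			*lenp = optlen - 2;
			return msg + off + 2;
		}
		off += optlen;
	}
	return NULL;
}
```

On Ethernet the option is 8 bytes long, so a successful call returns a pointer to the 6-byte MAC address with *lenp set to 6.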

mpi pointed out that there is more stuff to delete in nd6_rtr.c, namely rt6_flush() and rt6_deleteroute() which are big sores, but unfortunately we found that ICMPv6 redirects still needed that code.

With so much kernel development I was bound to hit a kernel panic eventually. Annoyingly I triggered one after I made it to user land. Here is what happened: I happily hacked away in the kernel and did a make install in sys/arch/amd64/compile/GENERIC.MP. This gave me a backup copy of my currently running kernel in /obsd and installed the new kernel in /bsd. It does this in a smart way with hard links and mv(1). It also updates the link kit in /usr/share.

I rebooted and the system came back up. rc(8) started and hard-linked /bsd to /bsd.booted. (The boot loader uses this name when it detects an unhibernate.) At the end of rc(8) /usr/libexec/reorder_kernel got called which used the link kit to re-link the kernel and used the same Makefile target as the make install I used to install my kernel. It hard-linked /bsd to /obsd, copied the new kernel to /nbsd and then did a mv /nbsd /bsd.

Now frag6_slowtimo() got called. And the kernel panicked because of a stupid typo I made. Big whoop. We have a backup kernel to boot. Or do we? This used to be /obsd but since reorder_kernel used the same Makefile target as the manual make install /obsd was just a randomization of the kernel that was currently running. It would crash in exactly the same place.
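The sequence above is easier to follow when replayed on ordinary files. The sketch below mimics only the three steps named in the text (hard-link bsd to obsd, write the new kernel to nbsd, rename nbsd to bsd) inside a scratch directory. It is not the Makefile target itself, and install_kernel(), put_file() and get_file() are names invented for this example.

```c
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Create a new file `name` in directory dfd holding `text`. */
int
put_file(int dfd, const char *name, const char *text)
{
	size_t len = strlen(text);
	/* O_EXCL: never write through an existing hard link. */
	int fd = openat(dfd, name, O_WRONLY | O_CREAT | O_EXCL, 0600);

	if (fd == -1)
		return -1;
	if (write(fd, text, len) != (ssize_t)len) {
		close(fd);
		return -1;
	}
	return close(fd);
}

/* Read `name` in directory dfd into buf as a NUL-terminated string. */
ssize_t
get_file(int dfd, const char *name, char *buf, size_t size)
{
	int fd = openat(dfd, name, O_RDONLY);
	ssize_t n;

	if (fd == -1)
		return -1;
	n = read(fd, buf, size - 1);
	close(fd);
	if (n >= 0)
		buf[n] = '\0';
	return n;
}

/* The install steps from the text: bsd -> obsd, new image -> bsd. */
int
install_kernel(int dfd, const char *image)
{
	(void)unlinkat(dfd, "obsd", 0);		/* drop the old backup */
	if (linkat(dfd, "bsd", dfd, "obsd", 0) == -1)
		return -1;			/* obsd is now the old bsd */
	if (put_file(dfd, "nbsd", image) == -1)
		return -1;
	return renameat(dfd, "nbsd", dfd, "bsd");	/* mv nbsd bsd */
}
```

Start with a known good bsd, call install_kernel() once for the manual make install and once more for the re-link at boot: after the second call obsd holds the first (buggy) image, and the known good kernel is gone.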

Of course I easily got out of this nosedive by booting bsd.rd and fetching a snapshot kernel. Still annoying and against my muscle memory.

I complained and explained my annoyance loudly and at least benno and phessler agreed that they also use /obsd as a known good kernel. There are of course other ways. Some developers copy a known good kernel to /bsd.good or something like that.

rpe heard our discussion and wandered over. He explained that this had come up before but was rejected, probably because we need the backup if the re-linking exposes a bug and crashes the kernel. I was about to drop this but it was still nagging. So I had another look and noticed that /bsd.booted and /obsd were the same kernel. They had the same inode! /bsd.booted had been introduced a few weeks after kernel re-linking, which meant that kernel re-linking no longer needed /obsd as a backup: it already had /bsd.booted for this purpose. A bit of shuffling around of Makefile targets with hand holding by tb and I had a proposal. It was also nagging rpe and we both arrived at the same conclusion independently. I adapted the diff for all architectures, we got deraadt on board, I committed the diff and there was great rejoicing.
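"The same inode" can be checked from a program as well as with ls -i. A minimal sketch, assuming nothing beyond POSIX stat(2); the helper name same_file() is made up for this example:

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return 1 if both paths name the same file (same device and inode,
 * i.e. hard links to one another), 0 if they differ, -1 on error.
 */
int
same_file(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) == -1 || stat(b, &sb) == -1)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

In the situation described above, same_file("/obsd", "/bsd.booted") would have returned 1, which is what made /obsd redundant as the re-linking backup.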

Thanks to stsp, uwe, benno, the foundation, in-berlin and hostserver and everybody else for making another successful hackathon happen in Berlin.


Comments
  1. By Blake (2a01:e34:ec06:8f90:cabc:c8ff:fedb:4d83) on l33.fr

    Fantastic news.

    We really appreciate all these reports as it provides a better understanding of how the system works.

    Comments
    1. By AussieFrog (114.75.73.19) on

      Nice touch: the developers have even prepared a dedicated website at slaacd.com.

      Comments
      1. By Kaliszad (91.49.51.46) on

        That is some Sri Lanka Association of Dentists or what...
