k2k20 hackathon report: Klemens Nanni on network land decluttering

Contributed by rueda on 2020-09-21 from the hotwired or notwired dept.

Our next k2k20 report comes from Klemens Nanni (kn@):

I'd been looking forward to k2k20 just like any other hackathon, with its unique atmosphere where getting work done in fact means holiday hacking with friends.

There was nothing big on my list but it had already grown into a rich assortment of issues and itches to scratch - this usually aligns well with the release cycle since it means focusing on regression fixes and polish during the -beta phase until the tree gets locked for release.

Thus I started my first day with a report from bluhm@ out of his excellent test cluster about certain tests failing due to a libpcap fix from dlg@ and me for a tcpdump bug report from matthieu@. bluhm@ had spotted the supposed regression promptly after the commit in late July, but I postponed investigation until k2k20 to allocate enough time and brain cells for it.

Luckily, merely trying to reproduce it revealed the cause: the failure occurs on amd64 (my notebook) and i386 (regress cluster) machines, but not on sparc64 and octeon, which I used back when fixing pcap-filters on `DLT_LOOP` links. Any guess? Thoughts about `DLT_LOOP` specifics? TL;DR: Big endian worked, little endian failed - more precisely, filters were broken on machines where host byte order differs from network byte order (big endian).

Turns out dlg@'s last fix merely revealed a byte order conversion that had always been missing, causing all filters applied on links such as wg(4), lo(4), gre(4) and more to mismatch. The fix was trivial and confirmed by a commit made to upstream libpcap years ago, hence we adopted that commit and I happily popped one item off my list.
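To illustrate the class of bug (not the actual libpcap change), here is a minimal Python sketch. It assumes a `DLT_LOOP`-style link header, i.e. a 4-byte address family stored in network byte order; the helper names `loop_header` and `family_as_seen_by` are made up for this example.

```python
import socket
import struct

AF_INET = int(socket.AF_INET)


def loop_header(af):
    """Build a 4-byte header carrying the address family in network byte order."""
    return struct.pack("!I", af)


def family_as_seen_by(header, host_byteorder, convert):
    """Return the address family a host would read from the header.

    host_byteorder is "big" or "little" and simulates the machine.
    With convert=False the bytes are read in host order (the missing
    conversion); with convert=True they are read in network byte order.
    """
    if convert:
        return int.from_bytes(header, "big")
    return int.from_bytes(header, host_byteorder)


hdr = loop_header(AF_INET)
for host in ("big", "little"):
    naive = family_as_seen_by(hdr, host, convert=False)
    fixed = family_as_seen_by(hdr, host, convert=True)
    print(host, "naive match:", naive == AF_INET, "converted match:", fixed == AF_INET)
```

On the simulated big endian host both reads match, which is why sparc64 and octeon never showed the problem; on the little endian host only the converted read does.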

In between I got sidetracked by trunk(4) letting me DOWN while testing unwind(8) diffs for florian@; sure enough, those side tracks led to past stories from sthen@ on the mailing list as well as hallway discussions over coffee about MAC address handling in the kernel, switch behaviour wrt. bogus links and ultimately more side tracks…

Meanwhile I had also finished one next small step in an important group effort sometimes referred to as "[removing] ze big lock": documenting data protection in pppoe(4) to help audit and reduce `KERNEL_LOCK()` usage (in the network stack). For a few days on and off I went through various code paths trying to grok the twisted situation of contexts, interface queues, single data structures being protected by multiple locks, the concept of "safe memory reclamation" and much more!

mpi@ was of great help, asking the right questions while answering mine, and claudio@ was so kind as to talk me through the SMR(9) API during lunch.

So far, machines have been running with experimental diffs unlocking pppoe(4), and while I still lack confidence^Wexpertise to submit to tech@ in this area, progress is being made in that many of the network stack's intricacies are becoming clearer to me.

Those heavier topics put aside, tobhe@'s tinkering next to me turned into delightful thoughts about sensor reading and heuristics, finding the right spot and getting actual work done (or not).

He also helped me polish systat's pf view before I went on with documentation improvements regarding ldom.conf(5/sparc64)'s vdisk and vcpu semantics on which stsp@ prodded me a few days before k2k20.

With my mind in sparc64/LDom land again, I asked kettenis@ to look at an issue with a past attempt of mine to improve interface handling for guest domains. He quickly mentioned the hypervisor's firmware often being picky about custom values, which is what I had tried to add, i.e. extra nodes in the primary domain's machine description enriching each guest domain's network interface information. I dug through official documentation and kettenis@ further elaborated on 'Virtual Switch' implementation differences between OpenBSD's (his) vsw(4/sparc64) and Sun's Solaris driver. Given this, we concluded that my initial idea would more likely than not result in a rather unusual design (device layout/attachment); thus I gladly abandoned my idea (for now) while taking away quite a few insights.

Just in time for my return to networking land, yasuoka@ mailed a simple diff to fix rtable(4) handling in pfctl(8). It also prompted a broader discussion about rdomain(4) handling in pf(4), how existing behaviour could be fully dynamic (at no cost) and how routing tables need more careful handling as opposed to routing domains.

bluhm@ chimed in the next day and we went through pf(4)'s handling of interfaces and interface groups where pf.conf(5) semantics are similar to what we modeled for routing domains/tables: Rules may specify a nonexistent interface (group), e.g. `pass on mygroup` or `block on gre42`. The kernel provides hooks into pf for interface changes such that it can maintain a list of available interfaces and groups to filter on.

That is, the packet filter does not need to know about the interfaces in use at the time of ruleset creation, i.e. rulesets do not have to be reloaded whenever the network gets rewired.
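The idea can be sketched as a toy model in Python. This is not how pf(4) is implemented; the class `InterfaceTable`, its `attach`/`detach` hooks and the `last_match` helper are hypothetical names invented for this illustration, and rules are reduced to an action plus an interface or group name.

```python
class InterfaceTable:
    """Toy stand-in for the list of interfaces and groups kept up to date via hooks."""

    def __init__(self):
        self.groups_by_ifname = {}

    # Hypothetical hooks, called whenever the network is rewired.
    def attach(self, ifname, groups=()):
        self.groups_by_ifname[ifname] = set(groups)

    def detach(self, ifname):
        self.groups_by_ifname.pop(ifname, None)

    def matches(self, target, ifname):
        """True if ifname currently exists and is target or a member of group target."""
        groups = self.groups_by_ifname.get(ifname)
        if groups is None:
            return False
        return target == ifname or target in groups


def last_match(rules, table, ifname):
    """Return the last rule matching ifname, or None if no rule matches."""
    found = None
    for rule in rules:
        _action, target = rule
        if table.matches(target, ifname):
            found = rule
    return found


# The ruleset is created once, before mygroup or gre42 exist.
rules = [("pass", "mygroup"), ("block", "gre42")]
table = InterfaceTable()

table.attach("em0", groups=["egress"])
print(last_match(rules, table, "em0"))

# Rewiring the network only touches the table, never the ruleset.
table.attach("gre42", groups=["gre"])
table.attach("vio0", groups=["mygroup"])
print(last_match(rules, table, "gre42"))
print(last_match(rules, table, "vio0"))
```

The ruleset stays the same object throughout; only the table changes as interfaces come and go, which mirrors why no reload is needed.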

Routing tables and domains differ from interfaces and groups in that they do _not_ require such hooks for pf to work, but they are analogous in use: no need for them to exist at ruleset creation as the kernel can always do the right thing at runtime.

See the linked thread for discussion and rest assured that these plans won't be rushed in before release - such design changes are best done at the start of a new test^Wdevelopment cycle.

The hackathon was filled with various other discussions, laughter and interesting insights from fellow hackers; we had cosy hours around the log fire in the castle, went out together to enjoy local cuisine and let code be code. It felt good spending time with the group and being reminded of how everyone is in it for the fun.

Thanks to Burg Liebenzell's staff for such a delightful stay, delicious food and the amazing view into the valley every day. Thanks to Genua GmbH and jan@ for hosting and running this lovely event. Thanks to my employer siticom GmbH for supporting me.

Many thanks, Klemens!