OpenBSD Journal

g2k16 Hackathon Report: Alexander Bluhm on mbufs and more

Contributed by rueda on from the guten Morgan, England dept.

Alexander Bluhm (bluhm@) contributed our next report (which even includes a picture):

Big plans for what to do at a hackathon don't work. There is always something unexpected that requires your attention. So I was expecting the unexpected. TCP send performance had dropped to very low throughput in some environments. It was pretty clear that it was related to claudio@'s change to speed up TCP by using large mbufs instead of chaining small ones. Mbufs are used inside the kernel to hold network data. Using a lot of them requires many allocations and frees. This can be avoided by using larger mbufs. But why did it get slower? Together with mikeb@ I found out that it was related to the mbuf space limit in the socket buffer. One large mbuf filled the send buffer, so no new mbufs could be inserted until TCP had received the acknowledgements for everything. So the sliding window algorithm, with mbufs cycling through the socket buffer, did not work anymore. After identifying the problem, the fix was easy: just increase the default socket buffer mbuf size limit.
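The stall described above can be illustrated with a toy model of the two limits a BSD-style socket buffer enforces: one on queued data bytes and one on the memory held by mbufs. This is a minimal sketch, not OpenBSD's code. The class, the field names (loosely borrowed from the traditional sb_hiwat and sb_mbmax), the helper space_after_write() and all sizes are assumptions chosen for illustration.

```python
from dataclasses import dataclass

MSIZE = 256  # assumed size of the mbuf header structure (illustrative)


@dataclass
class SockBuf:
    """Toy socket buffer with a data limit and an mbuf storage limit."""
    hiwat: int      # limit on queued data bytes
    mbmax: int      # limit on memory charged for mbufs and clusters
    cc: int = 0     # data bytes currently queued
    mbcnt: int = 0  # mbuf memory currently charged

    def space(self) -> int:
        # Free space is bounded by whichever of the two limits is closer.
        return max(0, min(self.hiwat - self.cc, self.mbmax - self.mbcnt))

    def append(self, data_len: int, cluster_size: int) -> None:
        # The whole cluster is charged, even if it is only partly filled.
        self.cc += data_len
        self.mbcnt += MSIZE + cluster_size


def space_after_write(mbmax: int, cluster_size: int,
                      total: int = 4096, hiwat: int = 16384) -> int:
    """Queue `total` bytes in clusters of `cluster_size`; return remaining space."""
    sb = SockBuf(hiwat=hiwat, mbmax=mbmax)
    remaining = total
    while remaining > 0:
        chunk = min(remaining, cluster_size)
        sb.append(chunk, cluster_size)
        remaining -= chunk
    return sb.space()


if __name__ == "__main__":
    print(space_after_write(mbmax=32768, cluster_size=2048))    # small clusters
    print(space_after_write(mbmax=32768, cluster_size=65536))   # one large cluster
    print(space_after_write(mbmax=262144, cluster_size=65536))  # raised mbuf limit
```

In this model, 4096 bytes queued in 2 KB clusters leave 12288 bytes of space. The same data in a single 64 KB cluster exceeds the mbuf limit, so the space drops to 0 although the data limit is far from reached. Raising the mbuf limit brings the space back to 12288.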

This example shows that all the numbers in the network stack have to fit together. For OpenBSD, some of them were chosen in the 4.4BSD era and do not match today's high-bandwidth, high-latency networks. In particular, the TCP send and receive buffers were too small to allow high throughput. Increasing them requires more mbufs, so there is a risk of hitting the global mbuf limit. Without mbufs available, the network stack stops processing packets. When the mbuf limit is increased, the kernel may run out of kernel memory, which might result in a crash. But nowadays machines have much more memory available. So I switched the default socket buffer size from 256 KB to 2 MB. netstat -a shows the current number of octets in use in the Recv-Q and Send-Q columns. I also adjusted the mbuf limits for the different cluster sizes; netstat -m shows some statistics. Running with these limits in production will give us feedback on whether the new numbers are reasonable. Note that the pf OS fingerprint of OpenBSD has changed, as a larger window scale factor is announced.
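A rough sketch of the arithmetic behind these numbers: TCP can keep at most one window of data in flight per round trip, so buffer size divided by round-trip time bounds the throughput. The second function picks the smallest window scale shift able to express a given buffer size, which is the common BSD approach. Whether OpenBSD computes it exactly this way is an assumption here, as are the function names, the 100 ms round-trip time, and reading KB and MB as powers of 1024.

```python
TCP_MAXWIN = 65535      # largest window expressible without scaling
TCP_MAX_WINSHIFT = 14   # largest shift permitted for the window scale option


def max_throughput_bps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on throughput with one window in flight per round trip."""
    return window_bytes * 8 / rtt_seconds


def window_scale_shift(buffer_bytes: int) -> int:
    """Smallest shift such that a 16-bit window can cover buffer_bytes."""
    shift = 0
    while shift < TCP_MAX_WINSHIFT and (TCP_MAXWIN << shift) < buffer_bytes:
        shift += 1
    return shift


if __name__ == "__main__":
    rtt = 0.1  # assumed 100 ms round-trip time
    for size in (256 * 1024, 2 * 1024 * 1024):
        mbit = max_throughput_bps(size, rtt) / 1e6
        print(f"{size} bytes: at most {mbit:.1f} Mbit/s, "
              f"window scale shift {window_scale_shift(size)}")
```

Under these assumptions, a 256 KB buffer caps a 100 ms path at about 21 Mbit/s and needs a shift of 3, while 2 MB allows about 168 Mbit/s and needs a shift of 6. A changed shift is the kind of difference an OS fingerprint can pick up.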

There was some discussion about how the network stack should work on multiple CPUs. Integrating the local TCP stack in particular needs some effort, as it assumes that each generated packet is processed atomically. So there must be some locking around it; mpi@ has started to implement it. It helps a lot when all relevant people are around.

To find regressions in our code early, I have set up some systems that run all tests in /usr/src/regress every day. The machines are automatically installed from snapshots before running the tests. The results are shown as a table on a web page. I used the hackathon to get feedback from the other developers and to improve things.

Finally I hacked a little bit on syslogd and implemented validation of TLS client certificates.

bluhm@ (L) and patrick@ (R) in a Morgan

As I was in Great Britain for the first time, I wanted to see a bit of the countryside. So I hired a real British Morgan sports car and drove through England together with patrick@ for a week.

Thanks very much, Alexander!


