Contributed by rueda on from the unlocked-and-unloaded dept.
Alexander Bluhm (bluhm@) has committed changes which eliminate contention by caching the socket lock in TCP input:
CVSROOT:	/cvs
Module name:	src
Changes by:	bluhm@cvs.openbsd.org	2025/05/07 08:10:19

Modified files:
	sys/net        : if.c if_var.h
	sys/netinet    : tcp_input.c tcp_var.h

Log message:
Cache socket lock during TCP input.

Parallel TCP input is running for a few days now and looks quite
stable.  Final step is to implement caching of the socket lock.
Without large receive offloading (LRO) in the driver layer, it is
very likely that consecutive TCP segments are in the input queue.
This leads to contention of the socket lock between TCP input and
socket receive syscall from userland.

With this commit, ip_deliver() moves all TCP packets that are in
the softnet queue temporarily to a TCP queue.  This queue is per
softnet thread so no locking is needed.  Finally in the same shared
netlock context, tcp_input_mlist() processes all TCP packets.  It
keeps a pointer to the socket lock.  tcp_input_solocked() switches
the lock only when the TCP stream changes.  A bunch of packets are
processed and placed into the socket receive buffer under the same
lock.  Then soreceive() can copy huge chunks to userland.  The
contention of the socket lock is gone.

On a 4 core machine I see between 12% to 22% improvement with 10
parallel TCP streams.  When testing only with a single TCP stream,
throughput increases between 38% to 100%.

tested by Mark Patruck a while ago; OK mvs@
As the previous articles already hinted, this brings noticeable performance improvements in the areas mentioned, making the system even more fun to use. Enjoy!