Contributed by Peter N. M. Hansteen on from the if ifq_restart, duh dept.
stsp@
) asks for testing of a potential fix
for a problem affecting a number of network interface drivers
(namely bge
,
bnx
,
iavf
,
igc
,
ix
,
ixl
,
ngbe
and
pcn
),
pointing to a
message
on tech@
with the subject
bge/bnx/iavf/igc/ix/ixl/ngbe/pcn: ifq_restart() fix that reads
List: openbsd-tech Subject: bge/bnx/iavf/igc/ix/ixl/ngbe/pcn: ifq_restart() fix From: Stefan Sperling <stsp () stsp ! name> Date: 2025-06-20 10:12:14 A bug has been fixed by yasuaok@ in vmx(4) where the driver was calling ifq_restart() without actually having made any space on a full Tx ring. Calling ifq_restart() in this case can lead to a condition where the interface gets stuck in OACTIVE until the interface is reset with ifconfig.
I could trigger the same bug on ice(4) with iperf as follows: for i in `seq 5`; do (iperf -l0 -t 0 -c 2001:db8::1 -u &) ; done This needs another machine which runs iperf -u -s (and -V if using IPv6). I have checked all ethernet drivers in dev/pci which call ifq_restart(). A few already avoid calling ifq_restart() unnecessarily. The same bug affects bge, bnx, iavf, igc, ix, ixl, ngbe, and pcn. The bad pattern looks like: while (cons != prod) { descriptor = &ring[cons]; if (descriptor is not done) break; free descriptor; } if (ifq_is_oactive(ifq)) ifq_restart(ifq); If we break out of the loop without freeing any descriptor, we will call ifq_restart() without having made any space on the ring. For example, this can happen when drivers invoke both Tx and Tx interrupt handlers during the same MSIX interrupt cause, An Rx interrupt can then lead to processing of a Tx ring which has made no progress. Fixed code looks like: done = 0; while (cons != prod) { desc = &ring[cons]; if (desc is not done) break; free descriptor; done = 1; } if (done && ifq_is_oactive(ifq)) ifq_restart(ifq); I am looking for tests and OKs. I myself don't have any hardware for these drivers to test with.
followed by the patch against -current
with the code
that hopefully fixes the problem.
If you are in a position to test, please do, report back and help Stefan get this into the tree with any adjustments needed!