Contributed by Peter N. M. Hansteen from the testing needed, em(4)phatically dept.
Do you use em(4) network interfaces? Darren Tucker (dtucker@) has a new diff out that may be of use to you, posted in a message to tech@:
List: openbsd-tech
Subject: em(4) TX interrupt mitigation
From: Darren Tucker <dtucker () dtucker ! net>
Date: 2025-05-19 8:52:13

Hi.

TL;DR: if you use em(4), particularly on a low-power device such as a pcengines APU2, please try this diff.

The em(4) driver has 5 interrupt mitigation timers[0]. In each direction there's a "Packet Timer" that is reset each time a packet is processed, and an "Absolute Timer" that is reset each time an interrupt happens. The Packet Timer lets it wait a little while for another packet, but the Absolute Timer makes sure it doesn't wait too long. In OpenBSD's em(4), these values are (in approximately microseconds):

Transmit Packet Timer (EM_TIDV) = 64
Transmit Absolute Timer (EM_TADV) = 64
Receive Packet Timer (EM_RDTR) = 0
Receive Absolute Timer (EM_RADV) = 64

You will note that the Receive Packet Timer is set to zero, so it will generate an interrupt for each packet. This means that the corresponding Absolute Timer is also effectively disabled. There's a comment that says "CAUTION: When setting EM_RDTR to a value other than 0, adapters may hang (stop transmitting) under certain network conditions." We'll examine that one later.

There's also an "Interrupt Throttle Timer" (ITR), which is set (DEFAULT_ITR) to only allow a maximum of ~8000 interrupts per second, which is consistent with what you see in "systat vm 1" on a fully loaded interface. Since it's at that limit, it would seem that interrupt rates are a limiting factor. The interrupt handler processes both TX and RX regardless of the source of the interrupt.

Looking at the TX interrupt mitigation, the value of 64 seems to have come from the FreeBSD driver in 2002[1], where TIDV was reduced from 128 to 64 and TADV was added. How many packets can happen in 64 usec? At 1Gb/s, a 1500-byte packet plus its overhead takes (1538*8)/1e9 seconds = 12.3 usec, so about 5. But wait, em(4) supports jumbo packets, which would take (9254*8)/1e9 = 74 usec! Since this is more than the maximum holdoff timer, it means we're taking a TX completion interrupt for every jumbo frame sent. The TX ring holds 256 or 512 packets depending on NIC model, so we're not making very effective use of it.

What can we increase this to? Well, the worst case would seem to be back-to-back transmission of minimum-size (64-byte) packets at 1Gb/s while also receiving nothing. Each packet takes about 0.8 usec, so if we want to make sure the interface never runs out of packets to transmit we need to refill the ring before it's completely empty. 220 should just fit 3 jumbo packets while still leaving a little headroom. Note that actually sending traffic while receiving absolutely nothing is difficult to achieve in practice, since there will likely be replies and various other traffic.

In my testing with iperf on an APU2 with TSO disabled and hw.setperf=0, I see RX go up ~10% (334Mb/s -> 362Mb/s), TX go up ~25% (600Mb/s -> 750Mb/s), and CPU usage go down by ~60% (nearly 100% of 1 core down to ~40%). With hw.setperf=100 the speed doesn't change much, but the CPU goes down by about the same amount.

Comments and test reports welcome.

[0] https://www.intel.com/content/dam/doc/application-note/gbe-controllers-interrupt-moderation-appl-note.pdf
[1] https://github.com/freebsd/freebsd-src/commit/a58e485d
and the rest of the message has the diff (against -current) that Darren would like your feedback on.
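The wire-time arithmetic in the quoted message is easy to check for yourself. Here is a minimal sketch (the 1538- and 9254-byte on-the-wire frame sizes and the 64 usec holdoff value are taken straight from Darren's message; everything else is just arithmetic):

```python
# Serialization time of one frame on a 1 Gb/s link, in microseconds.
# Frame sizes (bytes on the wire, including overhead) are the ones
# quoted in the message: 1538 for a standard frame, 9254 for jumbo.
LINK_BPS = 1e9  # 1 Gb/s

def wire_time_usec(wire_bytes: int) -> float:
    """Time to put one frame of wire_bytes on the wire, in usec."""
    return wire_bytes * 8 / LINK_BPS * 1e6

std = wire_time_usec(1538)    # ~12.3 usec
jumbo = wire_time_usec(9254)  # ~74 usec

# About 5 standard frames fit inside the 64 usec holdoff window,
# but a single jumbo frame does not -- hence a TX completion
# interrupt for every jumbo frame sent.
print(round(std, 1), round(jumbo, 1), int(64 // std), jumbo > 64)
# -> 12.3 74.0 5 True
```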
So here's your chance to contribute back to our favorite operating system. If you are able to test this, please go ahead!
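For the worst-case side of the argument, the same arithmetic reproduces the ring-drain and jumbo-frame figures the message works from. A small sketch (the 38-byte per-frame overhead is an assumption inferred from the message's 1500 -> 1538 accounting; the 256/512 ring sizes come from the message):

```python
LINK_BPS = 1e9  # 1 Gb/s

def wire_time_usec(payload_bytes: int) -> float:
    # Assumes 38 bytes of overhead per frame, matching the
    # message's 1500 -> 1538 on-the-wire figure.
    return (payload_bytes + 38) * 8 / LINK_BPS * 1e6

min_pkt = wire_time_usec(64)   # ~0.8 usec per minimum-size packet
jumbo = wire_time_usec(9216)   # ~74 usec per jumbo frame

# Back-to-back minimum-size packets drain a full TX ring in roughly:
for ring in (256, 512):
    print(ring, round(ring * min_pkt))  # -> 256 209, then 512 418

# Three jumbo frames come to roughly 222 usec, the ballpark the
# proposed 220 usec holdoff value is working in.
print(round(3 * jumbo))  # -> 222
```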