Contributed by Peter N. M. Hansteen from the testing needed, em(4)phatically dept.
Do you use em(4) network interfaces? Darren Tucker (dtucker@) has a new diff out that may be of use to you, posted in a message to tech@:
List: openbsd-tech
Subject: em(4) TX interrupt mitigation
From: Darren Tucker <dtucker () dtucker ! net>
Date: 2025-05-19 8:52:13

Hi.

TL;DR: if you use em(4), particularly on a low-power device such as a pcengines APU2, please try this diff.

The em(4) driver has 5 interrupt mitigation timers[0]. In each direction there's a "Packet Timer" that is reset each time a packet is processed, and an "Absolute Timer" that is reset each time an interrupt happens. The Packet Timer lets it wait a little while for another packet, but the Absolute Timer makes sure it doesn't wait too long. In OpenBSD's em(4), these values are (in approximately microseconds):

Transmit Packet Timer (EM_TIDV) = 64
Transmit Absolute Timer (EM_TADV) = 64
Receive Packet Timer (EM_RDTR) = 0
Receive Absolute Timer (EM_RADV) = 64

You will note that the Receive Packet Timer is set to zero, so it will generate an interrupt for each packet. This means that the corresponding Absolute Timer is also effectively disabled. There's a comment that says "CAUTION: When setting EM_RDTR to a value other than 0, adapters may hang (stop transmitting) under certain network conditions." We'll examine that one later.

There's also an "Interrupt Throttle Timer" (ITR), which is set (DEFAULT_ITR) to only allow a maximum of ~8000 interrupts per second, which is consistent with what you see in "systat vm 1" on a fully loaded interface. Since it's at that limit, it would seem that interrupt rates are a limiting factor. The interrupt handler processes both TX and RX regardless of the source of the interrupt.

Looking at the TX interrupt mitigation, the value of 64 seems to have come from the FreeBSD driver in 2002[1], where TIDV was reduced from 128 to 64 and TADV was added. How many packets can happen in 64 usec? At 1Gb/s, a 1500-byte packet plus its overhead takes (1538*8)/1e9 seconds = 12.3 usec, so about 5. But wait, em(4) supports jumbo packets, which would take (9254*8)/1e9 = 74 usec! Since this is more than the maximum holdoff timer, it means we're taking a TX completion interrupt for every jumbo frame sent. The TX ring holds 256 or 512 packets depending on NIC model, so we're not making very effective use of it.

What can we increase this to? Well, the worst case would seem to be back-to-back transmission of minimum-size (64-byte) packets at 1Gb/s while also receiving nothing. Each packet takes about 0.8 usec, so if we want to make sure the interface never runs out of packets to transmit we need to refill the ring before it's completely empty. 220 should just fit 3 jumbo packets while still leaving a little headroom. Note that actually sending traffic while receiving absolutely nothing is difficult to achieve in practice, since there will likely be replies and various other traffic.

In my testing with iperf on an APU2 with TSO disabled and hw.setperf=0, I see RX go up ~10% (334Mb/s -> 362Mb/s), TX go up ~25% (600Mb/s -> 750Mb/s), and CPU usage go down by ~60% (nearly 100% of 1 core down to ~40%). With hw.setperf=100 the speed doesn't change much, but the CPU goes down by about the same amount.

Comments and test reports welcome.

[0] https://www.intel.com/content/dam/doc/application-note/gbe-controllers-interrupt-moderation-appl-note.pdf
[1] https://github.com/freebsd/freebsd-src/commit/a58e485d
and the rest of the message has the diff (against -current) that Darren would like your feedback on.
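The wire-time arithmetic in the quoted message is easy to check for yourself. Here is a minimal sketch (the 1538- and 9254-byte on-the-wire frame sizes and the 64 usec holdoff value are taken straight from Darren's message; everything else is just arithmetic):

```python
# Serialization time of one frame on a 1 Gb/s link, in microseconds.
# Frame sizes (bytes on the wire, including overhead) are the ones
# quoted in the message: 1538 for a standard frame, 9254 for jumbo.
LINK_BPS = 1e9  # 1 Gb/s

def wire_time_usec(wire_bytes: int) -> float:
    """Time to put one frame of wire_bytes on the wire, in usec."""
    return wire_bytes * 8 / LINK_BPS * 1e6

std = wire_time_usec(1538)    # ~12.3 usec
jumbo = wire_time_usec(9254)  # ~74 usec

# About 5 standard frames fit inside the 64 usec holdoff window,
# but a single jumbo frame does not -- hence a TX completion
# interrupt for every jumbo frame sent.
print(round(std, 1), round(jumbo, 1), int(64 // std), jumbo > 64)
# -> 12.3 74.0 5 True
```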
So here's your chance to contribute back to our favorite operating system. If you are able to test this, please go ahead!
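For the worst-case side of the argument, the same arithmetic reproduces the ring-drain and jumbo-frame figures the message works from. A small sketch (the 38-byte per-frame overhead is an assumption inferred from the message's 1500 -> 1538 accounting; the 256/512 ring sizes come from the message):

```python
LINK_BPS = 1e9  # 1 Gb/s

def wire_time_usec(payload_bytes: int) -> float:
    # Assumes 38 bytes of overhead per frame, matching the
    # message's 1500 -> 1538 on-the-wire figure.
    return (payload_bytes + 38) * 8 / LINK_BPS * 1e6

min_pkt = wire_time_usec(64)   # ~0.8 usec per minimum-size packet
jumbo = wire_time_usec(9216)   # ~74 usec per jumbo frame

# Back-to-back minimum-size packets drain a full TX ring in roughly:
for ring in (256, 512):
    print(ring, round(ring * min_pkt))  # -> 256 209, then 512 418

# Three jumbo frames come to roughly 222 usec, the ballpark the
# proposed 220 usec holdoff value is working in.
print(round(3 * jumbo))  # -> 222
```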