OpenBSD Journal

Testing parallel forwarding

Contributed by Paul 'WEiRD' de Weerd on from the Mister pushing packets speedily dept.

Hrvoje Popovski writes in with some results from his performance tests, as he did a few years ago:

I've tested Alexander Bluhm's (bluhm@) parallel IP forwarding diff and I've got some nice results. Readers should be aware that bluhm@'s diff sets NET_TASKQ=4, which means that forwarding will use 4 CPU threads (more on this knob below), and that the diff only affects network cards with multiqueue support (at the time of writing, those are ix(4), ixl(4), and mcx(4)). In my tests I was sending 14 Mpps of UDP packets over ix(4) interfaces, which have 16 queues:

ix0 at pci10 dev 0 function 0 "Intel 82599" rev 0x01, msix, 16 queues
ix1 at pci10 dev 0 function 1 "Intel 82599" rev 0x01, msix, 16 queues

The OpenBSD box is a Supermicro AS-1114S-WTRT with 24 "AMD EPYC 7413 24-Core Processor, 2650.37 MHz" CPUs, so this box is well suited for testing those 16 queues.
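
A note on the knob itself: NET_TASKQ is not a sysctl but a compile-time define in sys/net/if.c that sizes the pool of softnet task queues, so every value tested below means a rebuilt kernel. Roughly sketched (simplified from the tree around this time; this is not the diff itself, and details may differ):

/* sys/net/if.c (sketch) */
#define NET_TASKQ       4               /* number of softnet taskqs */

struct taskq *nettqmp[NET_TASKQ];       /* the softnet task queues */

/* map an interface onto one of the NET_TASKQ softnet threads */
struct taskq *
net_tq(unsigned int ifindex)
{
        return (nettqmp[ifindex % NET_TASKQ]);
}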

And here are the results:

plain forwarding

NET_TASKQ = 1 - 1.1 Mpps
NET_TASKQ = 4 - 3.4 Mpps
NET_TASKQ = 8 - 2.4 Mpps
NET_TASKQ = 12 - 1.5 Mpps
NET_TASKQ = 16 - 1.7 Mpps
NET_TASKQ = 24 - 1.4 Mpps

plain forwarding with pf - 1M states (see the note after this list)

NET_TASKQ = 1 - 550 Kpps
NET_TASKQ = 4 - 1.4 Mpps
NET_TASKQ = 8 - 1.9 Mpps
NET_TASKQ = 12 - 1.6 Mpps
NET_TASKQ = 16 - 1.6 Mpps
NET_TASKQ = 24 - 1.5 Mpps
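
A note on the "1M states" figure: pf creates one state entry per flow, so driving the state table to a million entries means the generated traffic has to vary addresses or ports across a million distinct flows. Hrvoje's actual packet generator isn't described here; the following hypothetical sketch just illustrates the idea of fanning UDP out over many destinations, each of which becomes its own state on the forwarding box (it assumes the box routes 10.0.0.0/8 toward the device under test):

/* flows.c -- hypothetical example, not Hrvoje's generator */
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
        struct sockaddr_in dst;
        char payload[18];       /* small payload, like the test traffic */
        uint32_t i;
        int s;

        if ((s = socket(AF_INET, SOCK_DGRAM, 0)) == -1) {
                perror("socket");
                return (1);
        }

        memset(payload, 'x', sizeof(payload));
        memset(&dst, 0, sizeof(dst));
        dst.sin_family = AF_INET;
        dst.sin_port = htons(9);        /* discard port */

        /* one datagram to each of 1,000,000 distinct 10.x.y.z hosts;
         * every unique destination becomes a separate pf state */
        for (i = 0; i < 1000000; i++) {
                dst.sin_addr.s_addr = htonl((10U << 24) | (i & 0xffffff));
                if (sendto(s, payload, sizeof(payload), 0,
                    (struct sockaddr *)&dst, sizeof(dst)) == -1)
                        perror("sendto");
        }

        close(s);
        return (0);
}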

veb(4)

NET_TASKQ = 1 - 1.25 Mpps
NET_TASKQ = 4 - 4.6 Mpps
NET_TASKQ = 8 - 4.7 Mpps
NET_TASKQ = 12 - 5 Mpps
NET_TASKQ = 16 - 4.2 Mpps
NET_TASKQ = 24 - 6.5 Mpps

tpmr(4)

NET_TASKQ = 1 - 1.5 Mpps
NET_TASKQ = 4 - 4.8 Mpps
NET_TASKQ = 8 - 4.1 Mpps
NET_TASKQ = 12 - 4.3 Mpps
NET_TASKQ = 16 - 3.7 Mpps
NET_TASKQ = 24 - 5.5 Mpps

bridge(4)[1]

NET_TASKQ = 1 - 600 Kpps <- sending 700 Kpps
NET_TASKQ = 4 - 800 Kpps <- sending 900 Kpps
NET_TASKQ = 8 - 600 Kpps <- sending 700 Kpps
NET_TASKQ = 12 - 480 Kpps <- sending 600 Kpps
NET_TASKQ = 16 - 480 Kpps <- sending 600 Kpps
NET_TASKQ = 24 - 400 Kpps <- sending 500 Kpps

[1] bridge(4) behaves differently: if I send 14 Mpps, the bridge is dead. So to get the highest pps, I needed to pinpoint a sending rate about 100 Kpps above what bridge(4) can actually forward.

Many thanks to Hrvoje for the write-up and for doing all these tests, and of course to Alexander Bluhm, Alexandr Nedvedicky, and other developers for working on parallelizing the network stack.



Comments
  1. By n/a (Cabal) on

    Very cool! I know that the Intel I210 and I211 have 4/4 and 2/2 queues (respectively), are those queues not yet supported by the em driver, or is this a different type of queue?

    1. By sthen (sthen) on

      The em(4) driver doesn't support multiple queues yet.

      1. By sthen (sthen) on

        AFAIK these are the drivers that already have some support for multiple queues: aq, bnxt, igc, ix, ixl, mcx, vmx.
