OpenBSD Journal

Developer Blog: ratchov@'s recent audio work

Contributed by ray from the secure-audio-system dept.

Alexandre Ratchov (ratchov@) and Jacob Meuser (jakemsr@) have been putting a lot of hard work into improving OpenBSD's audio system. Alexandre was kind enough to write in a summary of their work:
We just committed the code of an audio server and a general sound access library usable in audio ports. Here is how and why this happened.

Certain audio applications do not work properly because they support none of the encodings or the sample frequencies of your audio(4) device. For instance, if you have a fixed-rate ac97(4) codec supporting only a 48kHz sample frequency and you try to use mpg321 or ogg123, you notice that they play too fast. With envy(4) devices the situation is even worse: since the device supports only 10/12-channel, 32-bit encodings, almost no audio application is usable with it.
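To put a number on "too fast", here is a minimal sketch of the arithmetic. It assumes the file was encoded at 44.1 kHz, which the text above does not state, and that the samples reach the 48 kHz codec without any resampling. The function names are illustrative only.

```python
import math


def playback_speedup(stream_rate_hz, device_rate_hz):
    """Speed-up factor when samples recorded at stream_rate_hz are fed,
    unconverted, to a device clocked at device_rate_hz."""
    return device_rate_hz / stream_rate_hz


def pitch_shift_semitones(stream_rate_hz, device_rate_hz):
    """Pitch shift caused by that speed-up, in equal-tempered semitones."""
    return 12 * math.log2(playback_speedup(stream_rate_hz, device_rate_hz))


if __name__ == "__main__":
    # Assumed example: a 44.1 kHz stream on a fixed 48 kHz codec.
    print(round(playback_speedup(44100, 48000), 3))
    print(round(pitch_shift_semitones(44100, 48000), 2))
```

Under these assumptions the music plays roughly 9% faster and about one and a half semitones sharp, which is why a resampling step is needed somewhere between the application and the device.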

To make my hardware usable I first thought that the easiest approach would be to add the missing conversion code in the kernel, so the device can appear as a soundblaster-like card and any application can use it.

However, this is not satisfactory: the user must be able to tell which of the 10 channels are mapped to the stereo device, and the conversion code is not simple because there are lots of possible configurations.
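The channel-mapping choice can be illustrated with a short sketch. It assumes frames are plain lists of integer samples and that the user picks the index of the first device channel; `map_stereo_to_device` and its parameters are hypothetical names, not part of aucat(1) or the kernel.

```python
def map_stereo_to_device(stereo_frames, device_channels, first_channel):
    """Place each (left, right) frame into a wider device frame.

    The stereo pair lands on first_channel and first_channel + 1;
    every other device channel is filled with silence (0).
    """
    if first_channel < 0 or first_channel + 2 > device_channels:
        raise ValueError("stereo pair does not fit in the device frame")
    out = []
    for left, right in stereo_frames:
        frame = [0] * device_channels
        frame[first_channel] = left
        frame[first_channel + 1] = right
        out.append(frame)
    return out


if __name__ == "__main__":
    # One stereo frame routed to channels 4 and 5 of a 10-channel device.
    print(map_stereo_to_device([(100, -100)], 10, 4))
```

Even this toy version needs a user-supplied parameter, which hints at why a fixed in-kernel mapping would not suit every configuration.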

Furthermore, such audio conversions are CPU intensive, and I dislike having CPU-intensive code in the kernel. The reason is that while a process is running in kernel mode, it cannot be preempted, which means that an audio application may hog the CPU and delay the execution of any time-sensitive application. For instance, that would penalize real-time MIDI applications, and I use a lot of MIDI. So I convinced myself that putting conversion code in the kernel was not the way to go.

At the same time, I wrote a small utility to play and record audio in full-duplex doing any format conversion, resampling, mixing and demultiplexing on the fly with any number of audio streams. Initially it was for my own use, but we eventually committed it as aucat(1) a few months ago.
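As an illustration of the simplest kind of format conversion mentioned above, the sketch below widens signed 16-bit samples to signed 32-bit and narrows them back. It assumes samples are already decoded into Python integers; it is not aucat(1)'s code, and the function names are made up for this example.

```python
S16_MIN, S16_MAX = -32768, 32767


def s16_to_s32(samples):
    """Widen signed 16-bit samples to signed 32-bit by shifting left 16 bits,
    so full scale at 16 bits maps close to full scale at 32 bits."""
    out = []
    for s in samples:
        if not S16_MIN <= s <= S16_MAX:
            raise ValueError("sample out of signed 16-bit range")
        out.append(s << 16)
    return out


def s32_to_s16(samples):
    """Narrow signed 32-bit samples to 16 bits by dropping the low 16 bits
    (plain truncation, no dithering)."""
    return [s >> 16 for s in samples]


if __name__ == "__main__":
    wide = s16_to_s32([0, 1, -1, 32767, -32768])
    print(wide)
    print(s32_to_s16(wide))
```

A real converter also has to deal with byte order, signedness, and alignment of the sample within its container, which this sketch leaves out.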

At this stage, it was possible to do primitive multitracking with aucat(1), or to use simple pipes like the following:

ogg123 -o - mymusic.ogg | aucat -i -

I realized that if we could make apps use aucat(1) directly rather than setting up pipes by hand, most of the conversion problems would be solved, and since aucat(1) supports any number of streams, as a side effect, this would allow multiple applications to share the same audio device. Roughly, that's what an audio server does.
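Sharing one device between several applications comes down to mixing their streams. The sketch below is one simple way to do that, assuming mono signed 16-bit streams at the same rate and summing with hard clipping. Clipping is an assumption made for this example; the text does not say how aucat(1) handles overflow.

```python
S16_MIN, S16_MAX = -32768, 32767


def mix_s16(streams):
    """Mix mono signed 16-bit streams sample by sample.

    Streams that end early contribute silence; sums outside the
    16-bit range are clipped.
    """
    length = max((len(s) for s in streams), default=0)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(S16_MIN, min(S16_MAX, total)))
    return mixed


if __name__ == "__main__":
    player = [1000, 30000, 500]
    notifier = [2000, 30000]
    print(mix_s16([player, notifier]))
```

Streams with different rates or encodings would first go through the conversion steps described earlier, so that the mixer only ever sees one common format.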

All the ingredients required for an audio server were there, except the communication layer. So I wrote a small add-on to aucat(1) to make it listen on a Unix socket and dynamically attach streams to its mixer and demultiplexer.

To expose it to applications we added a simple "sound I/O" library, libsndio. If the aucat(1) server isn't running, it just uses the usual audio(4) device, so everything should just work. We locally updated a few ports to use the new API and found it pretty easy to use. The API is simple: there's only one way to do each thing and there are virtually no knobs, so there are fewer chances to make a mistake.
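The "server if it is running, device otherwise" behaviour can be sketched as follows. This is only an illustration of the fallback idea, not libsndio's implementation: `open_audio` is a hypothetical helper, both paths are supplied by the caller, and the device is never actually opened here.

```python
import socket


def open_audio(server_socket_path, device_path):
    """Try the audio server first, then fall back to the raw device.

    Returns ("server", connected_socket) if something is listening on
    the Unix socket, otherwise ("device", device_path).
    """
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.connect(server_socket_path)
    except OSError:
        # No server listening (or no such socket): use the device directly.
        sock.close()
        return ("device", device_path)
    return ("server", sock)
```

A caller would pass whatever socket and device paths its system uses; from the application's point of view, the same call succeeds whether or not the server is running.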

So, what's next? First, we'll continue updating ports to use libsndio rather than the kernel API. The aucat(1) conversion code is somewhat primitive and should be optimized for both quality and speed. I'd like to keep this stuff minimalistic, so I plan to work on improving quality and robustness rather than adding new features.

Thanks to Alexandre and Jacob for all their hard work!

Editor's note: Jacob also posted this note to the ports@ list. It includes a diff to make aucat(1) the new default sound server for SDL. Jacob already reports improvements compared to the older audio(4) backend. As always, testing by porters and users is needed.



Comments
  1. By m (87.144.87.63) on

    You don't know how happy I am to hear this! Thank you so much!

    One Question though: wouldn't it generate less work to redirect connections to the audio device through aucat?
    (I think I saw this suggestion on misc@ before..)
    I don't know if that's a possible, feasible and safe solution, but updating "all" ports to use aucat sounds like an awful lot of work..
    -m

    Comments
    1. By Alexandre Ratchov (129.184.84.11) on

      > One Question though: wouldn't it generate less work to redirect
      > connections to the audio device through aucat?
      > (I think I saw this suggestion on misc@ before..)
      > I don't know if that's a possible, feasible and safe solution,
      > but updating "all" ports to use aucat sounds like an awful
      > lot of work..

      well, we never know in advance. I believe that it's less work. Sure, in the short term it's a lot of work: we have to design/write a lot of code and update a lot of apps to use it. But in the long term we'll spend less time on tweaking apps and implementing workarounds and thus we'll have more time to solve real problems.

    2. By Jacob Meuser (24.22.108.209) jakemsr@sdf.lonestar.org on

      > You don't know how happy I am to hear this! Thank you so much!
      >
      > One Question though: wouldn't it generate less work to redirect connections to the audio device through aucat?
      > (I think I saw this suggestion on misc@ before..)
      > I don't know if that's a possible, feasible and safe solution, but updating "all" ports to use aucat sounds like an awful lot of work..
      > -m

      it's been an awful lot of work to make even audio(4) work as it
      should, not to mention ossaudio(3).

      and still, there are many patches in ports for audio stuff because
      quite frankly there are no good audio APIs.

      OSS has different implementations. there are subtle differences in the implementations (just dig through some OSS audio backend's history). so there is a lot of OSS code that may work just fine on FreeBSD or linux, but doesn't quite work on OpenBSD. ossaudio(3) is never going to be a full implementation. it's a hack that mostly works.

      Sun audio is dying. NetBSD seems to prefer OSS in pkgsrc, and I believe Sun is moving to OSS. and even code that was originally written for Sun audio might not work on OpenBSD either. for example, many such audio backends use the "samples" field of struct audio_prinfo, which is a sample count on Sun and a byte count on *BSD. another sad note, there have been Sun audio backends written for *BSD that are either wrong or exceedingly lacking due to issues with audio(4) at the time.

      the libsndio API, on the other hand, is quite easy to use, and quite powerful. and it can be ported to any underlying audio API, be it OSSv3 or OSSv4 or ALSA or ...

      just compare the backends I sent to ports@ with the ones they replace and ask yourself which looks easier to understand and maintain.


  2. By David Chisnall (137.44.2.39) on

    We already have a nice, easy to use, sound API called OSS. It works on FreeBSD, Solaris, HP-UX, and sometimes even Linux. Linux has ALSA, which is a pain to program with, but can emulate OSS. OpenBSD has OSS emulation.

    Even if it's implemented using userspace mixing and format conversion (which, from a stability perspective is nice, even if it's less than ideal from a latency perspective), please support the OSS APIs. Otherwise it's going to make porting audio code to OpenBSD even harder.

    I don't believe the aucat solution can implement the ioctls that OSS requires without some support from the kernel. The latest audio work in FreeBSD is really nice - a full implementation of the OSS 4 APIs with an arbitrary number of virtual channels, per-vchan formats and volume controls, and low-latency mixing. They've put a lot more in the kernel than I'm entirely happy with, but from the perspective of a user and a userspace developer it works very well.

    Comments
    1. By Alexandre Ratchov (2001:7a8:4e69::2) on

      > We already have a nice, easy to use, sound API called OSS.
      >

      oh, the lib is much simpler and easier to use, you should check the documentation and/or the sources.

      > It works on FreeBSD, Solaris, HP-UX, and sometimes even Linux.
      > Linux has ALSA, which is a pain to program with, but can emulate OSS.
      > OpenBSD has OSS emulation.

      We haven't removed the oss emulation, oss emulation is still there
      and you still can use it.

      >
      > please support the OSS APIs. Otherwise it's going
      > to make porting audio code to OpenBSD even harder.
      >

      I don't think so, writing a few lines of C to make an audio app
      work reliably is easier than tweaking OSS bits to make the app work.

      >
      > I don't believe the aucat solution can implement the ioctls
      > that OSS requires without some support from the kernel.

      sure

      > The latest audio work in FreeBSD is really nice - a full
      > implementation of the OSS 4 APIs with an arbitrary number
      > of virtual channels, per-vchan formats and volume controls,
      > and low-latency mixing. They've put a lot more in the kernel
      > than I'm entirely happy with, but from the perspective of a
      > user and a userspace developer it works very well.

      i'm very happy for freebsd users, really.

    2. By Anonymous Coward (128.171.90.200) on

      Unfortunately, it looks like OSS is not very OpenBSD friendly ...

      Q: Is everything in OSS open sourced?
      A: No. There are three drivers that have mostly been written by the hardware manufacturers. We are not permitted to release their sources unless their authors ask us to do so.
      Also some parts of the envy24 and envy24ht drivers contain some code written under NDA. We have not yet received the approval to open source the code from all manufacturers. So these drivers cannot be open sourced just now.
      If we don't get the approval in reasonable time we will distribute these drivers with the offending code stripped from the sources.
      Finally there are some effects in the old softoss driver that are not included in the source packages. We will make the decision about their future later. At this moment it looks like we will remove the softoss driver from the OSS package so these effects will not be used in OSS anyway.
      We reserve the right to include some “closed source” drivers only in our binary distribution if the hardware manufacturers refuse to give the programming specs without NDA. Our policy is to promote open source but not to enforce it. We will let hardware manufacturers to decide if they like to select the commercial distribution mode instead of the open source one with much wider customer base.

      http://developer.opensound.com/opensource_oss/licensing.html

      Comments
      1. By Anonymous Coward (79.183.162.168) on

        That information is a bit outdated - only the Lynx driver has remained closed. All the rest is available under BSD/CDDL/GPLv2 (compare http://mercurial.opensound.com/?file/tip/ vs the driver list). In any event, support for the OSS API does not imply taking in the drivers as well.

        Comments
        1. By Jacob Meuser (24.22.108.209) jakemsr@sdf.lonestar.org on

          > That information is a bit outdated - only the Lynx driver has remained closed. All the rest is available under BSD/CDDL/GPLv2

          imo, the OSS licensing scheme is broken. why three different "open" licenses? very confusing. it even says on their licensing page that one cannot mix files from one licensed version to another. I understand they are trying to please everyone, but this is not the right way.

          Comments
          1. By Anonymous Coward (79.183.162.168) on

            > > That information is a bit outdated - only the Lynx driver has remained closed. All the rest is available under BSD/CDDL/GPLv2
            >
            > imo, the OSS licensing scheme is broken. why three different "open" licenses? very confusing. it even says on their licensing page that one cannot mix files from one licensed version to another. I understand they are trying to please everyone, but this is not the right way.
            >

            Well, Sun probably insisted on CDDL when they signed a contract with 4front and the *BSD people wanted BSD license, and a module written for Linux without GPL would have legal problems. So I'm not surprised 4front ended with all the above. In any event, you can't mix files from different licenses in other software either. GPL "pollutes" BSD and conflicts with CDDL. CDDL/BSD combinations are frowned upon by people who aren't FreeBSD. Having three separate versions is a bit easier than getting into the dual/triple license mess (If someone modifies the code, does he have to use all three licenses, or only one? Can he fork by using only one license? If not, does that mean that GPL copyleft always applies regardless of BSD terms? etc.).

            What happens in OSS is that all people who send significant code sign a contributor agreement (linked at the developer webpage) which assigns the copyright to 4front. The agreement compels 4front to have the code licensed under the BSD/CDDL/GPLv2 licenses.

    3. By Jacob Meuser (24.22.108.209) jakemsr@sdf.lonestar.org on

      > We already have a nice, easy to use, sound API called OSS. It works on FreeBSD, Solaris, HP-UX, and sometimes even Linux. Linux has ALSA, which is a pain to program with, but can emulate OSS. OpenBSD has OSS emulation.

      there's no reason libsndio can't be ported to these and more. i mean, just one backend would cover all those, right?

      ossaudio(3) is quite the hack.

      > Even if it's implemented using userspace mixing and format conversion (which, from a stability perspective is nice, even if it's less than ideal from a latency perspective)

      you have real world benchmarks to back up this "less than ideal from a latency perspective"?

      tell me this, if mixing is done in kernel, how does this *not* create more latency in the kernel? how can a soundcard really support multiple sample rates if mixing is done in the kernel? wouldn't in-kernel resampling and format conversions be *necessary* to have kernel mixing? would there not be a rather significant change in what the kernel is doing when one application is using the audio device as opposed to two if the two applications are sending differently encoded data?

      why do you think "linux audio developers" use jack?

      Comments
      1. By Anonymous Coward (79.183.162.168) on

        > > We already have a nice, easy to use, sound API called OSS. It works on FreeBSD, Solaris, HP-UX, and sometimes even Linux. Linux has ALSA, which is a pain to program with, but can emulate OSS. OpenBSD has OSS emulation.
        >
        > there's no reason libsndio can't be ported to these and more. i mean, just one backend would cover all those, right?

        I think the intention of the OP was to have a single sound API (he suggested OSS API) supported across Unixs so that a programmer wouldn't have to write multiple backends. Having libsndio converted to be yet another portable sound library (like libao, portaudio, SDL, etc.) doesn't fix that. Anyway, I think the better question is if it's technically possible to make OSS using apps use the mixing too without adding another sound backend (assuming, for example, that someone converts ossaudio to use libsndio).

        Comments
        1. By Jacob Meuser (24.22.108.209) jakemsr@sdf.lonestar.org on

          > > > We already have a nice, easy to use, sound API called OSS. It works on FreeBSD, Solaris, HP-UX, and sometimes even Linux. Linux has ALSA, which is a pain to program with, but can emulate OSS. OpenBSD has OSS emulation.
          > >
          > > there's no reason libsndio can't be ported to these and more. i mean, just one backend would cover all those, right?
          >
          > I think the intention of the OP was to have a single sound API (he suggested OSS API) supported across Unixs so that a programmer wouldn't have to write multiple backends. Having libsndio converted to be yet another portable sound library (like libao, portaudio, SDL, etc.) doesn't fix that.

          I fully understand that the OP is wanting a single audio API everywhere. I want that too.

          libao is too wonky; configuration files, lots of "choices", etc. portaudio is rather complex, far more than what most people need
          (and it has very few, if any, advantages over libsndio, imo). SDL is also video, joysticks, etc and doesn't do recording.

          emulating an ioctl based API, like OSS, is much more difficult than using a library. certainly switching the underlying audio system to OSS is much more work than writing a libsndio backend, no?

          basically, I think libsndio is the best choice for the single audio API everywhere.

          > Anyway, I think the better question is if it's technically possible to make OSS using apps use the mixing too without adding another sound backend (assuming, for example, that someone converts ossaudio to use libsndio).
          >

          probably not. certainly nothing I would want to attempt or maintain. OSS is an ioctl based API. just look at the libossaudio sources.

  3. By deprived dude (80.249.194.29) on

    Does this mean that I can finally chat in Skype AND listen to music at the same time? (No "audio device is busy" messages anymore). Wow, thanks! 2008, here I come!

    Comments
    1. By Anonymous Coward (129.222.50.21) on

      > Does this mean that I can finally chat in Skype AND listen to music at the same time? (No "audio device is busy" messages anymore). Wow, thanks! 2008, here I come!

      wow, Skype in OpenBSD? Tell us how!

      Comments
      1. By deprived dude (80.249.194.29) on

        > wow, Skype in OpenBSD? Tell us how!

        http://www.mail-archive.com/misc@openbsd.org/msg48111.html

    2. By Jacob Meuser (24.22.108.209) jakemsr@sdf.lonestar.org on

      > Does this mean that I can finally chat in Skype AND listen to music at the same time? (No "audio device is busy" messages anymore). Wow, thanks! 2008, here I come!

      well, I do have a backend for ekiga mostly ready.

      Comments
      1. By Anonymous Coward (70.173.233.82) on

        > > Does this mean that I can finally chat in Skype AND listen to music at the same time? (No "audio device is busy" messages anymore). Wow, thanks! 2008, here I come!
        >
        > well, I do have a backend for ekiga mostly ready.
        >

        That, an xmms-libsndio plugin and a libsndio backend for esd would cover about all my sound daemon needs.

        wouldn't hurt to have ekiga and pw^Htlib updated to newer versions though (too many @ekiga.net numbers don't work with the port version, unfortunately).

  4. By Anonymous Coward (212.20.215.132) on

    This looks real good! As a user, if I want multiple audio applications to use the audio device simultaneously, I only have to start the aucat server (for applications using libsndio, of course). As a developer, I just call sio_open and it'll either connect to the aucat server or audio(4), transparently.

    This is exactly the simplicity and elegant design I've come to expect from OpenBSD. A big thank you to the developers. Keep up the good work!

  5. By TeXitoi (193.52.105.131) on

    How does it compare to libao and esound?

    Comments
    1. By Jacob Meuser (24.22.108.209) jakemsr@sdf.lonestar.org on

      > How does it compare to libao and esound?

      easier to use. no configuration files.

      Comments
      1. By Anonymous Coward (79.183.162.168) on

        > > How does it compare to libao and esound?
        >
        > easier to use. no configuration files.
        >

        libao's configuration file only has one setting: to set the backend used (and that's optional). If/when libsndio grows another backend, it will probably end up with a similar config file... The API is also very easy to use. Its real problem is that the API is also very very limited. Also, the quality of the plugins is... inconsistent, to the point that some require different input formats.

        Comments
        1. By Jacob Meuser (24.22.108.209) jakemsr@sdf.lonestar.org on

          > > > How does it compare to libao and esound?
          > >
          > > easier to use. no configuration files.
          > >
          >
          > libao's configuration file only has one setting: to set the backend used (and that's optional). If/when libsndio grows another backend, it will probably end up with a similar config file...

          I hope not. I think libsndio should have backends for the (most) native audio API. libsndio should be as close to the hardware as possible. so, there is no choice.

          > The API is also very easy to use. Its real problem is that the API is also very very limited.

          and that's exactly where libsndio shines. easy AND powerful.

          > Also, the quality of the plugins is... inconsistent, to the point that some require different input formats.

          libsndio is very useful here as well. see sio_getcap, for example.

          Comments
          1. By Anonymous Coward (79.183.162.168) on

            > > > > How does it compare to libao and esound?
            > > >
            > > > easier to use. no configuration files.
            > > >
            > >
            > > libao's configuration file only has one setting: to set the backend used (and that's optional). If/when libsndio grows another backend, it will probably end up with a similar config file...
            >
            > I hope not. I think libsndio should have backends for the (most) native audio API. libsndio should be as close to the hardware as possible. so, there is no choice.

            Well, there are some use cases which may require a choice:
            A) In order for a libsndio based program to use networked audio, there would be a need for a non-native sound backend (nas, esd, pulse, arts, etc.), since none of the native backends support this. This, of course, inserts a choice which would require a config file or similar.
            B) libsndio on Linux should default to ALSA. But all Linux emulations I know of (including OpenBSD's) only support OSS API, so leaving no way to set the backend to OSS on Linux would create a problem for them.

            Comments
            1. By Jacob Meuser (24.22.108.209) jakemsr@sdf.lonestar.org on

              > > > > > How does it compare to libao and esound?
              > > > >
              > > > > easier to use. no configuration files.
              > > > >
              > > >
              > > > libao's configuration file only has one setting: to set the backend used (and that's optional). If/when libsndio grows another backend, it will probably end up with a similar config file...
              > >
              > > I hope not. I think libsndio should have backends for the (most) native audio API. libsndio should be as close to the hardware as possible. so, there is no choice.
              >
              > Well, there are some use cases which may require a choice:
              > A) In order for a libsndio based program to use networked audio, there would be a need for a non-native sound backend (nas, esd, pulse, arts, etc.), since none of the native backends support this. This, of course, inserts a choice which would require a config file or similar.

              or network connectivity could be added to libsndio/aucat.

              > B) libsndio on Linux should default to ALSA. But all Linux emulations I know of (including OpenBSD's) only support OSS API, so leaving no way to set the backend to OSS on Linux would create a problem for them.

              it would be a build time option. this is what we usually end up with, having to choose a binary built for OSS instead of ALSA. I really don't see the point of adding a config file to our native audio system because maybe some day we'll want to run a binary for another OS.

            2. By tedu (udet) on


              > B) libsndio on Linux should default to ALSA. But all Linux emulations I know of (including OpenBSD's) only support OSS API, so leaving no way to set the backend to OSS on Linux would create a problem for them.

              This is so convoluted I had to read it about 5 times. If we are somehow in the position that OpenBSD's sound API becomes so popular that Linux binaries are using it, we'll figure something out when it happens.

              Comments
              1. By Jacob Meuser (24.22.108.209) jakemsr@sdf.lonestar.org on

                >
                > > B) libsndio on Linux should default to ALSA. But all Linux emulations I know of (including OpenBSD's) only support OSS API, so leaving no way to set the backend to OSS on Linux would create a problem for them.
                >
                > This is so convoluted I had to read it about 5 times. If we are somehow in the position that OpenBSD's sound API becomes so popular that Linux binaries are using it, we'll figure something out when it happens.

                no need to. we'd only need to use aucat or use a libsndio package that was built for linux/OSS. it would actually make things a lot easier.

            3. By Alexandre Ratchov (129.184.84.11) on

              > Well, there are some use cases which may require a choice:
              > A) In order for a libsndio based program to use networked audio,
              > there would be a need for a non-native sound backend (nas, esd,
              > pulse, arts, etc.), since none of the native backends support this.
              > This, of course, inserts a choice which would require a config
              > file or similar.

              libsndio provides a device-like interface. It will never work with something like esd because it lacks the necessary synchronization primitives.

              We're not trying to support everything, we're trying to make apps just work reliably on OpenBSD with any hardware, the purpose of this stuff is to improve quality. Supporting simultaneously many backends doesn't improve quality; having knobs, options, config files degrades quality and adds confusion.

              > B) libsndio on Linux should default to ALSA.

              sure; and it would be nice to have libsndio on linux, so linux audio
              developers can quickly test diffs porters are sending them.

              > But all Linux emulations I know of (including OpenBSD's)
              > only support OSS API, so leaving no way to set the backend
              > to OSS on Linux would create a problem for them.

              what's the point in using a Linux emulator to run an app using libsndio (ie OpenBSD native API) ?

    2. By Alexandre Ratchov (129.184.84.11) on

      > How does it compare to libao and esound?

      libao and libsndio don't have the same goals: libao is not a device access API. It is play-only and lacks any synchronization mechanism, so it's not usable, for instance, to sync audio/video. It's modular, and can for instance play to a .wav file.

      libsndio is a device access API: it's full-duplex and has a precise synchronization mechanism, but it cannot be used to play to a .wav file.

      esound design seems orientated for desktop environments: you can upload short wave files into the server and then call them (handy for desktop bells), you can monitor the output, aucat(1) can't do that. AFAIK you cannot reliably synchronize playback to record, so it's not usable for multitracking; it cannot recover from overruns/underruns (xruns). It doesn't support encodings of "pro" cards.

      aucat(1) tries to present a device-like interface. Playback and record are always synchronized. It recovers from xruns at the client level (ie when one client xruns) and at the device level (ie when aucat itself xruns). So on a system with a short load burst, the sound will be degraded during the burst, but once the burst is over aucat ``catches up'' the sync. These features make it suitable for multitracking. Furthermore it supports encodings of pro cards. The API is well documented.

      Comments
      1. By Anonymous Coward (89.103.127.149) on

        > esound design seems orientated for desktop environments: you can upload short wave files into the server and then call them (handy for desktop bells), you can monitor the output, aucat(1) can't do that. AFAIK you cannot reliably synchronize playback to record, so it's not usable for multitracking; it cannot recover from overruns/underruns (xruns). It doesn't support encodings of "pro" cards.
        >
        > aucat(1) tries to present a device-like interface. Playback and record are always synchronized. It recovers from xruns at the client level (ie when one client xruns) and at the device level (ie when aucat itself xruns). So on a system with a short load burst, the sound will be degraded during the burst, but once the burst is over aucat ``catches up'' the sync. These features make it suitable for multitracking. Furthermore it supports encodings of pro cards. The API is well documented.

        hi,

        what about pulse and openbsd?

        Comments
        1. By Anonymous Coward (212.27.60.48) on

          > hi,
          >
          > what about pulse and openbsd?

          AFAIK, pulse is GPL'd and thus not eligible for base.

          Comments
          1. By Jacob Meuser (24.22.108.209) jakemsr@sdf.lonestar.org on

            > > hi,
            > >
            > > what about pulse and openbsd?
            >
            > AFAIK, pulse is GPL'd and thus not eligible for base.

            also, myself and at least 2 other people have spent considerable time trying to get pulse working on OpenBSD. I sorta got it working with some ugly hacks, but it was still constantly crashing. sorry, but pulse is just another over-engineered-yet-(probably-)unintentionally-linux-centric pile of poo. neat ideas, poor implementations.
