OpenBSD Journal

The f2k9 file system mini-hackathon

Contributed by johan from the will-code-for-cache dept.

Here is the story about f2k9, the 2009 Filesystem mini-hackathon and some of the results of that week.

Janne Johansson (jj@) tells the story about how the f2k9 came to be and how it went down.

Over a year ago, art@ pointed me to the Internet Infrastructure Foundation who run the Swedish TLD, .se. They have something called the "Internet Fund" that channels excess money from the domain name registrations to projects in Sweden that in some way help further good internet usage.

After reviewing the older successful grants, I decided I had a chance to get one too. The only problem was that the final date was... tomorrow.

Please read on for the rest of the story:

I raced the clock in order to produce a decent application, and got very much help from both art@ and thib@ on what to set as the aim of my particular project, which ended up being "Improving OpenBSD for use as I/O intensive servers", since my personal hope was to have the OS use all/more of the free RAM for filesystem caches and generally be better used, now that all machines have insane amounts of memory.

After a lot of writing, I sent the submission and waited. It took a few months from that date until I got a phone call from Staffan, the main contact for this Fund. Just to be sure no one was pulling the biggest prank ever on me, I tried to verify it with a callback, but this person was damned hard to reach through the switchboard, so I had to take it on faith. There was only one small issue: they didn't want a single person behind it, but rather a non-profit organization. OK, back to writing more application forms, and with the help of art@ and johan@ we sent the application to start "OpenBSD Sverige" to the tax authorities.

The only problem with that was that the application takes ages to get through. In the meantime, I thought the .se guys had forgotten about me, and a few months later, the result for "OpenBSD Sverige" came to my mailbox. It was successfully registered. Except I was quite sure I'd missed the window of opportunity for the grant. Then a week later, someone else from the Foundation mailed me and asked "Did you ever get that org?" and I could start hoping again.

The grant application was written to cover travel and hotel expenses, and my work would host the event. I also decided to hold the 2009 Slackathon on the day after f2k9 was complete, so the visiting developers could hold talks on the topics they had been working on. This meant that I did not get a big bag of money, which would have been nice, but rather the ability to have a travel agency and the hotel send their bills to the foundation and have them covered up to the sum of 150k SEK (something like $20k or 15k EUR).

After getting the final agreement of the grant, we started making a list of which people in the filesystem and VM-system area we could get hold of. It was also decided that the second week of August would be best, to have the most time available to test the changes that were done to the difficult parts of the kernel internals. Later on, Theo decided to cut the 4.6 release date one month short, in order to allow for even more testing of the f2k9 results to make 4.7 well tested.

One other issue I ran into was the fact that even though the hotel rooms were cheap, the hotel didn't want to bill a foundation, not even the one that runs .se, so I had to cover that bill. Fortunately, I got my 2006 and 2007 Slackathon donation issues solved with Wim and I got that money refunded to me, which I have been using as a backing-store for this purpose.

Apart from all this, all the other pre-f2k9 arrangements went "ok". Some minor mishaps, lots of changed dates/flights etc but overall a decent ride to arrange something of this magnitude.
Then came that week.

We had a good list of people that could attend, totalling: Miod Vallat (miod@), Owain Ainsworth (oga@), Tobias Weingartner (weingart@), David Gwynne (dlg@), Kenneth Westerback (krw@), Ariane van der Steldt (ariane@), Mark Kettenis (kettenis@), Theo de Raadt (deraadt@), Bob Beck (beck@), Artur Grabowski (art@), Thordur I. Björnsson (thib@), Bret Lambert (blambert@), Henning Brauer (henning@), Claudio Jeker (claudio@), Peter Hessler (phessler@), Janne Johansson (jj@), Johan M:son Lindman (johan@) and Jasper Lievisse Adriaanse (jasper@).

With this crowd we could cover most of the people doing VM stuff, NFS and PCI/memory handling, as well as henning@ and claudio@, who were there to make sure all the crazy memory allocation and buffer handling changes would not hurt the networking parts.

As the week passed by, more and more of the stuff from the main hackathon in Canada that had been reverted got added back one change at a time, with good test coverage for each change. Also, most evil code changes were accompanied by Bob yelling like a little girl about the state of the old code. thib@ and I also reworked the lab I use at work, where I host a bunch of OpenBSD machines, so it would mimic the automounter/NFS setup used by Theo, with hosts of many different architectures looping over builds and with serial consoles for most of them to help recover from the weirder crashes.

We started the week with a small BBQ at my home, somewhat like a small version of Bob's really huge BBQ in his back yard at the main hackathons. Apart from frying the usual dead vegetarians on the grill, we had some whale which thib@ had brought over from Iceland.

A note on the hotel (no, I won't link to them here): they had a really good photographer who puts real-estate agents to shame in the ability to make tiny rooms appear as dance halls. The photos shown did not really match the rooms, or they were taken somewhere else. The rooms we ended up getting had no windows, and were so small that after pushing two beds into the far end of the room, you couldn't even walk along the side of either bed but had to crawl up from the short ends. There were also some mixups where the hotel receptionist would let one of the developers into some random girl's room, waking her up to a rather unexpected surprise. The hotel was quickly named "The Prison", due to the general experience of being enclosed in a very small box.

The neat thing about getting a large bunch of developers close by is that you can get stuff fixed very fast. We had two issues at work, one OSPF issue when run on a 9k MTU network that claudio@ found and fixed in less than an hour and one sensor issue on a G5 macppc I run OpenBSD on where kettenis@ found and fixed it in less than a day after putting the machine next to his desk.

Another thing we did for my workplace was to ask henning@, claudio@ and dlg@ to "donate" an hour and have them take a look at our carp/vlan/ospf/pfsync dual-firewall/router setup and see if the way we do it is good or bad, since we reached that point by trial and error mostly, so we had no idea if that was the best way or not, just that it does work for us. In the end, our setup was done the right way, but at least we know now.

Finally, I had to write a report on the overall results from which I will copy some of the text, as a list of what each developer achieved during the week:


    miod@ worked on kernel debugging facilities, including a use-after-free checker for the kernel malloc(9) memory allocator. He also proposed a way to link the kernel so ddb(4) can list how structs are defined.
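The report does not say how the use-after-free checker works. One common approach, shown here as a userland toy in Python rather than anything taken from the kernel, is to fill freed memory with a known pattern and verify that the pattern is intact when the memory is handed out again. The class name, the chunk layout and the poison byte are all made up for this sketch.

```python
POISON = 0xDF  # arbitrary fill byte chosen for this sketch, not a kernel value


class PoisoningPool:
    """Toy fixed-size chunk allocator that detects writes to freed chunks."""

    def __init__(self, nchunks, chunk_size):
        self.chunk_size = chunk_size
        # All memory starts out poisoned, as if every chunk had been freed.
        self.memory = bytearray([POISON]) * (nchunks * chunk_size)
        self.free_list = list(range(nchunks))

    def _span(self, chunk):
        start = chunk * self.chunk_size
        return slice(start, start + self.chunk_size)

    def alloc(self):
        chunk = self.free_list.pop()
        span = self._span(chunk)
        # A free chunk must still hold the poison pattern; anything else
        # means somebody wrote through a stale reference.
        if any(byte != POISON for byte in self.memory[span]):
            raise RuntimeError(f"chunk {chunk} was modified after free")
        self.memory[span] = bytes(self.chunk_size)  # hand out zeroed memory
        return chunk

    def free(self, chunk):
        self.memory[self._span(chunk)] = bytes([POISON]) * self.chunk_size
        self.free_list.append(chunk)

    def write(self, chunk, offset, value):
        # Deliberately no ownership check: the bug is only caught at alloc().
        self.memory[chunk * self.chunk_size + offset] = value
```

Note that a check like this only notices the damage at the next allocation of the same chunk, not at the moment of the stray write, which is part of why such checkers tend to be debugging options rather than always-on features.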


    art@ made more steps towards a coherent buffer cache (what other operating systems call unified buffer cache). The code is now working and needs just one or two more features and some tweaking before it becomes fully committable. Parts of it have already been trickling down into the tree. The practical meaning is that mmapped files (including executables and shared libraries) aren't cached and copied twice, potentially speeding a lot of things up, but most importantly giving us semantics everyone has learned to depend on.
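To make "semantics everyone has learned to depend on" a bit more concrete, here is a small userland sketch in Python (not code from the hackathon) that changes a file through a shared mapping and then fetches it with a plain read. The function name and the `sync` parameter are made up for the example. With `sync=True` the mapping is flushed first, which is portable; with `sync=False` the result depends on whether the kernel keeps mapped pages and the buffer cache coherent, so nothing is promised for that case.

```python
import mmap
import os
import tempfile


def write_via_mmap_read_via_read(sync):
    """Modify a file through a shared mapping, then fetch it with read(2)."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"old data")          # 8 bytes on disk
        with mmap.mmap(fd, 8) as mapping:  # default: shared, read/write
            mapping[0:3] = b"NEW"          # store through the mapping
            if sync:
                mapping.flush()            # msync(2): push the pages out
            # With a coherent cache this sees the store even without the
            # flush; without one it may return the stale contents.
            return os.pread(fd, 8, 0)
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `write_via_mmap_read_via_read(True)` returns `b"NEW data"`. Programs written on systems with a unified cache often skip the flush, which is where the trouble starts on a system without one.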


    krw@ and phessler@ worked on extending the OpenBSD support for the UDF filesystem. In particular they added support for the Metadata Partitions used by Blu-ray and HDDVD discs. As a result OpenBSD can now mount and read data from many if not all Blu-ray and HDDVD discs. They proved this by extracting and playing some content from the HDDVD version of "The Chronicles of Riddick".

In addition, krw@ reviewed a number of changes to the SCSI mid-layer that dlg@ is proposing, which will greatly simplify future support of SAN disks in particular.

Finally krw@ worked with miod@ on fixing USB keyboard usage on legacy-free machines such as the one our Blu-ray/HDDVD drive was in. These changes will enable use of USB keyboards in ddb(4).


    jasper@ was only there for three days, but still managed to add a 'show all bufs' to ddb(4) to show all the bufs in the system, did format string validation of daddr64_t in kernel sources and cleaned up some of the ntfs and msdos code.


    ariane@ implemented guard pages in the kernel (needs some fixing, which is currently being worked on) and some of the queued changes include:

  • implemented physical memory specification for pools
  • (related to the above) implemented physical memory specification in most of the uvm memory allocators
  • implemented pool usage in malloc(9), giving a speed boost (unmeasured in the normal case; networking was slightly faster though)
  • formal introduction to pmemrange

    kettenis@ did:
  • pmap fixes for hppa and sh; only some of the hppa bits have been committed. This is very much related to the changes ariane@ has been working on since the current bugs prevent us from enabling pmemrange because of its fast page recycling effects.
  • isp(4) changes to use proper WWN's for Fibre Channel HBA's. Related to dlg@'s mpath(4) multipath work.
  • smu(4) fixes to support more sensors. Not related to filesystem or vm but the hardware was available.
  • i386/amd64 interrupt handling cleanups; will enable distribution of interrupts across CPUs in the future, which will improve I/O throughput. Parts committed.
  • scheduler changes to idle CPUs such that they can be halted.

He also helped me set up a Sun Fire 4810 the project received as a donation from BT Nordic, a 21U monster that makes more noise than half a decent serverroom would make on its own.


    beck@ fixed several bugs in the interaction between the new buffer cache code and the page daemon, meaning that low memory machines can now successfully work with bufcachepercent=90 without risk of hanging.

He made the namecache in OpenBSD dynamically allocated - a big step towards having a larger name cache to make better use of buffer cache.

He also worked up two initial versions of diffs that make the vnodes in the kernel dynamically allocated. One will probably see the light of day shortly after he returns from Europe; this is not yet committed, but most of the work was done in Stockholm.


    dlg@ worked on an mpath(4) driver that will take over units that are visible over two or more paths and present them as a single mpath unit. He also did lots of scsi-midlayer work which resulted in vscsi(4), a virtual scsi device.


    claudio@ made an initial version of an iSCSI initiator on top of the vscsi(4) in something like three days, and showed it live on the 2009 slackathon. He also did routing domain additions to tun(4) and tcpbench(1).


    henning@ made a performance improvement to the ip_input code that handles the case where you may need to send an icmp error message back with parts of the original packet in it. He and claudio@ also helped test and benchmark all the changes that were done to the internal allocators.


    thib@ fixed a few NFS bugs, and also removed the NFSv2 write-gather code that isn't helping nowadays. He also worked on the error recovery code path for NFS renames, added a "show vnodes" to the ddb(4) and a command to list nfs nodes. He put a lock around the nfs nodelist to prevent issues with vnode-recycling on slow machines. Finally, he moved the filehandle and nodelookup lists to redblack trees and imported async NFS I/O code from Net/FreeBSD.


    blambert@ reworked the SysV message queues to be dynamically allocated instead of a static list allocated at boot time. He added code to assert rwlocks in the kernel, fixed a double-free of vnodes and some NFS code cleanup.


    oga@ moved the uvm object hashtable to a per-object tree right before f2k9, and fixed some weirdness in the x86 bus_space_map that shows up if you have code that does a lot of mappings. He also was very involved in the discussions about how to handle 32-bit devices on 64-bit machines.


    toby@ moved amd64 to use a linking script for the kernel. This should help with using large pages for text/etc. in the future. Also, he added a new mtx_enter_try() function. We also exposed him to an Intel EM64T machine with 6G RAM and the oldest 32-bit PCI card we could find in order to provoke it to fail when trying to do DMA.
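The report gives no details about mtx_enter_try(), but assuming it follows the usual try-lock convention (take the mutex if it is free, otherwise return at once instead of spinning), the pattern looks like this userland Python analogue. The function and the `state` dictionary are hypothetical and only illustrate the idea; this is not the kernel interface.

```python
import threading


def update_if_uncontended(lock, state):
    """Bump a counter only if the lock can be taken without waiting."""
    if not lock.acquire(blocking=False):
        # Somebody else holds the lock: give up immediately so the caller
        # can do other work and retry later instead of stalling here.
        return False
    try:
        state["value"] += 1
        return True
    finally:
        lock.release()
```

The caller has to be prepared for the `False` case, so this fits best where the protected work is optional or can be deferred.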


    Theo did a lot of managing and coordinating, but still found time to figure out how to prevent ethernet cards from continuing DMA after the OS has rebooted, which results in severe problems. He also redid some of the packet locking handling so the machine would not spend lots of time getting and releasing the same lock over and over when receiving multiple packets in a queue. If I'm not mistaken, this single change did a 5% improvement of build times for "make build" when the source is mounted over NFS.
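The packet locking change can be pictured with a small Python toy, which is only an illustration of the idea and not the kernel code: instead of taking and releasing a lock once per packet, take it once for the whole queue. `CountingLock` and both functions are invented for this sketch, and the counter just makes the difference visible.

```python
import threading


class CountingLock:
    """A lock wrapper that counts how many times it has been acquired."""

    def __init__(self):
        self._lock = threading.Lock()
        self.acquisitions = 0

    def __enter__(self):
        self._lock.acquire()
        self.acquisitions += 1  # updated while holding the lock
        return self

    def __exit__(self, *exc_info):
        self._lock.release()
        return False


def process_per_packet(queue, lock, handled):
    # One acquire/release pair for every packet in the queue.
    for packet in queue:
        with lock:
            handled.append(packet)


def process_batched(queue, lock, handled):
    # One acquire/release pair for the whole queue.
    with lock:
        for packet in queue:
            handled.append(packet)
```

For a queue of five packets the first version takes the lock five times and the second takes it once, with the same packets handled in the same order. The trade-off is that the batched version holds the lock longer per acquisition.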

Actually writing the post-f2k9 report to the foundation took quite some effort, since these guys did produce a lot of changes and code and it was hard to summarize everything in an understandable manner.

It does show that small focused mini-hackathons can produce a lot of results if someone takes the time to host and arrange them. For all the people out there wondering how to contribute to the project if you can't code stuff yourself, this is definitely one good way to do it.

And it is really fun when doing it, especially when you can top it off with a Slackathon conference afterwards, but that is another article for Undeadly.

I'll let you readers guess which of the developers coined the phrase: "I owe my good looks to pig sperm", but there won't be a prize for the winner this time.

Please stay tuned, we will publish the slackathon story tomorrow.



Comments
  1. By Anonymous Coward (anon) on

    Nice writeup.

    Comments
    1. By Anonymous Coward (anon) on

      > Nice writeup.

      Slackathon2009 pictures
      http://picasaweb.google.com/vladib/Slackathon2009

      Comments
      1. By Janne Johansson (jj) jj@inet6.se on .

        > > Nice writeup.
        >
        > Slackathon2009 pictures
        > http://picasaweb.google.com/vladib/Slackathon2009

        That link sort of relates to the more recent article now.

  2. By Simon (Simon_) on

    That's really cool to see, what comes out, if you put those devs in one room ;-)

    Keep on donating, so more magic like this can happen!

  3. By Tim (tim) tim@nop.cx on

    Very nice summary.
    Thanks for all the effort on getting that application to the foundation.

  4. By Joachim Schipper (Joachim) j.schipper@math.uu.nl on http://www.joachimschipper.nl

    This is truly awesome. It would be really great to see all that stuff committed: better I/O performance is very, very welcome. (And better NFS code is also very, very welcome!)

  5. By Denis (Denis) openbsd@ledeuns.net on

    It sounds very promising :) I'm looking forward to seeing more soon.

  6. By Frank Denis (jedisct1) f@orbus.fr on http://00f.net

    Excellent writeup.

    Yeah, finally mmap() and read() will play nice, that's great.

  7. By Dan Naumov (jago) dan.naumov@gmail.com on

    This is confusing, from the writeup of a "file system mini-hackathon", I am seeing very little changes actually relevant to filesystems. Any plans for journaling? ZFS? NTFS read/write?

    Comments
    1. By thib (thib) on http://www.openbsd.org

      > This is confusing, from the writeup of a "file system mini-hackathon", I am seeing very little changes actually relevant to filesystems. Any plans for journaling? ZFS? NTFS read/write?

      ZFS will never happen, for very obvious reasons (CDDL).

      If someone wants to work on NTFS, sure. I don't. I don't use NTFS.

      What about journaling... ?

      Then again, a lot of the work at f2k9 was midlayer/VFS which
      is very relevant to filesystems, and I think it's way more
      important to get the midlayer/vfs in better shape before starting
      to add more cowbell to the "filesystems"...

      Anyways, 'k2k9' would have been a better name :-)

      Comments
      1. By Dan Naumov (jago) on

        > ZFS will never happen, for very obvious reasons (CDDL).

        Why is this not a problem for FreeBSD and NetBSD?

        Comments
        1. By Bryan Brake (brakeb) on

          > > ZFS will never happen, for very obvious reasons (CDDL).
          >
          > Why is this not a problem for FreeBSD and NetBSD?

          For the same reason that they allow Nvidia BLOBS in... or they signed an NDA to allow CDDL crap in their OS...

        2. By thib (thib) on http://www.openbsd.org

          > > ZFS will never happen, for very obvious reasons (CDDL).
          >
          > Why is this not a problem for FreeBSD and NetBSD?
          Because they don't care about "freedom" the way we do.

        3. By Magic carpet (bodie) on http://www.opensolaris.org

          > > ZFS will never happen, for very obvious reasons (CDDL).
          >
          > Why is this not a problem for FreeBSD and NetBSD?

          You must ask them. But whenever I use FreeBSD there is always some bug. NetBSD is not so bad in this, but I don't have similar problems with OpenBSD; it's more stable and bug-free than any other OS, even when I'm running -current. The funniest thing for me is that OpenBSD -current is more stable and bug-free than OSes like Linux, Windows, BSD or Solaris in their stable versions. I use all of them because of my work or interest, but OpenBSD is and will be number one.

    2. By phessler (phessler) on http://theapt.org

      > This is confusing, from the writeup of a "file system mini-hackathon", I am seeing very little changes actually relevant to filesystems.

      You'd be surprised at how much stuff is actually related to filesystems. 'buffer cache', 'vnodes', 'scsi', 'uvm', 'nfs' are all deeply involved in FS. Lots of nasty little tendrils and cross-pollution.

    3. By Magic carpet (bodie) on http://www.opensolaris.org

      > This is confusing, from the writeup of a "file system mini-hackathon", I am seeing very little changes actually relevant to filesystems. Any plans for journaling? ZFS? NTFS read/write?

      ZFS? The best system for ZFS is Solaris/OpenSolaris. They have some very good features which aren't available on other OSes, but the licence is not good. Some OSes use ZFS, but it's not 100% useful. If you want ZFS then use it on the internal network. On the perimeter you can use OpenBSD for protection.

      Or you can use a free alternative to ZFS -> Hammer FS from DragonflyBSD.

      OpenBSD uses a different approach than journaling: http://www.openbsd.org/faq/faq14.html#SoftUpdates . But as far as I know from posts and mails, there may be some plans for the future about it.


      And for NTFS there is support: http://www.openbsd.org/cgi-bin/man.cgi?query=mount_ntfs&sektion=8&apropos=0&manpath=OpenBSD+Current&arch=i386 . Of course it's limited, but it's at least something, and I think it's enough for the situations you find in real use. If you really need to fix something on NTFS on a server/laptop/desktop, why not use a Linux live CD or a Windows recovery CD instead of OpenBSD?

    4. By Amarendra Godbole (amunix) on

      > This is confusing, from the writeup of a "file system mini-hackathon", I am seeing very little changes actually relevant to filesystems. Any plans for journaling? ZFS? NTFS read/write?

      For a start, why don't you get a filesystem book and read some stuff, before you ask a question?

      -amarendra

  8. By Anonymous Coward (mehh) on

    Thanks for the article and taking care of all that work to make that hackathon happen!!

    Taking the kernel compile time from 7 down to 6 minutes after the hackathon is kind of impressive!

    Cool stuff!

  9. By Bryan Brake (brakeb) brakeb@gmail.com on

    This is also a good example of getting something you want in your favorite OS. True, it did cost a little bit, but the donation was worth it. And now we can mount and extract data from HD-DVD and Blu-ray discs natively...
