Contributed by tbert from the hanging-by-one dept.
Eric Faurot(eric@) takes on an ancient resolver:
I spent my week at r2k12 waiting for ajacoutot@ to leave before importing the async resolver I have been working on :) More seriously, I was a bit surprised to receive an invitation for this rthreads hackathon, but I thought it would be a good occasion to take a week off real-life and make things move forward on something totally unrelated. Well, not totally.
Maybe a bit of context first. Using the blocking resolver(3) functions in async programs has always been a problem, and I had written from scratch a simple set of functions to make async DNS queries easily. It was first imported as part of OpenSMTPD (smtpd(8)), which is a big DNS user, as a replacement for a hack where queries were delegated to forked processes. It helped solve performance and resource issues.
That code was not intended to be OpenSMTPD-specific and we wanted to move it to a new home, where other programs could start to make use of it too; the question was how to do it. With input from Miod(miod@) and Theo(deraadt@), it became obvious that the only sensible way to go was to put it right in libc, as a replacement for the existing resolver, otherwise it would be a dead-end. So I looked at what it would take to go down that road, changed the API once again, and started to rewrite most of the code.
So, back to r2k12. I can't really think of specific points worth detailing, as for me it's been more like the continuation of a long process. But my goal for this week was to put things in order and prepare for integration in base. So I warmed up by wasting more time trying to make some sense out of the getnetby<whatever>() set of functions, which I now consider hopelessly broken. Then I continued to further polish things, reducing the number of symbols, cleaning up the code some more, in preparation for the initial import, which is now done.
It's not the end of the road yet, but it's a very important step, as it will allow smoother integration and bug fixing. Working on big things outside the tree is not easy because you are all alone, you can't expect people to look at or test things. Now that everything is in the tree, it will simplify the development process and allow interested people to track it or test diffs.
Oh, there was supposed to be some relation to threads... Well, the current resolver doesn't play nice with threads at all: the getaddrinfo() implementation for example has a big thread lock around it, so it means that when firefox uses a threadpool to resolve hostnames, it gains nothing. The intended replacement should perform much better in this regard.
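As an illustration of the usage pattern described above, here is a minimal sketch of a small pool of threads each calling getaddrinfo(). The names resolve_in_pool, resolve_one and struct lookup are made up for this example, and AI_NUMERICHOST is set only so that the sketch runs without any DNS traffic. If the resolver serialises every call behind one lock, the threads in a pool like this simply queue up behind each other.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

#define NRESOLVERS 4

/* Hypothetical job record: one lookup per worker thread. */
struct lookup {
    const char *host;
    int ok;             /* set to 1 if getaddrinfo() succeeded */
};

static void *resolve_one(void *arg)
{
    struct lookup *l = arg;
    struct addrinfo hints, *res = NULL;

    assert(l != NULL && l->host != NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_flags = AI_NUMERICHOST;   /* numeric addresses only, no DNS query */

    l->ok = (getaddrinfo(l->host, NULL, &hints, &res) == 0);
    if (res != NULL)
        freeaddrinfo(res);
    return NULL;
}

/* Starts NRESOLVERS threads that all resolve `host` and returns how many
 * of the lookups succeeded. */
int resolve_in_pool(const char *host)
{
    pthread_t tids[NRESOLVERS];
    struct lookup jobs[NRESOLVERS];
    int started = 0, ok = 0;

    for (int i = 0; i < NRESOLVERS; i++) {
        jobs[i].host = host;
        jobs[i].ok = 0;
        if (pthread_create(&tids[i], NULL, resolve_one, &jobs[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        ok += jobs[i].ok;
    }
    return ok;
}
```

Calling resolve_in_pool("127.0.0.1") should report one success per thread; how much the calls actually overlap depends entirely on the resolver underneath.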
Paul Irofti(pirofti@) documents his code slavery:
I wanted to help out with rthreads, making sure they are enabled for the next release, so I decided to join the r2k12 hackathon and become guenther@'s personal slave for a week.
Each night I'd ask him for a new task which I tried to finish within a day so that something else could be moved forward. Of course, these tasks were pretty small and simple but annoying enough that the big chief wouldn't want to touch them.
I started by implementing the pthread_barrier_wait(3) family of functions. These are intended to stop threads at a barrier until a given number of threads reach that point. The implementation is basically a simple wrapper around a mutex and a condition variable.
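To make the "mutex plus condition variable" idea concrete, here is a minimal sketch of such a barrier. It is not the code that went into OpenBSD; the sketch_barrier names and the generation counter are assumptions made for this example, and the demo function exists only to exercise it.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical type; not the structure used by OpenBSD's libpthread. */
struct sketch_barrier {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    unsigned int    threshold;  /* threads needed to open the barrier */
    unsigned int    waiting;    /* threads blocked in the current round */
    unsigned int    generation; /* incremented each time the barrier opens */
};

int sketch_barrier_init(struct sketch_barrier *b, unsigned int count)
{
    if (count == 0)
        return -1;
    b->threshold = count;
    b->waiting = 0;
    b->generation = 0;
    if (pthread_mutex_init(&b->lock, NULL) != 0)
        return -1;
    if (pthread_cond_init(&b->cond, NULL) != 0) {
        pthread_mutex_destroy(&b->lock);
        return -1;
    }
    return 0;
}

/* Blocks until `threshold` threads have called it. Returns 1 in the thread
 * that arrived last and 0 in all the others. */
int sketch_barrier_wait(struct sketch_barrier *b)
{
    unsigned int gen;
    int last = 0;

    pthread_mutex_lock(&b->lock);
    assert(b->waiting < b->threshold);
    gen = b->generation;
    if (++b->waiting == b->threshold) {
        b->waiting = 0;             /* reset for the next round */
        b->generation++;
        pthread_cond_broadcast(&b->cond);
        last = 1;
    } else {
        /* The loop guards against spurious wakeups. */
        while (gen == b->generation)
            pthread_cond_wait(&b->cond, &b->lock);
    }
    pthread_mutex_unlock(&b->lock);
    return last;
}

#define NTHREADS 4
static struct sketch_barrier demo_barrier;
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;
static int arrived;  /* threads that have reached the barrier */
static int saw_all;  /* threads that saw every arrival once past it */

static void *demo_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&demo_lock);
    arrived++;
    pthread_mutex_unlock(&demo_lock);

    sketch_barrier_wait(&demo_barrier);

    pthread_mutex_lock(&demo_lock);
    if (arrived == NTHREADS)
        saw_all++;
    pthread_mutex_unlock(&demo_lock);
    return NULL;
}

/* Returns how many threads observed all arrivals after the barrier. */
int sketch_barrier_demo(void)
{
    pthread_t tids[NTHREADS];

    arrived = saw_all = 0;
    if (sketch_barrier_init(&demo_barrier, NTHREADS) != 0)
        return -1;
    for (int i = 0; i < NTHREADS; i++)
        if (pthread_create(&tids[i], NULL, demo_worker, NULL) != 0)
            return -1;  /* sketch only: earlier workers are left blocked */
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tids[i], NULL);
    pthread_cond_destroy(&demo_barrier.cond);
    pthread_mutex_destroy(&demo_barrier.lock);
    return saw_all;
}
```

The generation counter is one way to let the same barrier be reused for several rounds: a waiter only leaves once the round it entered in has been closed.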
My next task was adding per-thread usage accounting, which is a neat feature for debugging. This turned out to be more complicated than it seemed due to backward- and forward-compatibility issues. The main idea was to get an extra parameter to FILL_KPROC that indicates if the caller is a process or a thread and depending on that the proper usage times were filled in. After the kernel bits were done they needed to be mirrored in libkvm so that when the -H flag was used for ps(1) or top(1) you would get the threads stats as well. So now in -current, when calling kvm_getprocs(3), you can include KERN_PROC_SHOW_THREADS in the flags and get the accounting numbers. By default this flag is off so that old binaries and code still work as expected.
Another POSIX API that I implemented was pthread_spin_lock(3). The idea is that a calling thread tries to get a lock and, if it's not up for grabs, spins until it becomes available. The diff hasn't been committed yet, as I'm still waiting for okays, but one can find it on tech@. It seems vlc is happy with it, but feel free to poke it some more and report back.
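The try-then-spin behaviour can be sketched in a few lines with C11 atomics. This is only an illustration of the mechanism, not the diff sent to tech@; the sketch_spin names are invented for this example, and a real implementation may well use machine-specific primitives instead.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical spin lock built on a single atomic flag. */
struct sketch_spinlock {
    atomic_flag held;
};

void sketch_spin_init(struct sketch_spinlock *s)
{
    assert(s != NULL);
    atomic_flag_clear(&s->held);
}

/* Spins (busy-waits) until the flag could be set by this thread. */
void sketch_spin_lock(struct sketch_spinlock *s)
{
    while (atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire))
        ;   /* lock is taken: try again */
}

/* Single attempt: returns 0 on success, EBUSY if the lock is taken. */
int sketch_spin_trylock(struct sketch_spinlock *s)
{
    if (atomic_flag_test_and_set_explicit(&s->held, memory_order_acquire))
        return EBUSY;
    return 0;
}

void sketch_spin_unlock(struct sketch_spinlock *s)
{
    atomic_flag_clear_explicit(&s->held, memory_order_release);
}

#define NSPINNERS 4
#define NROUNDS   20000
static struct sketch_spinlock counter_lock;
static long counter;    /* protected by counter_lock */

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NROUNDS; i++) {
        sketch_spin_lock(&counter_lock);
        counter++;
        sketch_spin_unlock(&counter_lock);
    }
    return NULL;
}

/* Runs NSPINNERS threads incrementing a shared counter under the lock and
 * returns the final count. */
long sketch_spin_demo(void)
{
    pthread_t tids[NSPINNERS];
    int started = 0;

    sketch_spin_init(&counter_lock);
    counter = 0;
    for (int i = 0; i < NSPINNERS; i++) {
        if (pthread_create(&tids[i], NULL, spin_worker, NULL) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return counter;
}
```

Because a waiting thread burns CPU instead of sleeping, this kind of lock only makes sense for very short critical sections like the counter update in the demo.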
Next task was moving the systrace pointer to the process structure, as it's currently handled per-thread. The diff is not yet finished as the task turned out to be a bit more complicated than what I thought and time was short.
At one point during the hackathon I also removed pcc(1) from base. It was outdated, unmaintained and thus pretty useless. I do hope that one day it can be brought back from the attic and used for our builds, but until then a better place for it is probably ports.
As always, the hackathon hasn't been about how much code we get in but about passing on ideas and planning out future projects. I did more listening than passing on as I was just beginning to touch this subsystem. Nonetheless, it has been a great experience as now I can look at a new part of the tree and hack away with a lot more confidence than before.
Antoine Jacoutot(ajacoutot@) catches some mutex bugs:
I was pleased and surprised to learn there was going to be such an event within walking distance of my place. When I was made aware of this hackathon, most "big" things were already taken care of, which surprised me even more... I guess one should never trust a French developer to help organise anything... I find this both outrageous and right ;-)
Anyway, my goal for this week was to intensively test desktop-related things with rthreads and whine^complain to guenther@ and kurt@ when something was not working properly. As usual I got into updating/fixing several parts of the GNOME Desktop but that is boring so I won't expand on it.
The most challenging issue of the week was hunting down several crashes I was seeing with some gtk+2 applications. With _huge_ help from kurt@ (which made me realise how weak my knowledge of threads is) we were able to figure out the reason for all these crashes: glib recently changed the way it deals with mutexes and no longer ignores unlocking errors (e.g. when trying to unlock a nonexistent mutex or one that is not our own), relying instead on the operating system's default mutex type. Unlike Linux, which uses a default mutex type of PTHREAD_MUTEX_NORMAL (which lets these kinds of errors go unnoticed), OpenBSD uses a default of PTHREAD_MUTEX_STRICT_NP, which makes the application abort(3) when such an event happens. So this is an application issue that Linux people never see because of their default mutex type. So again, by choosing correctness over raw speed, OpenBSD was able to catch bugs that no one else has seen yet although they are very real. Work is still ongoing to make sure applications are fixed properly but nothing has been committed yet.
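The class of bug is easy to reproduce in isolation. The sketch below uses the portable POSIX type PTHREAD_MUTEX_ERRORCHECK as a stand-in, since PTHREAD_MUTEX_STRICT_NP is OpenBSD-specific; an error-checking mutex reports the bad unlock through its return value rather than aborting. The function name unlock_result is made up for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Creates an error-checking mutex, optionally locks it, then unlocks it.
 * Returns whatever pthread_mutex_unlock() reported, or -1 on setup failure. */
int unlock_result(int lock_first)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t m;
    int rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) != 0 ||
        pthread_mutex_init(&m, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);

    if (lock_first) {
        rc = pthread_mutex_lock(&m);
        assert(rc == 0);    /* a fresh mutex must be lockable */
    }

    /* With lock_first == 0 this unlocks a mutex the caller does not hold,
     * which is the kind of mistake described above. */
    rc = pthread_mutex_unlock(&m);
    pthread_mutex_destroy(&m);
    return rc;
}
```

Here unlock_result(1) returns 0, while unlock_result(0) returns EPERM. With a mutex type that performs no such check, the same mistake is undefined behaviour and can pass silently, which is how a bug like this can go unnoticed.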
There are still some threads-related problems to take care of on the "monster" desktops side but a _lot_ of things work much better than they used to.
I just wanted to finish by saying kudos to kurt@ and guenther@ for their fantastic responsiveness to reported thread issues; thanks guys!
Stay tuned for our fourth and final installment!