OpenBSD Journal

Philip Guenther: t2k13: guenther's tale

Contributed by pitrh on from the time and threads unwinding dept.

Here is, as promised, the next installment in our series of t2k13 hackathon reports. This one comes from Philip Guenther (guenther@), who writes:

t2k13 had three phases for me:

1) time_t

As many of you know, UNIX represents timestamps as a count of seconds since 1970-01-01 00:00:00 UTC. That's what is stored in the time_t type which is used in many kernel and library APIs, either directly or as part of many structures such as timeval, timespec, and stat.

Currently, OpenBSD defines time_t to be a signed 32bit type. Well, the maximum positive signed 32bit number is only 2147483647, which as a time_t represents 2038-01-19 03:14:07 UTC. That's less than 25 years away, so it's certainly time to get cracking on fixing it.
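The arithmetic behind that date can be checked with a short C sketch. It assumes a POSIX host whose own time_t is already 64 bits, so gmtime_r() can format values past 2038; the helper names wrap_to_int32() and format_utc() are made up for illustration. It also shows where a signed 32bit counter lands once it wraps around.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Reduce a 64bit count of seconds modulo 2^32 into the signed 32bit range,
 * i.e. the value a signed 32bit time_t would hold after wrapping. Done in
 * 64bit arithmetic so no implementation-defined narrowing is involved. */
int64_t wrap_to_int32(int64_t t)
{
    const int64_t span = INT64_C(4294967296);   /* 2^32 */
    int64_t m = t % span;                       /* now in (-2^32, 2^32) */

    if (m > INT32_MAX)
        m -= span;
    else if (m < INT32_MIN)
        m += span;
    return m;
}

/* Format secs as "YYYY-MM-DD HH:MM:SS" in UTC. Returns 0 on success, or -1
 * if the host's time_t cannot hold secs or the conversion fails. */
int format_utc(int64_t secs, char *buf, size_t len)
{
    time_t t = (time_t)secs;
    struct tm tm;

    memset(&tm, 0, sizeof tm);
    if ((int64_t)t != secs)         /* host time_t is too narrow */
        return -1;
    if (gmtime_r(&t, &tm) == NULL)
        return -1;
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", &tm) == 0 ? -1 : 0;
}
```

On such a host, formatting INT32_MAX gives 2038-01-19 03:14:07, while the wrapped value of INT32_MAX + 1 (that is, INT32_MIN) formats as 1901-12-13 20:45:52: one second after the limit, a 32bit clock jumps back more than a century.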


20 years ago, the UNIX world was facing a similar problem: the off_t type for holding file sizes and offsets was only 32 bits, but files were approaching 2GB, the limit for that type. What should they do? At Sun and other vendors, backwards compatibility was deemed paramount, and in what was called the Large File Summit they developed a whole new set of 64bit types and functions for working on large files, with a new error code for when programs using the 32bit calls encounter a large file or filesystem. It was decided that it was okay for some programs to simply fail if they hit a large file or filesystem. That remains true to this day.

This may seem trivial, particularly since it just takes some preprocessor options to switch to the 64bit ABI. And yet, many programs and libraries continue to use the old ABI. Indeed, glibc itself will *still* use the 32bit ABI internally and some calls like pathconf() that should always be safe will return errors on large filesystems. Fail!
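On C libraries that followed the Large File Summit, such as glibc, that "preprocessor option" is the _FILE_OFFSET_BITS macro. Here is a minimal sketch, assuming such a library; the file_size() helper is hypothetical. On the BSDs, where off_t has been 64 bits since 4.4BSD, the macro is simply unnecessary.

```c
/* Request the 64bit file-offset ABI. On glibc this must be defined before
 * the first system header is included (or passed as -D_FILE_OFFSET_BITS=64
 * on the compiler command line). */
#define _FILE_OFFSET_BITS 64
#include <stdint.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Compile-time guard: refuse to build this file against a 32bit off_t. */
_Static_assert(sizeof(off_t) == 8, "off_t is not 64 bits; large files would fail");

/* Return the size of path in bytes, or -1 with errno set by stat(). With a
 * 32bit off_t, stat() on a file too large for st_size fails with the
 * EOVERFLOW error instead. */
int64_t file_size(const char *path)
{
    struct stat st;

    if (stat(path, &st) == -1)
        return -1;
    return (int64_t)st.st_size;
}
```

The _Static_assert turns the wrong-ABI trap into a build failure rather than a runtime error, but only for the translation unit it is compiled into: a library built without the macro can still fail, which is exactly the glibc situation described above.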

In contrast, at Berkeley the CSRG gritted their teeth, threw away the old ABI and converted BSD to use a 64bit type for off_t; that work was released 19 years ago as 4.4 BSD and was soon merged into the BSD projects. Ever since then, no program on FreeBSD, NetBSD, or OpenBSD has had to worry about using the correct ABI for large files. They have to use types correctly, certainly, but the trap of possibly using the wrong ABI was avoided.

So, it's now 20 years later and we face the same situation with time_t; are we as brave as the CSRG was then to bite the bullet, break the ABI, and eliminate ABIs that will *fail* in 2038? Yes!

Last August, I started developing a patch and update process to convert OpenBSD to use 64bit types for time_t and ino_t. This involves rolling new versions of 22 system calls, building a compat kernel that supports just enough of the old ABI to build the system, and then rebuilding the system on that kernel.
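The report doesn't show the compat layer itself. As a rough illustration of the general idea behind keeping an old syscall version alongside a new one, here is a hedged sketch of a conversion shim between an old layout with 32bit seconds and a new one with 64bit seconds. The structure and function names are hypothetical, not OpenBSD's; the sketch only shows why the new-to-old direction must be able to fail while old-to-new cannot.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical layouts, for illustration only: the old ABI carries 32bit
 * seconds, the new ABI 64bit seconds. */
struct old_timespec {
    int32_t tv_sec;
    long    tv_nsec;
};

struct new_timespec {
    int64_t tv_sec;
    long    tv_nsec;
};

/* Hypothetical compat shim: the kernel works internally in the new format
 * and converts on the way out to a binary built for the old ABI. A value
 * that does not fit is reported rather than silently truncated. */
int compat_timespec_out(const struct new_timespec *src, struct old_timespec *dst)
{
    if (src->tv_sec > INT32_MAX || src->tv_sec < INT32_MIN)
        return EOVERFLOW;
    dst->tv_sec = (int32_t)src->tv_sec;
    dst->tv_nsec = src->tv_nsec;
    return 0;
}

/* The opposite direction cannot fail: every 32bit value fits in 64 bits. */
void compat_timespec_in(const struct old_timespec *src, struct new_timespec *dst)
{
    dst->tv_sec = src->tv_sec;
    dst->tv_nsec = src->tv_nsec;
}
```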

Let me be perfectly clear: _when_ this change is made, *all* binaries will have to be replaced. The major versions on system libraries will be bumped. This is like the a.out to ELF conversion: upgrading by compilation WILL NOT BE SUPPORTED.

In the meantime, there's still lots of work to get done. Several of us (primarily Theo and Ted) have been committing preparatory fixes for months, fixing printf() format strings and casts throughout the system so that code works correctly whether the types are 32bit or 64bit. As of the start of the hackathon there were two things I wanted to deal with for this:
1) split ino_t type into per-filesystem and VFS versions, and
2) <dramatic chord> fix the NFS server code for 64bit ino_t
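The printf() fixes mentioned above are mostly of one kind: a format string that silently assumes the width of time_t or ino_t. A minimal sketch of one common width-independent idiom, casting to the widest standard integer type before formatting; the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>
#include <sys/types.h>
#include <time.h>

/* "%d" or "%ld" is right for only one particular size of time_t. Casting
 * to long long and printing with "%lld" is correct for 32bit and 64bit
 * time_t alike. Returns the length written, or -1 on error or truncation. */
int format_time_t(time_t t, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%lld", (long long)t);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Same idea for the unsigned ino_t: widen to unsigned long long. */
int format_ino_t(ino_t ino, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%llu", (unsigned long long)ino);

    return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```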

So I spent the first few days of the hackathon reading the NecroFSicon, particularly nfsrv_readdir(), and decided that the best way to resolve the ino_t and dirent size issues between it and the exported filesystems would be to include the directory offset in each dirent entry and, as a side effect, convert the syscall interface from getdirentries() to the SysV-style getdents(). After slogging through all the details of this in both the kernel and userspace, I managed to demonstrate NFS service between a system patched to use 64bit time_t and ino_t types and a system running -current, exporting FFS, CD9660, and NTFS filesystems. Best of all, the changes involve deleting much more code than is added.
Simplification, yay!
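To make the per-entry offset idea more concrete, here is a user-space sketch of walking a getdents()-style buffer in which every record carries the offset to resume from after it, whereas getdirentries() reports a position only for the buffer as a whole. The record layout (struct xdirent_hdr) and the helpers are hypothetical, only loosely modeled on a dirent with a d_off field; they are not the kernel's actual structures. The point: a consumer like nfsrv_readdir() that can pass along only part of a buffer can take the resume point straight from the last entry it used.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed header for one directory record. The NUL-terminated
 * name follows the header; d_reclen covers both, padded to 8 bytes. */
struct xdirent_hdr {
    uint64_t d_fileno;   /* 64bit inode number */
    int64_t  d_off;      /* offset to resume reading after this entry */
    uint16_t d_reclen;   /* total record length in bytes */
    uint8_t  d_namlen;   /* name length, excluding the NUL */
};

/* Append one record at buf. Returns the bytes used, or 0 if it won't fit. */
size_t xdirent_append(unsigned char *buf, size_t space, uint64_t fileno,
                      int64_t next_off, const char *name)
{
    struct xdirent_hdr h;
    size_t namlen = strlen(name);
    size_t reclen = (sizeof h + namlen + 1 + 7) & ~(size_t)7;

    if (namlen > UINT8_MAX || reclen > UINT16_MAX || reclen > space)
        return 0;
    memset(&h, 0, sizeof h);
    h.d_fileno = fileno;
    h.d_off = next_off;
    h.d_reclen = (uint16_t)reclen;
    h.d_namlen = (uint8_t)namlen;
    memset(buf, 0, reclen);                 /* zero the padding too */
    memcpy(buf, &h, sizeof h);
    memcpy(buf + sizeof h, name, namlen + 1);
    return reclen;
}

/* Walk the records in buf[0..len). Stores the number of complete records
 * in *count and returns the d_off of the last one (the cookie to continue
 * the listing from), or -1 if no complete record is present. */
int64_t xdirent_last_cookie(const unsigned char *buf, size_t len, size_t *count)
{
    struct xdirent_hdr h;
    int64_t cookie = -1;
    size_t n = 0, pos = 0;

    while (len - pos >= sizeof h) {
        memcpy(&h, buf + pos, sizeof h);
        if (h.d_reclen < sizeof h || h.d_reclen > len - pos)
            break;                          /* truncated or corrupt record */
        cookie = h.d_off;
        n++;
        pos += h.d_reclen;
    }
    *count = n;
    return cookie;
}
```

Truncating the buffer by even one byte drops the last record, and the walker then reports the previous record's cookie, which is still a valid place to resume.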

With that in good shape, I generated a good diff and set it aside for when we have time to deal with the remaining fallout of the change. There are still some programs in base that don't handle huge time_t values well and many ports have issues, so we can't make this jump at this point in the development cycle.


2) stuff dealt with at the hackathon

Helped tedu understand the single_thread_*() calls in the kernel so that he could track down kurt's evil thread creation/exit race problem.

Cleaned up some source inconsistencies in ld.so encountered during the time_t hacking described above.

Converted several kernel APIs to use struct timespec instead of struct timeval, to avoid loss of precision in various cases.

Tracked down and fixed a race, seen by ports builders, that would let a thread receive a signal while it was still being created; that would crash the kernel with a NULL pointer dereference.

Dealt with a bottle of Older Viscosity.

Various minor POSIX compliance fixes.
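The precision issue behind the timeval-to-timespec conversions listed above is easy to demonstrate: struct timeval carries microseconds and struct timespec nanoseconds, so any trip through a timeval discards the last three digits. A minimal sketch using the standard POSIX structures; the helper names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/time.h>
#include <time.h>

/* timespec -> timeval: the sub-microsecond part is silently discarded. */
void ts_to_tv(const struct timespec *ts, struct timeval *tv)
{
    tv->tv_sec = ts->tv_sec;
    tv->tv_usec = ts->tv_nsec / 1000;
}

/* timeval -> timespec: exact, but the discarded digits cannot come back. */
void tv_to_ts(const struct timeval *tv, struct timespec *ts)
{
    ts->tv_sec = tv->tv_sec;
    ts->tv_nsec = (long)tv->tv_usec * 1000;
}
```

Round-tripping 5.123456789 seconds through a timeval yields 5.123456000: an API that takes a timespec end to end never has to make that trade.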


3) stuff started at the hackathon

ajacoutot@ had encountered some ports that wanted to use per-process or per-thread CPU-time clocks, so I looked at what implementing CLOCK_PROCESS_CPUTIME_ID, CLOCK_THREAD_CPUTIME_ID, clock_getcpuclockid(), and pthread_getcpuclockid() would take. It turns out that we already track everything that's needed; it's just a matter of teaching the clock_get{time,res}() syscalls to return the correct info. That was made slightly more complicated by part of tedu@'s spinlock work, but after trying to strangle him twice I was convinced that there was a better way to solve the problem that didn't involve killing Ted. That diff is pretty much baked now, and I expect I'll commit it soon.
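For readers unfamiliar with these interfaces, this is roughly how a port would use them, per POSIX. A sketch assuming a system that implements the CPU-time clocks (with glibc older than 2.34 you may need to link with -pthread); the helper names are hypothetical. CLOCK_PROCESS_CPUTIME_ID and CLOCK_THREAD_CPUTIME_ID name the caller's own process and thread, while the two getcpuclockid functions return a clock id for a given process or thread.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Read clock id and return its value in nanoseconds, or -1 on error. */
long long cpu_ns(clockid_t id)
{
    struct timespec ts;

    if (clock_gettime(id, &ts) != 0)
        return -1;
    return (long long)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* CPU time consumed so far by the whole process, looked up by pid. */
long long process_cpu_ns(void)
{
    clockid_t cid;

    /* clock_getcpuclockid() returns an errno value on failure, not -1 */
    if (clock_getcpuclockid(getpid(), &cid) != 0)
        return -1;
    return cpu_ns(cid);
}

/* CPU time consumed so far by the calling thread only. */
long long thread_cpu_ns(void)
{
    clockid_t cid;

    if (pthread_getcpuclockid(pthread_self(), &cid) != 0)
        return -1;
    return cpu_ns(cid);
}
```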

Overall, I found this to be one of my most productive hackathons. krw@'s machinations were quite successful and his food recommendations were excellent. Toronto, same time next year?

Philip
That is quite a lot of ground covered. Thanks to guenther@ for the work and the writeup!
Psst! We're not done yet! There are more t2k13 reports in the pipeline.



Credits

Copyright © - Daniel Hartmeier. All rights reserved. Articles and comments are copyright their respective authors, submission implies license to publish on this web site. Contents of the archive prior to as well as images and HTML templates were copied from the fabulous original deadly.org with Jose's and Jim's kind permission. This journal runs as CGI with httpd(8) on OpenBSD, the source code is BSD licensed. undeadly \Un*dead"ly\, a. Not subject to death; immortal. [Obs.]