Contributed by pitrh on from the guard my RET, you dept.
This year I went to BSDCan in Ottawa. I spent much of it in the 'hallway track', and had extended conversations with various people about our existing security mitigations and our hopes for new ones. I spoke a lot with Todd Mortimer (mortimer@). Apparently I told him that I felt return-address protection was impossible, so a few weeks later he sent a clang diff to address that issue...
The first diff is for amd64 and i386 only -- in theory RISC architectures can follow this approach soon.
The mechanism is like a userland 'stackghost' in the function prologue and epilogue. The prologue XORs the return address at the top of the stack with the stack pointer value itself. This perturbs the saved address with bits that ASLR has randomized, since the stack's location differs from run to run. The function epilogue undoes the transform immediately before the RET instruction. ROP attack methods are impacted because existing gadgets are transformed to consist of "<gadget artifacts> <mangle ret address> RET". That pivots the return sequence off the ROP chain in a highly unpredictable and inconvenient fashion.
The compiler diff handles this for all the C code, but the assembly functions have to be done by hand. I did this work first for amd64, and more recently for i386. I've fixed most of the functions and only a handful of complex ones remain.
For those who know about polymorphism and pop/jmp or JOP: we believe that once standard RET is solved, those concerns will be easier to address separately in the future. In any case, a substantial reduction in gadgets is powerful.
For those worried about introducing worse polymorphism with these "xor; ret" epilogues themselves, the nested gadgets for the 64-bit and 32-bit variants are +1 "xor %esp,(%rsp); ret", +2 "and $0x24,%al; ret" and +3 "and $0xc3,%al; int3". Not bad.
Over the last two weeks, we have received help and advice to ensure debuggers (gdb, egdb, ddb, lldb) can still handle these transformed call frames. In the kernel, we discovered we must use a smaller XOR: a full-width XOR of a kernel return address with a kernel stack pointer cancels their shared high bits and generates userland addresses, and we cannot rely on SMEP because it is a relatively new feature of the architecture. There were also issues with pthreads and dlsym, which led to a series of improvements around __builtin_return_address and DWARF CFI.
Applying this diff doesn't require anything special: a system can simply be built twice (the first build installs the modified clang, the second compiles everything with it). Or take a shortcut by building and installing gnu/usr.bin/clang first, then doing a full build.
We are at the point where userland and base are fully working without regressions, and the remaining impacts are in a few larger ports which directly access the return address (for a variety of reasons).
So work needs to continue with handling the RET-addr swizzle in those ports, and then we can move forward.
[followed by the diff]
You can find the full message with the diff here, or, if you're already on tech@, in a mailbox near you.