OpenBSD Journal

OpenSMTPD: more features, more cleanup, more more

Contributed by tbert on from the plus-de-clochette-a-vache dept.

Gilles Chehade(gilles@) has written about the updates to OpenSMTPD that he, Eric Faurot(eric@), and Charles Longeau(chl@) have recently committed to OpenBSD.

Expansion format

We recently added new logic for the aliases and ~/.forward expansion format, allowing more readable formats. We used to support one-character formats like %a and %u, which were both confusing and hard to extend with new formats while still making sense at first sight. We now support a clearer format: %{rcpt.domain}, %{user.directory}, etc... The various supported formats are documented in smtpd.conf(5), and each supports partial expansion using optional positive and negative begin and end offsets: with user "gilles", %{user.username[1:5]} will expand to "ille".

The code has been simplified, and an off-by-one bug that could prevent the string from being fully expanded has been fixed. While at it, we added %{sender} as a shortcut for %{sender.user}@%{sender.domain} and did the same for %{rcpt} and %{dest}.
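
For illustration, assuming the same expansion applies to delivery paths, a hypothetical rule sorting mail into per-domain, per-user maildirs could look like (the path is made up):

accept for domain poolp.org deliver to maildir "/var/vmail/%{dest.domain}/%{dest.user}"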

Reworked support for virtual domains

While simplifying some code I ran into the virtual domains code and I recalled that an OpenBSD hacker had requested a very simple feature: allow mapping one virtual domain to another so that they both share the same users.

I tried to think of a way that would not involve yet another table lookup at runtime, and I realized that by slightly changing the syntax I could simplify the virtual domains support by sharing more of the primary domains code path, while providing this feature in a very simple way and enabling hostname wildcards, which we lacked for virtual domains but had for primary domains... without writing code. Yes, by removing code we added features ;-)

Until now, to provide support for a virtual domain for example.org, you had to create an smtpd.conf with the following rule:

table <domains> "/etc/mail/vdomains.txt"
accept for virtual <domains> deliver to maildir
and you had to create a /etc/mail/vdomains.txt that contained:
example.org enabled # required
gilles@example.org gilles
@example.org gilles # catch all
Note the special entry which allowed OpenSMTPD to look up the domain without iterating over the keys. This worked fine and is what's actually in use @poolp.org for various domains; however, by rewording a bit, we can have:
table <domains> "/etc/mail/domains"
table <users> "/etc/mail/users"
accept for domain <domains> virtual <users> deliver to maildir
where /etc/mail/domains contains a list of domains, one per line, and /etc/mail/users contains a mapping like before, except that:
# example.org enabled -> no longer required, it can go away
gilles@example.org gilles
@example.org gilles # catch all
This doesn't look much better until you actually take a step back. First of all, the rule now uses "for domain" instead of "for virtual", and the parameter to "for domain" is either a static list or a table providing the K_DOMAIN service. This means that you can:
table domains { poolp.org, "*.poolp.org", opensmtpd.org }
accept for domain <domains> virtual [...]
or even:
table domains "/etc/mail/domains" # one per line
accept for domain <domains> virtual [...]
or ...
accept for domain { poolp.org, opensmtpd.org } virtual { root => gilles } [...]
And the same rule will take care of as many virtual domains as you want, and since it uses the same service for looking up primary and virtual domains, the wildcard which worked for primary domains automagically works for virtual domains.

It gets better. Since we now provide both domains and users as separate tables, we can use the same user mapping for multiple domains, essentially mapping them one to another:

table domains { poolp.org, opensmtpd.org }
table users { staff => gilles,eric,chl }
accept for domain <domains> virtual <users>
With this configuration, mailing staff@ will deliver to gilles, eric and chl no matter the domain.

Do not forget that tables can be inlined in a rule, or declared with static content or from a file and referenced using the <bracket> notation; all forms are equivalent, so anything will do as long as the table provides coherent data.

Simplify parse.y grammar

I spent hours working on a simplification of the parse.y grammar, which worked fine for correct configurations but could allow very twisted configurations to parse as valid.

While fixing an ambiguous case, I realized that the parsing had inherited some complex logic from a feature that was desired a long time ago but which we never used as the rules could become ambiguous. So I began shooting down various parts which were not worth implementing and ended up removing no longer required lists and structures, making the use of tables more coherent across the daemon as sometimes they were referred to by table pointer and sometimes by table id.

The end result is a much less bloated grammar, which is semantically sounder and much cleaner.

Sure, it breaks existing configuration files, but given that none should exceed about 10 lines, it should be straightforward to fix: the poolp.org smtpd.conf, which deliberately exhibits a complex setup (aliases, primary domains, backup MX, virtual domains, relaying, etc.), was converted in two minutes.

More virtual simplification

Building on the foundations of the cleaned-up parse.y, I brought in a new feature which was not possible before: using "for local" or "for any" as the destination of a virtual domain. Indeed, you could "accept for any" or "accept for local", but OpenSMTPD would then assume a primary domain and would perform a system user lookup for the user part. This was actually a side effect of a parse.y ambiguity where ANY, LOCAL, DOMAIN and VIRTUAL were at the same level, with VIRTUAL being considered a special case.

Now, a virtual domain is simply a regular domain which has a virtual mapping defined. The same logic has been applied to ANY and LOCAL, meaning that you can "accept for any" or "accept for any virtual [...]" and it will work as expected, instead of the syntax I demonstrated last week, accept for domain "*" virtual [...], which is still valid but works differently internally.

A user had opened a ticket to ask if we could turn OpenSMTPD into a sink where it would accept mail for any destination and deliver to a single account. With the "accept for any virtual [...]" improvement it became simpler as it only required adding a global catch-all that isn't domain aware.

Virtual now supports a global catch-all "@" which allows a static mapping to handle the catch-alls for multiple domains:

accept for any virtual { "@" => gilles } deliver to maildir
The above will have any user of any domain delivered to local account gilles.

MFA rework

The MFA process, in charge of filtering the different stages of an SMTP session, has been considerably simplified. It no longer knows about SMTP states, which was an API layer violation, and has had a lot of code removed while providing the same service. It shrank by over 100 lines and has become a very small piece of code whose only purpose is to serve as the entry point to the filters evaluation.

This was a prerequisite for the filter work, which is already complex enough that we didn't want to add unneeded complexity upfront.

MTA rework

A lot of work has been done in the MTA engine, which was almost completely rewritten. The MTA now knows how to share MX, PTR and credentials for a route. This means that a single DNS request to the lookup daemon can be performed to deal with multiple messages heading to multiple domains.

It also deals much better with MX problems: When too many sessions fail to establish a link with a specific MX, the MX is marked as broken and not tried any further by new sessions.

It is also capable of dispatching connections across various MXs of the same priority: when sending many messages, the MTA will spawn sessions against the different MXs with the lowest priority, within the default connection limit.

If all MXs at a given priority have errors, it moves to the next level to reach backup MXs. If no MX can be reached for a route, a temporary error is triggered.

I should mention that several kinds of errors that used to trigger a permanent error on messages will now simply mark the MX on which they occur as broken, giving the mails a chance to be routed through another MX.

The logs have been improved too, especially for SSL-related errors. All problems with MXs are now logged (as "smtp-out:") to help the administrator diagnose relaying issues.

SMTP engine rework

The SMTP engine has been reworked as a prerequisite for filters. It is now running on poolp.org and powers the OpenSMTPD mailing list.

The idea was to make the code generally simpler to follow and extend.

First, most of the SMTP-specific structures and defines have been isolated into smtp_session.c, which makes the smtpd.h file a bit less bloated. It was also the opportunity to finally get rid of the horrible submit_status structure.

The dispatching of user commands is more straightforward, and the imsg dispatching code now makes use of very specific message structures.

The rewrite was painful, mostly because the previous code was not easy to grasp, and because we slacked too much on the regression suite, which got improved in the process.

Furthermore, the interaction with the MFA got more complicated with its recent update. Basically, we needed to decouple the forwarding of the message data to the MFA from the receiving of the filtered data.

To sum up, the new code should be a much better ground to implement features like proper filtering and pipelining on the SMTP side.

imsgproc and filter work

OpenSMTPD has had filters for over a year now, but they are disabled as the API is not stable and there was more important stuff to deal with.

Filters in OpenSMTPD are standalone programs that are linked against a lib we provide, which daemonizes them and provides an event-based callback mechanism.

Filters can be very, very simple, and a working filter could be as short as:

#define	SMTPD_FILTER
#include "smtpd-api.h"

void
mail_cb(uint64_t id, struct filter_mail *p, void *arg)
{
	/* block idiots */
	if (! strcmp(p->domain, "0pointer.net")) {
	    filter_api_reject(id, 530, "You're not welcome, go away !");
	    return;
	}

	filter_api_accept(id);
}

int
main(int argc, char *argv[])
{
	/* init the lib and setup daemon and imsg framework */
	filter_api_init();

	/* register callbacks */
	filter_api_register_mail_callback(mail_cb, NULL);

	/* event loop */
	filter_api_loop();

	/* never reached */
	return 0;
}

I have spent time in the glue code to change a bit how it worked and to have it rely on a new API, imsgproc, which generalizes the imsg / fork / exec setup that we'll end up using for all filters, but also possibly for other pluggable backends. It works fine and I've successfully compiled about a dozen different filters working with different hooks and combinations of hooks.
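
To give an idea of the kind of boilerplate imsgproc factors out, here is a rough, hypothetical sketch of the usual socketpair / fork / exec / imsg dance; the function name is made up and this is not the actual imsgproc code:

/*
 * Hypothetical sketch: the general pattern an imsgproc-style API wraps.
 * Link with -lutil on OpenBSD for the imsg functions.
 */
#include <sys/types.h>
#include <sys/queue.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <stdint.h>
#include <unistd.h>
#include <imsg.h>

static int
spawn_child(const char *path, struct imsgbuf *ibuf)
{
	int	sp[2];

	/* shared channel between the daemon and the child process */
	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sp) == -1)
		return (-1);

	switch (fork()) {
	case -1:
		return (-1);
	case 0:
		/* child: expose its end on a well-known descriptor, then exec */
		close(sp[0]);
		if (dup2(sp[1], STDIN_FILENO) == -1)
			_exit(1);
		execl(path, path, (char *)NULL);
		_exit(1);
	default:
		/* parent: talk imsg over our end of the socketpair */
		close(sp[1]);
		imsg_init(ibuf, sp[0]);
		return (0);
	}
}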

The API is still not enabled, but it will be available in future releases.

Monkey Branch

I tackled a new issue: how do we ensure that our error code path is correct for errors that cannot be reproduced easily because they are so rare? A typical example of such an error is a getpwnam() failure because a descriptor was not available or because we received an EIO. In such cases, we want OpenSMTPD to correctly handle the error as transient and not reject the mail permanently, which would lead to lost mail.

I recalled the Chaos Monkey tool at Netflix, which deliberately produces errors at random to let them ensure their high availability really works, and I came up with a set of MONKEY_* macros to randomly produce chaos at strategic places. The monkeys will provoke latency in imsg handling and will cause some places to report a temporary failure at random.

In the ten minutes that followed my initial testing, they helped spot two very subtle bugs which we would have probably not run into in years. We now know for sure that these code paths are correct and we need to spread more monkeys ;-)
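
For illustration only, such a macro can be as simple as a guarded early return driven by arc4random; the name and rate below are made up, this is not the actual OpenSMTPD code:

#include <stdlib.h>	/* arc4random_uniform() */

/* hypothetical chaos macro: roughly 1 call in 20 reports a transient failure */
#define	MONKEY_RATE	20
#define	MONKEY_TEMPFAIL(retval)					\
	do {							\
		if (arc4random_uniform(MONKEY_RATE) == 0)	\
			return (retval);			\
	} while (0)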

This is done in a separate branch which I mirrored on github and which is kept in sync with master. To test, simply check out the monkey branch (lol no ? [no -ed.]) and set env CFLAGS=-DUSE_MONKEY before building. Then, when sending mail to yourself, you should start seeing:

$ echo test | mail -s 'test' gilles 
$ echo test | mail -s 'test' gilles 
send-mail: command failed: 421 Temporary failure
$ echo test | mail -s 'test' gilles 
send-mail: command failed: 421 Temporary failure
$ echo test | mail -s 'test' gilles 
$
Ideally, OpenSMTPD should NEVER return a 5xx error while running in monkey mode that it doesn't return when not running in monkey mode.

Improved DNS API

Eric has cleaned and improved the DNS API: dns_query_*() functions now have a more logical ordering of their arguments; struct dns has been killed and we now use two small imsg-specific structures; dns_query_mx_preference() has been introduced to retrieve the preference level of a specific MX for a domain; and finally all MX addresses are looked up in parallel instead of sequentially.

MTA improvements

Eric has also improved the MTA internal operations so that it uses better abstractions for relay domains, hosts, sources and routes. An MTA session now operates on a given route and reports errors on that route only.

The relay tries to open as many routes as possible, within various limits, and drains all mails, dispatching them. Oh, and it is ready to use the K_SOURCEADDR lookup service but doesn't do so yet; this will probably be part of next week's milestone.

Table API improvements and new SQLite backend

We replaced the map API with a new table API that provides much simpler and saner semantics. Tables are simpler to declare than maps from an smtpd.conf point of view; they have types, so that OpenSMTPD can detect tables used in an inappropriate context at smtpd.conf parse time; and they provide lookup services, which allows backends to support only a few kinds of lookups and lets smtpd spot that at smtpd.conf parse time too. The new table API simplifies a lot of things, the backends are simpler to write, and the end result is much more reliable.

The table API has been improved to keep backend handles open and avoid performing an open and close of the backend at each and every lookup. This does not have much of an impact for the db and static backends, but it was required before we start writing network backends (ldap for a start).

With the new table API I could get rid of the user_backend API and replace it with a table lookup service. It is now possible to write table backends to look up system users instead of relying on getpwnam... but at the moment OpenSMTPD hard-codes the use of an internal <getpwnam> table, so there's still work to do.

We no longer support "file" as a lookup backend. Whenever an smtpd.conf refers to a file for a lookup, it will internally be converted to a static table, which achieves the exact same result while removing duplicate code. The change is not visible to users.

The table API now provides a simple mechanism for backends to support a configuration file without having to deal with the parsing. The backend can be declared in smtpd.conf using:

table foobar mybackend:/etc/mail/mybackend.conf
Then the backend may simply do:
static int
table_mybackend_config(struct table *t, const char *configfile)
{
    void    *cf;

    cf = table_config_create();
    if (! table_config_parse(cf, configfile, T_HASH)) {
        table_config_destroy(cf);
        return 0;
    }

    table_set_configuration(t, cf);
    return 1;
}
to have the /etc/mail/mybackend.conf file parsed into a key/value table. It can then fetch the values from any other handler using:
static void *
table_mybackend_open(struct table *t)
{
    void    *cf = table_get_configuration(t);

    if (table_config_get(cf, "key") == NULL) {
        log_warnx("table_mybackend: open: missing key from config file");
        return NULL;
    }

    return t;
}

Since I needed a use case, I added support for SQLite as a table backend allowing the use of SQLite for *any* kind of lookup. I had already done it in the past as a proof of concept when we were still using the map API for lookups, but this time it's the real deal.

SQLite support is achieved using the same approach as that of Postfix where the schema is not imposed but rather the user provides the queries themselves to allow as much flexibility as possible.

To show you how it can be set up, here's a sample smtpd.conf:

# smtpd.conf
table mytbl sqlite:/etc/mail/sqlite.conf

# i could have another one configured differently
table mytbl2 sqlite:/etc/mail/sqlite-other.conf

# and i can have the same one serve different kinds of lookups ;-)
accept for domain <mytbl> alias <mytbl> deliver to mbox
and here's the sample sqlite.conf that goes with it:
# Path to database
#
dbpath			/tmp/sqlite.db

# Alias lookup query
#
# rows   >= 0
# fields == 1 (user varchar)
#
query_alias		select value from aliases where key=?;

# Domain lookup query
#
# rows   == 1
# fields == 1 (domain varchar)
#
query_domain		select value from domains where key=?;
Of course, you may have multiple smtpd tables using sqlite backends; they may use different configuration files; you can have two aliases databases for two different domains or use the same database table to hold all information for all lookups. It's as flexible as it gets ... ALL lookup services are supported by the SQLite backend, so it can be used to store anything used by OpenSMTPD.
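
For completeness, a database matching the example queries above could be created with a schema as trivial as the following (purely illustrative, since the whole point is that the schema is up to you):

CREATE TABLE domains (key TEXT PRIMARY KEY, value TEXT);
CREATE TABLE aliases (key TEXT, value TEXT);
INSERT INTO domains VALUES ('poolp.org', 'poolp.org');
INSERT INTO aliases VALUES ('root', 'gilles');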

K_USERINFO lookup service

OpenSMTPD uses the table API for every lookup but there was still one kind of lookup that was performed using a different API: user information.

The table API expects lookups to be done asynchronously but OpenSMTPD had some code that looked up users synchronously, like right before a delivery or to find the home directory of a user for a ~/.forward check.

I introduced a new lookup service, K_USERINFO, which allows processes to look up information regarding a username, such as uid, gid and home directory. I then reworked the ~/.forward check and the delivery code to ensure the user information lookup is performed asynchronously through the K_USERINFO service rather than through the user_lookup() API, which was synchronous.

The only backend to implement K_USERINFO was table_getpwnam, which was essentially doing the same as before, just asynchronously, and at that point user_lookup() bit the dust.

K_SOURCEADDR table service

OpenSMTPD has been taught how to fetch a source address from a table, but does not make use of it yet. This will, for example, allow users to force a source address for their outgoing mail when they get blacklisted by the monkeys at spamhaus.

The K_SOURCEADDR service performs a cyclic lookup returning each address of a table one after the other so that a table holding multiple addresses will cycle through them.

REALLY VIRTUAL users

A feature that has been requested for a long time and which was very hard to implement was support for virtual users.

OpenSMTPD required that the end user be an actual system user that could be looked up using getpwnam(). The K_USERINFO lookup service changed this slightly by having OpenSMTPD require that the end user be a user that could be looked up using table_lookup().

And since we can write lookup services using any backend, I wrote K_USERINFO handlers for table_static, table_db and table_sqlite. I then added a new keyword to smtpd.conf to allow rules to specify a user table:

table bleh1 { vuser => vuser:10:100:/tmp/vuser }
table bleh2 { vuser => vuser:20:200:/tmp2/vuser }

accept for domain poolp.org users <bleh1> deliver to maildir
accept for domain opensmtpd.org users <bleh2> deliver to maildir
accept for domain pool.ps deliver to maildir
With this, OpenSMTPD will accept mail for domain poolp.org but will only find users if they are part of the table bleh1. The domain opensmtpd.org has a different users database, that shares a username but does not share the uid, gid and homedir. In this example, I used static tables, but it could really be sqlite, db or whatever ;-)

The domain pool.ps has no users table and defaults to the system database which is what most users will expect.

Relay URL update and K_CREDENTIALS lookup service

For several months, smtpd.conf has supported a "relay URL" format to define relays we want to route via:

table creds { mail.poolp.org => gilles:mypasswd }
accept for any relay via tls+auth://mail.poolp.org:31337 auth <creds>
When sending mail, the creds table is searched for an entry matching the domain name of the relay, and the credentials are found there. This has annoyed me for a while because it meant that it was not possible to share credentials between multiple relays, it was not possible to have different credentials for two relays operating under the same name, etc, etc ...

Also, it annoyed me that outgoing authentication would use K_CREDENTIALS while incoming authentication would not use the table API but the auth_backend API instead.

I convinced Eric that it would be nice to provide a new mechanism in relay URL so that we could have a label, like tls+auth://label@mail.poolp.org:31337 and it would be used as the key for the credentials lookup.

This would allow multiple relays to refer to the same label, or different relays under the same hostname to refer to different labels. It would also allow two nice tricks: first, since the labels are looked up through a separate service, we can update the creds table live and the MTA will pick up the change; then, if for incoming authentication we assume the username to be the label, K_CREDENTIALS can be used as THE mechanism to authenticate both incoming and outgoing sessions, and we can kill the auth_backend API.

So ... I wrote it and we can now do:

table in_auth { gilles => gilles:encryptedpasswd }
table out_auth { bleh => gilles:cleartextpasswd }

listen on all tls auth <in_auth>

accept for domain poolp.org deliver to maildir
accept for any relay via tls+auth://bleh@mail.poolp.org auth <out_auth>
And this closes the last issue with regard to assuming any locality of the users for any purpose. An OpenSMTPD instance no longer assumes users to be local, or to really exist, for any purpose whatsoever.

LDAP backend

I had started working on LDAP support for OpenSMTPD a long time ago but for some reason the support was never finished and ended up rotting in my sandbox.

A few months ago, I brought the bits back into a git branch so that I would keep running into them every few days as a reminder that I should not slack. But since I'm not too much of an LDAP fan, or an LDAP user for that matter, I made the branch public in the hope that someone would pick it up and move it forward.

A poolp user had started bringing the bits up to date and getting working support in shape for aliases lookups. Resuming from there, I simplified the code further and added support for almost all kinds of lookups, making OpenSMTPD capable of using LDAP as a backend for the most common use cases.

Here's a configuration file to authenticate local users, lookup a domain and perform aliases lookups against LDAP:

# /etc/mail/smtpd.conf
#

table myldaptable ldap:/etc/mail/ldapd.conf

listen on egress tls auth <myldaptable>

accept for domain <myldaptable> alias <myldaptable> deliver to maildir
accept for any relay
and here's the table configuration:
# /etc/mail/ldapd.conf
#

url             ldap://127.0.0.1
username        cn=admin,dc=opensmtpd,dc=org
password        thisbemypasswd
basedn          dc=opensmtpd,dc=org

# aliases lookup
#
alias_filter            (&(objectClass=courierMailAlias)(uid=%s))
alias_attributes        maildrop


# credentials lookup
#
credentials_filter      (&(objectClass=posixAccount)(uid=%s))
credentials_attributes  uid,userPassword

# domains lookup
#
domain_filter           (&(objectClass=rFC822localPart)(dc=%s))
domain_attributes       dc
The support is functional but needs to be improved further, as it currently has two drawbacks: the backend does not reconnect to the LDAP server should it lose the connection, and it performs synchronous queries, which means that queries that take time to complete will bog down the lookup process.

Also, I only tested with OpenBSD's ldapd(8), as it was dead simple and I didn't want to endure more pain than necessary. Turns out, it made my experiment far more enjoyable than I would have assumed ;-)

I'm planning on becoming more familiar with LDAP, as I suspect I'll be getting questions regarding LDAP every now and then given how many times it's been requested in the past. I might as well know what I'm talking about :-)

Source address selection

Eric has plugged the K_SOURCE lookup service to relay rules, which allows OpenSMTPD to perform a lookup of the source address it should use when the transfer process establishes an outgoing connection to a relay.

Until now, OpenSMTPD could not force the IP address it used for outgoing traffic without relying on a hardcoded hack that was committed to the poolp branch. It was done this way on purpose, and we delayed this feature until the other parts were rewritten appropriately for the puzzle to fit right.

It is now possible to force an address using the source keyword:

table myaddrs { 88.190.237.114, 91.121.164.52 }

accept for any relay source <myaddrs>
If multiple addresses are provided, they will be cycled through, and the mta code will detect which ones are no longer usable.

Intermediate bounces

A feature we had a long time ago and which disappeared during a cleanup was the support of intermediate bounces.

When OpenSMTPD fails to deliver a mail, it has to notify the sender that the message was never delivered. Sometimes that happens immediately, but sometimes the failure is temporary and the daemon keeps the message and tries to deliver it every now and then (ok, the logic is slightly more complex, but you get the idea). In such cases, the bounce will not be sent before OpenSMTPD gives up trying, after 4 days by default.

Obviously, getting a mail 4 days later to tell you that no one read yours, when you assumed it had been sitting in the recipient's mailbox for a while, is quite irritating. The intermediate bounce will instead notify the sender that an error occurred after a few temporarily failed deliveries, and let him know that the daemon will keep trying to deliver for a while.

After discussions, Eric reimplemented intermediate bounces in OpenSMTPD, but did it in a slightly different way than other daemons. By default, an intermediate bounce will be sent after a mail has been sitting in the queue for over 4 hours without being delivered ... but in addition, a set of delays may be provided in smtpd.conf to send multiple intermediate bounces. For example, if I wanted intermediate bounces to be sent after 4 hours, after a day and after two days, I could simply use:

bounce-warn 4h, 1d, 2d
The keyword may change, but the idea and the code are here and working.

Tags & DKIM example

I implemented tagging of sessions a very long time ago; I *think* it was actually already there when OpenSMTPD was not yet OpenSMTPD but still a poolp project :-)

The feature was hidden and undocumented; it had uses, but they were so limited that I did not want users to start relying on it in random situations that I would have to cope with later. Basically, a listener may tag all sessions initiated through it; rules may then apply to specific tags, allowing some rules to apply only to some sessions.

Eric realized that this was perfect to deal with one of our use-cases: DKIM signatures.

We want DKIM signatures but we don't necessarily want to write a filter for that as there are already tools that do the work. So we need to accept a message, pass it to a DKIM signing tool, which will in turn pass it back to us, so that we can send it where it needs to be sent.

A tool to do this is DKIMproxy. I won't go into the details of DKIMproxy, but the idea is that using tags we can determine which sessions we want forwarded to DKIMproxy, and which sessions are coming back from it and need to be relayed to the final destination:

# listen on all interfaces that are attached to the default route
#
listen on egress

# listen on loopback interface on port 10029 and tag DKIM
#
listen on lo0 port 10029 tag DKIM


# only accept to relay the sessions that are tagged DKIM
#
accept tagged DKIM for any relay


# this is reached by the sessions that are NOT tagged
# and will cause OpenSMTPD to relay to the DKIMproxy
#
accept for any relay via smtp://127.0.0.1:10028
Now if I were to send mail to my gmail account, I would connect to the daemon, my session would not be tagged, so it would match the last rule, causing the message to be sent to DKIMproxy. DKIMproxy would then relay the mail back to my loopback interface on port 10029, where the new session would be tagged DKIM, causing the first rule to be matched. Four lines. Ridiculous.

SSL verification and separation

OpenSMTPD has had code to deal with establishing secure channels for a very long time now; however, what it didn't do was perform certificate chain verification in both server and client modes. In server mode, it would never request a client certificate, and in client mode it would never check that the certificate handed over by the server was valid.

So I started adding support and hit the first issue: access to the CA bundle from within the chroot. After a discussion with reyk@ over the best way to deal with it, he convinced me that we could really improve our design by moving certs and keys from the network-facing processes to a separate process and performing on-demand requests.

Dealing with OpenSSL was as nice as usual (it almost seemed like sharing a tasty dinner with Richard Stallman), but I eventually saw the light and got it to work as expected. The good part is that the client and server modes have symmetric operations, which means that the code is identical for the most part.

We now have the sensitive stuff isolated in the lookup process. The smtp and mta processes use the imsg framework to request it and to pass over chains for verification. It all fits in very few lines of code, which is cool because the less OpenSSL code I have to deal with, the better.

For now, we don't do a full verification of the X509 attributes, so OpenSMTPD lies about the verification and pretends in the Received lines that it didn't do it; however, the admin can see the verification take place in the logs. I'll fix the Received line to tell the truth when I'm confident the verification is 100% accurate.

Now, before I switch subject, a couple related ideas for the future:

If we were to provide a K_CERT table service, we could fetch the certificates and chains from custom backends (ldap, sql, you name it, ...), this would take approximately one hour of work. The only reason I'm not doing it is that I don't have a need for that at the moment ;-)

Also, the certificate verification reply to mta and smtp is a simple status that is either success or failure. This means that implementing a certificate-based authentication is now trivial, probably an hour of work too. Guess why I didn't implement it yet?

Fix relaying logic

While working on the SSL code, I spotted strange behaviour when relaying between my primary and secondary MX.

After some testing, I realized that we had lost the "backup" flag somewhere. After a quick chat with Eric, I convinced him that we should introduce the backup:// scheme and get rid of the flag in the envelope. This way, we make it obvious that mx2.poolp.org is a backup MX by having

backup://mx2.poolp.org
as a relay URL.

Then I spotted some strange issues where I didn't request TLS and it would attempt it, or where I requested TLS but it would fall back to plaintext. It became obvious there was something fishy with the semantics of our relay URL schemes and that they needed to be more clearly defined.

After going through every possible scheme, we defined them as follows:

smtp+tls:// -> attempt TLS, then fallback to plaintext, this is the default
smtp://     -> plain text ONLY, no encryption
tls://      -> STARTTLS only, encrypted channel guaranteed
smtps://    -> SMTPS only, encrypted channel guaranteed
ssl://      -> STARTTLS, fallback to SMTPS, encrypted channel guaranteed
With this, unless smtp+tls:// is specified, we never fall back to plaintext if an encrypted channel is requested, and we never break the user's assumption that relaying will be secure by sending the data over a plaintext channel.
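
For example (mail.example.org being a placeholder), guaranteeing an encrypted channel to a smarthost is then just a matter of picking the right scheme:

accept for any relay via tls://mail.example.org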

Stressing the daemon

We've done a lot of stress testing lately.

We have tested the incoming path, accepting hundreds of thousands of mails from hundreds of sessions, arriving in chaotic order and with random data. We're quite confident that our incoming path is rock solid now; we will be performing our final test very soon with one billion messages.

We have also tested our outgoing path, first in a confined environment which revealed no bugs, and then in a live environment which revealed minor bugs that were fixed in the process. There is still one "memory usage"-related issue, but it applies to very special and stressed setups, not something people would usually experience, and we happen to have a diff for it which requires some additional work before being in a proper state for commit.

During these stress tests, we have gathered a lot of information to prove some of our theories right or wrong. We now know where we stand with regard to other MTAs, and we have built tools which give us a very precise understanding of our areas of improvement. Clearly, we have no reason to be ashamed, far from it.

Optimization

Our queue code has a design that allows for very efficient queues to be written. We know that because we have already written several backends; we know the time spent in the API, the time spent in the backends and, for disk-based backends, the exact cost of the disk I/O.

You'd assume our queue would be very fast, but the default backend we ship is sub-optimal by design because the admin in me wants the queue to provide some features that are incompatible with performance.

One of these features is to provide per-envelope files as well as a locality between envelopes and messages so that SMTP transactions can be backed up and restored easily on another machine. Stuff like that.

This user-friendliness means that we can't rely on tricks to avoid hitting the disk too often: we have as many open()-fsync()-close() sequences as there are envelopes; we have as many mkdir()/rename() calls as there are messages; and you can add many more filesystem-related calls used to deal with the atomicity of our queue commits. A lot of this is unnecessary for OpenSMTPD and could be handled in a much more efficient way... but it is really only done so that a human can inspect and manipulate the queue more easily.

A queue that allowed envelopes to be written sequentially in a single binary file requiring only one fsync() could drastically increase our performance, and we will surely do that at some point, but our default queue will always be the user-friendly one, even at the cost of a slower incoming path.

That being said, we still want our queue to operate fast and limit the impact of our design. Ideally, our queue shouldn't be more than 10% slower than the other software with their optimized queues. So...

I spent some time tracing our queue code and spotted that the system call pattern was not really matching what I'd expect. Our queue logic is very simple and has a very "linear" pattern, so several system calls should appear in matching numbers. I tracked and fixed issues until the kdump(1) output displayed the optimal number of calls for each system call, according to the numbers I had on paper. I added a couple of functions to help profile every single queue operation, Eric came up with a better interface for these functions, and we now have an invaluable tool for queue backend development :-)

Eric also spent time improving our memory usage and removing some pressure from inter-process IO by coming up with an API that allows us to encode/decode the data more efficiently. Until now we passed structures, which could contain huge buffers of which we only consumed a few bytes; the new API not only passes only the required data but also provides type checking, which allows us to make sure we don't pass the wrong data by mistake. As a bonus, the API can use the types to know the average size of the data and only reallocate in cases where we exceed that size.

When we were done with this, we were very good with the outgoing path, and pretty much equivalent to the other MTAs with regard to the incoming path. There is still a lot of room for improvement, but given the constraints we imposed on ourselves, I'm really glad that we're not twice as slow as the slowest MTA out there :-)

A more fault-tolerant filesystem queue

Our default queue had a design that was very strict with "strange" situations.

Any time an unexpected situation happened, the daemon would fatal. Since unexpected situations are not supposed to happen, this shouldn't be a problem, right?

No. Not right. On a regular setup this never happens, but sometimes a human does something as innocent as a chmod on a file or directory... and OpenSMTPD figures something is not normal and commits suicide.

To be honest, not only did I not receive a report of a queue fatal in months, but I also don't recall ever hitting any of these on poolp.org... until the stress.

I hit a couple of fatal()s which turned out to be related to an error in our use of the fts(3) API which, for some reason, didn't trigger until the stress. I fixed the issue, then decided to track down every single fatal() in the queue code and try to convert it into a temporary failure condition, so that even if a failure happened, the daemon would deal with it gracefully.

It turned out to be much simpler than I assumed, and our fs-queue is now capable of coping with an admin messing with the queue. Of course, an admin should never tamper with the queue, but being able not to fatal() on a chmod(1) or mv(1) felt essential ;-)

Per-listener hostname

Our smtpd.conf file had a "hostname" directive which allowed for setting the hostname to be displayed on the greeting banner.

The directive was removed and it is now possible to specify the hostname for each listener:

listen on lo0
listen on 192.168.1.1 hostname mx1.int.poolp.org
listen on 192.168.2.1 hostname mx2.int.poolp.org
Not specifying one will use the machine's hostname.

Per-source HELO

When relaying mail, smtpd.conf allows for overriding the source address using a table containing an address or a list of addresses:

table myaddrs { 192.168.1.2, 192.168.1.3 }

accept for all relay source <myaddrs>
The above causes the relaying to bind one of these addresses as the source address. However, during an SMTP session, our MTA has to advertise itself at the HELO/EHLO stage and tell its hostname. The hostname is sometimes checked, and if it doesn't match the source address used, the MTA is rejected.

So we needed a way to have our MTA provide the remote host with a HELO/EHLO parameter that corresponds to the source address used. We had ideas that involved performing a DNS lookup from the MTA, but that would not work due to NAT.

I suggested that we use a new lookup service K_ADDRNAME which allows for a mapping of an IP address to a name:

table myaddrs { 192.168.1.2, 192.168.1.3 }
table myhelo  { 192.168.1.2 => mx1.poolp.org, 192.168.1.3 => mx2.poolp.org }

accept for all relay source <myaddrs> helo <myhelo>
With this, the MTA will use a source from the table myaddrs and, at HELO/EHLO time, will use the name from the table myhelo that matched the address it used to connect.

Sender filtering

Another feature that people have been requesting very frequently is the ability to use a sender email address as a matching condition in the ruleset.

Until recently, the matching of a rule was done by looking at the client address and the destination domain. It was not possible to express something like "accept all mail coming with sender gilles@poolp.org" or "reject all mail coming with sender @redhat.com".

I have introduced "sender" filtering, which can apply to a full email address or to a domain, in both accept and reject rules. It works as follows:

# accept from any source, if sender has domain @openbsd.org [...]
accept from any sender "@openbsd.org" for any relay

# accept from localhost, if sender has domain @poolp.org [...]
accept sender "@poolp.org" for any relay

# accept from any source, only if sender is gilles@poolp.org [...]
accept sender gilles@poolp.org for any relay
It can apply to relay or deliver rules, and allows the use of tables to apply different relay rules to different domains or to users coming from different networks:
table hackers { "@opensmtpd.org", "@poolp.org" }
table slackers { richard@foot-cheese.org, lennart@thepig.org }

accept from 192.168.1.0/24 sender <hackers> relay
accept from 192.168.2.0/24 sender <slackers> relay via smtp://example.org

SSL code cleanup

An OpenBSD user reported a problem with ldapd's ssl.c, where the prime used for DH parameters was 512 bits, which is short enough by today's standards to be rejected by OpenLDAP's client. OpenSMTPD's ssl.c was changed a long time ago to bump this prime to 1024 bits, but ldapd's ssl.c was actually a copy of OpenSMTPD's ssl.c from two years ago.

The desynchronization of ssl.c across OpenBSD daemons has annoyed me for a long time, and it occurred to me that maintaining this gap would cause further divergences in the future as reyk@ moves OpenIKED and relayd forward. After a quick chat with him, I started creating a daemon-agnostic version of ssl.c.

There is still work to do in that area, but OpenSMTPD now comes with an ssl_privsep.c that is equivalent to that of relayd; an ssl.c that no longer knows of any smtpd-specific structures and can be shared with other daemons; and an ssl_smtpd.c that contains the smtpd-specific bits.

Note that ssl.c doesn't contain new code; this was only a rework of the interfaces to allow it to be shared with different daemons.

Runtime tracing and profiling

Support for a monitor command has been added to the smtpctl utility. It allows an administrator to easily monitor a running instance of OpenSMTPD and what it is doing in real-time, displaying states every second.

I have added a feature that I've been wanting for a long time: activating traces at runtime without a daemon restart.

Say a user suddenly observes that connections to a remote host fail; he can now simply type:

smtpctl trace transfer
to obtain a real-time view of the sessions as they take place with remote hosts.

There are many trace subsystems: for incoming connections, outgoing connections, msg exchanges across processes, etc, etc, ... read the man page ;-)

While at it, if you need to track down bottlenecks, OpenSMTPD also supports real-time profiling of imsg and queue operations using:

smtpctl profile imsg
and
smtpctl profile queue
Both tracing and profiling can be turned on and off at runtime.

Improved memory use

Eric cleaned up some code to avoid passing envelopes when we could pass evpids instead. This prevents OpenSMTPD from accumulating data in the inter-process buffers (an evpid is 64 bits, an envelope structure is several kbytes).

For the places where the envelope really needs to be passed, he suggested that we send a compressed version. I proposed that we use the ASCII envelope conversion, as we know it works given that it's already used for disk-based envelopes. The ASCII envelopes allow compressing the envelope quite a lot: instead of passing a large data structure with partially used large fields, we pass the ASCII representation, which can be as small as a hundred bytes.

We should now be able to cope better in very heavily loaded situations ;-)

Various little fixes

You have no idea.

We have fixed various little bugs that trigger only in very specific cases which you just can't hit outside of a live test.

We also fixed/improved/reworked minor things, like making the "relay backup" parameter optional by defaulting to the machine's hostname, changing the API for queue remove to take an evpid instead of a full envelope, removing userinfo from a structure that wasn't using it, switching the queue code to use the first two bytes of a msgid instead of the last two bytes to create the bucket, fixing a segfault with a specific configuration file, allowing authentication to fail temporarily, etc, etc, etc ...

We were told of a crash which appeared to be coming from a descriptor leak. On Linux, lsof would show that message files had been removed while the transfer process still had a descriptor open. We tracked the issue and spotted that it happened because of a missing condition in the transfer process: the message file would not be closed if recipients were rejected and the session was reused to send another message. Two-line fix.

The logging format has been improved, we have done a lot of rewording and changes to provide the most information using a concise format that can be easily understood by humans and that can easily be parsed or grepped.

Added a RAM queue_backend, mostly useful for debugging at this point, or if you don't care about losing mail when you shut down the daemon ;-)

New dict_* API, akin to the tree_* API but with char * keys, which will allow us to simplify a lot of code with regard to how tables are handled and managed by OpenSMTPD.

TONS of KNF cleanup (several hundred fixes), removal of old defines, refactoring to remove structures that were used as unnecessary indirections, simplification of equivalent code, etc, etc ...

Various bug fixes including one causing some envelopes to possibly be skipped and triggering a start-time crash.

Improved the scheduler API by removing the very annoying and tricky-to-implement Qwalk API and replacing it with a new queue operation, Q_LEARN.

Mailq now supports an "online" mode which provides more information than the offline mode. The online mode queries the scheduler, getting reliable real-time information. Offline mode is equivalent to what we had before.

We have spent a lot of time doing cleanup. I had already spent three hours fixing tons of KNF last week, but this time our cleanup was targeted at code and structures. We removed some imsg exchanges that were not required, and we split a huge structure that was used for all kinds of exchanges into several smaller and more specialized structures. We reworked queue and mfa to make them simpler, and as a result the code is easier to read and the imsg exchanges are easier to follow; it only gets better ;-)

We have spent a large amount of time working on a general cleanup of the code base. Amongst other things, we removed some fields from the global struct smtpd to statically isolate them to the specific files that were using them. This helped ensure that we didn't violate API layers.

Then we spent a great deal of time killing a monster structure, submit_status, which was used for all kinds of inter-process exchanges. We came up with several lighter structures, carrying only the required information and tied to a particular process-to-process exchange. This required a bit of rework in various processes, but the end result is less confusion and code that's easier to maintain and to read for newcomers.

First shot at a regression suite, with a utility that allows the scripting of SMTP scenarios. In the future, we will write various scenarios which will allow us to verify that we don't introduce regressions with new features and bugfixes.

As always, testing is essential to keeping quality high. You can download the latest stable code snapshot from the OpenSMTPD website. The OpenBSD tarball should build on -current (or -stable with a -current libc/asr directory). The portable version should compile without issue on Linux/FreeBSD/NetBSD/DragonFly BSD, provided that the library and header dependencies for sqlite3, Berkeley DB, and libevent have been met.

Comments
  1. By sneaker (sneaker) sneaker@noahpugsley.net on

    Wow, a tremendous amount of work. Nice to see how this has progressed over the years.

    Thanks!

  2. By Peter J. Philipp (pjp) pjp@solarscale.de on http://centroid.eu

    This looks good! Thanks very much for the hard work! I've been using a backup mx with opensmtpd for a few months now, works flawlessly afaict. And looking through this new documentation and comparing it with my postfix setup, I may already be able to fully switch over to opensmtpd with full functionality (except for the RBL's). At work we use cyrus imap, for that setup I don't think we can switch over yet. But at home it sure looks compatible. Thanks again!

  3. By Will Backman (bitgeist) bitgeist@yahoo.com on http://bsdtalk.blogspot.com

    Thank you for all the work, and for putting in the time to write such an extensive article!

  4. By howabout (109.190.107.195) on

    How about Kerberos auth? Is someone working on this?

  5. By Uriel Fanelli (195.233.250.6) uriel@uriel-fanelli.no-ip.org on http://www.keinpfusch.net

    Well, I can only say "thanks" for all the work you put there.

    and... special thanks for focusing on the grammar instead of adding a new configuration parameter every time. I prefer to use logic rather than memory.

    Actually I was able to replace 33 lines of postfix config with 4 lines of smtpd.conf. Great. Simply great.

    Many thanks.
