Doing whatever Gmail says

2023-04-12 3-minute read

As we slowly move our members to our new email infrastructure, an unexpected twist turned up: One member reported getting the Gmail warning:

Be careful with this message. The sender hasn’t authenticated this message so Gmail can’t verify that it actually came from them.

They have their email delivered to May First, but have configured Gmail to pull in that email using the “Check mail from other accounts” feature. It worked fine on our old infrastructure, but started giving this message when we transitioned.

A further twist: they only receive this warning on email sent by other people in their organization. In other words, email sent via May First gets flagged; email sent from other domains does not.

These Gmail messages typically warn users about email that has failed (or lacks) both SPF and DKIM. However, before diving into the technical details, my first thought was: why is Gmail giving a warning on a message that wasn’t even delivered to them? It’s always nice to get confirmation from others that this is totally wrong behavior. Unfortunately, when it’s Gmail, it doesn’t matter if they are wrong. We all have to deal with it.

So next, I decided to investigate why this message failed both DKIM (digital signature) and SPF (ensuring the message was sent from an authorized server).

Examining the headers immediately turned up the SPF failure:

Authentication-Results: mx.google.com;
       spf=fail (google.com: domain of xxx@xxx.org does not designate n.n.n.n as permitted sender) smtp.mailfrom=xxx@xxx.org

The IP address Google checked to ensure the message was sent by an authorized server is the IP address of our internal mail filter server in our new email infrastructure. That’s the last hop before delivery to the user’s mailbox, so that’s the last hop Gmail sees. This is why Gmail is totally wrong to run this check: all email messages retrieved via their mail fetching service are going to fail the SPF test because Gmail has no way of knowing what the actual last hop is.
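To make the failure concrete, here is a minimal sketch of an SPF check in Python. It is a deliberate simplification: it only understands `ip4:` mechanisms and the trailing `all`, whereas a real evaluator also handles `include`, `a`, `mx`, `redirect` and more. The record and the IP addresses are made-up values from the documentation ranges, not our real ones.

```python
import ipaddress

def spf_ip4_result(record: str, connecting_ip: str) -> str:
    """Evaluate only the ip4 mechanisms and the final 'all' of an SPF record."""
    ip = ipaddress.ip_address(connecting_ip)
    for term in record.split()[1:]:  # skip the "v=spf1" version tag
        if term.startswith("ip4:"):
            if ip in ipaddress.ip_network(term[4:], strict=False):
                return "pass"
        elif term.endswith("all"):
            # The qualifier decides the result: "-" hard fail, "~" soft fail,
            # "?" neutral, anything else (bare "all" or "+all") passes.
            return {"-": "fail", "~": "softfail", "?": "neutral"}.get(term[0], "pass")
    return "neutral"

# Hypothetical values: the record lists only the outbound relay.
record = "v=spf1 ip4:192.0.2.10 -all"
relay_ip = "192.0.2.10"      # server that sends mail to the outside world
filter_ip = "198.51.100.20"  # internal filter, the last hop before the mailbox

print(spf_ip4_result(record, relay_ip))   # pass
print(spf_ip4_result(record, filter_ip))  # fail
```

The second call is the situation Gmail creates for itself: it checks the internal filter's address against a record that was only ever meant to describe outbound relays.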

So why is this problem only showing up after we transitioned to our new infrastructure? Because on our old infrastructure, each user had a single mail server. That one server was both the MX server and the relay server, so it was already included in their SPF record.

And why does this only affect mail sent via May First and not other domains?

Because we add our DKIM signature to outgoing email, not to email delivered internally. Therefore, these messages both fail the SPF check and also don’t have a DKIM signature. Messages from other domains fail the same SPF check, but they arrive with a DKIM signature.
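The combination can be sketched in a few lines of Python. This is only a model of the rule as I understand it (warn when SPF does not pass and there is no DKIM signature), not Gmail's actual logic, and it only checks whether a `DKIM-Signature` header is present rather than verifying it. The messages and the `lacks_both` helper are invented for illustration.

```python
from email import message_from_string

def lacks_both(raw_message: str, spf_result: str) -> bool:
    """True when SPF did not pass and the message has no DKIM-Signature header."""
    msg = message_from_string(raw_message)
    has_dkim = msg.get("DKIM-Signature") is not None
    return spf_result != "pass" and not has_dkim

# Hypothetical message delivered internally: never signed.
internal = "From: a@example.org\nTo: b@example.org\nSubject: hi\n\nbody\n"

# Hypothetical message from another domain: signed by the sender's server.
external = (
    "DKIM-Signature: v=1; a=rsa-sha256; d=example.net; s=sel; b=...\n"
    "From: c@example.net\nTo: b@example.org\nSubject: hi\n\nbody\n"
)

# Both fail SPF when fetched, but only the unsigned one has nothing to fall back on.
print(lacks_both(internal, "fail"))  # True
print(lacks_both(external, "fail"))  # False
```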

Ugggg. So what do we do now? Clearly, something dumb and simple is in order: I added the IP addresses of our internal filter servers to our global SPF record.
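For illustration, the change amounts to something like the following Python sketch, which slots extra `ip4:` mechanisms in before the final `all`. The `add_ip4` helper, the record, and the addresses are all hypothetical; our real record and IPs are different.

```python
def add_ip4(record: str, new_ips: list[str]) -> str:
    """Insert ip4 mechanisms ahead of the trailing 'all' term of an SPF record."""
    terms = record.split()
    # Keep the final "all" mechanism last, since evaluation stops there.
    if terms and terms[-1].endswith("all"):
        head, tail = terms[:-1], terms[-1:]
    else:
        head, tail = terms, []
    for ip in new_ips:
        mechanism = f"ip4:{ip}"
        if mechanism not in head:  # avoid duplicate entries
            head.append(mechanism)
    return " ".join(head + tail)

# Hypothetical record and internal filter addresses.
old_record = "v=spf1 ip4:192.0.2.10 -all"
new_record = add_ip4(old_record, ["198.51.100.20", "198.51.100.21"])
print(new_record)
# v=spf1 ip4:192.0.2.10 ip4:198.51.100.20 ip4:198.51.100.21 -all
```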

Someday, years from now, after Gmail is long gone (or has fixed this dumb behavior), when I’m doing whatever retired people like me do, someone will notice that our internal filter server IPs are included in our SPF record. Hopefully they will fix the problem, but instead they’ll probably think: no idea why these are here - something will probably break if I remove them.