POSTFIX LOGGING HOWTO by /dev/rob0, first draft posting, 2017-07-13 (Comments and additions are solicited, please send them to rob0-www@nodns4.us, thank you.) This is intended to be a short document to help a Postfix mail administrator select the proper log entries to get help with issues on the postfix-users mailing list or in a similar forum. The target audience are experienced administrators of their own Unix or Unix-like operating system, but it is also hoped that this will help the less experienced administrators to become more competent and confident. Generally what you need to do is to show all non-verbose logging from Postfix relating to the handling of a single email (or connection) which didn't do what you wanted it to do. Postfix design is modular, and each part separately logs what it does. By default all of these logs are sent to the "mail" facility of your syslogd. (The reader is expected to understand Unix syslog(3) and the working configuration of his/her own syslog daemon. See Wikipedia for general information, and the documentation of your own OS or distro for specifics.) Due to this modularity, grep(1) is usually the wrong tool to use in dealing with logs, even on very busy sites. Very important note: you need to be looking at all the mail.* syslog output: at all syslog priority levels. Debian GNU/Linux for example "helpfully" splits off various priorities into separate files. Those other files DO NOT HELP you. You need to see logging of all priorities in real time order. SIMPLE EXAMPLE Here is an example of a normal inbound delivery on my system, this example being a Gmail user posting to a Mailman list: Jul 9 04:51:42 harrier postfix/postscreen[692]: CONNECT from [74.125.83.41]:36191 to [207.223.116.211]:25 Jul 9 04:51:42 harrier postfix/dnsblog[693]: addr 74.125.83.41 listed by domain list.dnswl.org as 127.0.5.0 Jul 9 04:51:42 harrier postfix/dnsblog[696]: addr 74.125.83.41 listed by domain dnsbl.sorbs.net as 127.0.0.6 Jul 9 04:51:45 harrier postfix/postscreen[692]: PASS NEW [74.125.83.41]:36191 Jul 9 04:51:46 harrier postfix/smtpd[846]: connect from mail-pg0-f41.google.com[74.125.83.41] Jul 9 04:51:46 harrier postfix/smtpd[846]: Anonymous TLS connection established from mail-pg0-f41.google.com[74.125.83.41]: TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits) Jul 9 04:51:46 harrier postfix/smtpd[846]: NOQUEUE: permit: RCPT from mail-pg0-f41.google.com[74.125.83.41]: action=permit_dnswl_client for Client host=mail-pg0-f41.google.com[74.125.83.41] ; from= to= proto=ESMTP helo= Jul 9 04:51:46 harrier postfix/smtpd[846]: 3x4wtB5r2yz5vMc: client=mail-pg0-f41.google.com[74.125.83.41] Jul 9 04:51:46 harrier postfix/smtpd[846]: 3x4wtB5r2yz5vMc: warn: DATA from mail-pg0-f41.google.com[74.125.83.41]: ; from= to= proto=ESMTP helo= Jul 9 04:51:47 harrier postfix/cleanup[847]: 3x4wtB5r2yz5vMc: message-id=<8c279a57-d92d-6b05-ff15-435407c9a445@gmail.com> Jul 9 04:51:47 harrier postfix/qmgr[19739]: 3x4wtB5r2yz5vMc: from=, size=4723, nrcpt=1 (queue active) Jul 9 04:51:47 harrier postfix/smtpd[846]: disconnect from mail-pg0-f41.google.com[74.125.83.41] ehlo=2 starttls=1 mail=1 rcpt=1 data=1 quit=1 commands=7 Jul 9 04:51:47 harrier postfix/local[848]: 3x4wtB5r2yz5vMc: to=, orig_to=, relay=local, delay=0.82, delays=0.57/0.01/0/0.24, dsn=2.0.0, status=sent (delivered to command: /usr/local/mailman/mail/mailman post slackbuilds-users) Jul 9 04:51:47 harrier postfix/qmgr[19739]: 3x4wtB5r2yz5vMc: removed What to grep for? The client IP address, "[ \[]74\.125\.83\.41[ \]]", gets most lines, but not the ones from cleanup(8), qmgr(8), and local(8). Also, it will get any other connections from the same member of Google's outbound farm. Note as well that you have to use an ugly expression to catch both the smtpd(8) and the dnsblog(8) logs. The queue ID, 3x4wtB5r2yz5vMc, skips postscreen(8) and dnsblog entirely, as well as any from smtpd prior to the assignment of the queue ID. The configuration from which this example was generated includes three non-standard postconf(5) settings: smtpd_data_restrictions = check_recipient_access,static:warn smtpd_log_access_permit_actions = static:all smtpd_tls_loglevel = 1 The latter causes the "Anonymous TLS connection" line; the middle is responsible for the "NOQUEUE: permit:" line; and the first causes the "warn: DATA from" line. Only the last one of those log lines includes the queue ID. Also the final smtpd disconnect and summary line does not include the queue ID. The only expression which reliably matches all these log lines is "harrier postfix/", and that's going to match ALL Postfix logging. So grep is out. Try instead with a pager like less(1). You have a similar regular expression search feature with less, invoked with the "/" (forward slash) key by default. And a simpler expression can find the start of the logs you want: /\[74\.125\.83\.41\] Yes, this might find non-relevant results from the same IP address, but not in this case, as you can see from the "PASS NEW" message from postscreen. That means we have not yet seen this client, at least within the past 30 days. In many cases you'll have the queue ID and search for that, but then you can either scroll back to get the postscreen/dnsblog lines, or search reverse to other instances of the bracketed IP address. NOT-AS-SIMPLE EXAMPLE That was a simple example because all the relevant lines were contiguous and together. In the real world and especially on busier sites than my little server, it's not so simple; you have many other log lines interspersed to distract you. In the following example, PRVH0018.presinet.ne.jp[202.218.252.19] is a botnet attacker trying to brute-force SMTP AUTH credentials, but for sake of example, let's assume it's a legitimate user, unable to submit mail. The submission arrives in the midst of a malware attack being handled by postscreen. Jul 9 05:29:56 harrier postfix/submission/smtpd[2891]: connect from PRVH0018.presinet.ne.jp[202.218.252.19] Jul 9 05:29:57 harrier postfix/postscreen[2878]: PREGREET 14 after 0.43 from [222.221.122.230]:63292: EHLO ylmf-pc\r\n Jul 9 05:29:57 harrier postfix/postscreen[2878]: DNSBL rank 3 for [222.221.122.230]:63292 Jul 9 05:29:57 harrier postfix/postscreen[2878]: HANGUP after 0.96 from [222.221.122.230]:63228 in tests after SMTP handshake Jul 9 05:29:57 harrier postfix/postscreen[2878]: DISCONNECT [222.221.122.230]:63228 Jul 9 05:29:57 harrier postfix/submission/smtpd[2891]: Anonymous TLS connection established from prvh0018.presinet.ne.jp[202.218.252.19]: TLSv1.2 with ciph er ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits) Jul 9 05:29:58 harrier postfix/postscreen[2878]: CONNECT from [222.221.122.230]:63374 to [207.223.116.211]:25 Jul 9 05:29:58 harrier postfix/dnsblog[2879]: addr 222.221.122.230 listed by domain zen.spamhaus.org as 127.0.0.11 Jul 9 05:29:58 harrier postfix/postscreen[2878]: HANGUP after 0.87 from [222.221.122.230]:63292 in tests after SMTP handshake Jul 9 05:29:58 harrier postfix/postscreen[2878]: DISCONNECT [222.221.122.230]:63292 Jul 9 05:29:58 harrier postfix/postscreen[2878]: HANGUP after 1.9 from [222.221.122.230]:63212 in tests after SMTP handshake Jul 9 05:29:58 harrier postfix/postscreen[2878]: DISCONNECT [222.221.122.230]:63212 Jul 9 05:29:58 harrier postfix/postscreen[2878]: PREGREET 14 after 0.49 from [222.221.122.230]:63374: EHLO ylmf-pc\r\n Jul 9 05:29:58 harrier postfix/postscreen[2878]: CONNECT from [222.221.122.230]:63428 to [207.223.116.211]:25 Jul 9 05:29:58 harrier postfix/postscreen[2878]: CONNECT from [222.221.122.230]:63440 to [207.223.116.211]:25 Jul 9 05:29:58 harrier postfix/postscreen[2878]: DNSBL rank 3 for [222.221.122.230]:63374 --- skip 5 seconds, :59 through :03 inclusive, and LOTS of lines Jul 9 05:30:04 harrier postfix/postscreen[2878]: CONNECT from [222.221.122.230]:63846 to [207.223.116.211]:25 Jul 9 05:30:04 harrier postfix/dnsblog[2879]: addr 222.221.122.230 listed by domain zen.spamhaus.org as 127.0.0.11 Jul 9 05:30:04 harrier postfix/submission/smtpd[2891]: warning: PRVH0018.presinet.ne.jp[202.218.252.19]: SASL PLAIN authentication failed: Jul 9 05:30:04 harrier postfix/submission/smtpd[2891]: disconnect from PRVH0018.presinet.ne.jp[202.218.252.19] ehlo=2 starttls=1 auth=0/1 quit=1 commands=4/5 Jul 9 05:30:04 harrier postfix/postscreen[2878]: PREGREET 14 after 0.48 from [222.221.122.230]:63846: EHLO ylmf-pc\r\n Jul 9 05:30:04 harrier postfix/postscreen[2878]: DNSBL rank 3 for [222.221.122.230]:63846 Here there are two search strings which can help us find what we seek: first, "postfix/submission/smtpd", and second, the PID of that process, "\[2891\]". Fortunately less will highlight all search matches on the visible part of the file on your screen. So a search for either of those strings will highlight all of them. In this case a "grep -v postfix/postscreen" could have been done prior to the pager, to eliminate all the logs here which are known to be irrelevant. So it's more fair to say, "grep should not be your first choice of tool," rather than, "never use grep." We don't see anything there from Dovecot SASL. I don't know why, and since it's a botnet I don't really care, but if it had been a real user I would know that the user has not configured his/her MUA correctly. I would refer the user again to the client setup documentation. EVEN WORSE EXAMPLE [ this example is not done yet ] This example will be taken from someone else's logs, thank God, because it will be an issue with a compromised server spewing spam, and with after-queue content filtering causing two queue IDs per unique email. We will discuss in this example the idea of a chained command: grep | less so thereby we can eliminate much of the logging we know is not of interest in our search. BOTTOM LINE: WHAT TO DO WHEN ASKING FOR HELP If you're not paying someone for consultancy, do not expect nor ask helpers to sift through hundreds of lines of logs. It should be condensed down to a single email or connection which demonstrates the issue at hand. For an email: 1. Find the queue ID in your logs, using a pager. 2. Scroll (or search) back in logs to connection initiation: postscreen or smtpd. 3. Select all lines related to the connection and delivery [attempt]. 4. Paste all of that, usually less than 20 lines, into your email or online pastebin. For a problem with an inbound connection: 1. Find the client IP address initial connection in your logs, using a pager. 2. Scroll (or search) forward in logs to disconnection. 3. Select all lines related to the connection. 4. Paste all of that, usually less than 10 lines, into your email or online pastebin. For a problem with an outbound connection: 1. Find the queue ID in your logs, using a pager. 2. Scroll (or search) back in logs to connection initiation: postscreen or smtpd, or perhaps pickup(8). 3. Select all lines related to it, up to and including the FIRST failed delivery attempt. 4. Paste all of that, usually less than 20 lines, into your email or online pastebin. WHAT NOT TO DO: Other than what was already mentioned, DO NOT set verbose logging. If you're having trouble understanding what was in non-verbose logs, face it: you will only have hundreds more lines for each email or connection to fail to understand. And if you were understanding what you saw in your logs, chances are, you would not have needed to ask for help. DO NOT include IMAP, POP3, or other non-Postfix logging in seeking help for Postfix issues. By default these other services also log to the mail syslog facility. Yes, in isolated cases there will be items of possible interest from these other services, but as a rule of thumb you should assume that they are not relevant. DO NOT make widespread assumptions about what your helper[s] might or might not want to see. In many of those cases the assumptions are wrong! Logging which looks "normal" can provide important clues to those who are more experienced with Postfix. (Maybe it is normal, but it's not doing what you wanted it to do.) OTHER RESOURCES: Postfix Debugging Howto: http://www.postfix.org/DEBUG_README.html How to ask questions the smart way: http://www.catb.org/esr/faqs/smart-questions.html http://sweet.nodns4.us/