Some thoughts on archiving mail for SOX / legal requirements



Kevin K wrote:

	> Currently I use an open-source program called mairix which indexes bodies
	> of e-mail for searching on sender, recipient, date, subject, body, etc.
	> It works but isn't scaling well to the volume of mail we handle.

I think a different (and better) solution is to reduce the number of messages you archive. Do you archive all your mail? If so, do you really need to? If you can identify the users whose mail must be archived, then you can add glue to archive only those messages.

One way to do this is to have sendmail do the checking. Create a hash database containing the addresses that need archiving. Test both the envelope sender and the recipient addresses against the database in the check_mail and check_rcpt rulesets. If a match is found, set a macro, which is then passed to a Milter that adds an archive Bcc: address when the macro has been set. MIMEDefang would be a good choice for that Milter.
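The decision the check_mail and check_rcpt rulesets would make can be modelled in a few lines of Python. This is a minimal sketch of the logic only, not sendmail configuration: ARCHIVE_ADDRESSES is a hypothetical in-memory stand-in for the hash database, and the normalization (angle brackets stripped, address lowercased) is an assumption, since local parts can in principle be case-sensitive.

```python
# Hypothetical stand-in for the sendmail hash database of addresses to archive.
ARCHIVE_ADDRESSES = {"cfo@example.com", "controller@example.com"}


def normalize(addr: str) -> str:
    """Strip whitespace and angle brackets, then lowercase.
    Assumption: local parts are treated case-insensitively."""
    return addr.strip().strip("<>").strip().lower()


def should_set_archive_macro(sender: str, recipients: list[str],
                             table: set[str] = ARCHIVE_ADDRESSES) -> bool:
    """Model of check_mail + check_rcpt: True means 'set the macro'
    because the envelope sender or some envelope recipient is listed."""
    if normalize(sender) in table:                          # check_mail: envelope sender
        return True
    return any(normalize(r) in table for r in recipients)  # check_rcpt: each recipient


print(should_set_archive_macro("<CFO@example.com>", ["bob@example.com"]))  # prints True
```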

You could also do the whole thing within MIMEDefang, using the same hash database. Check the envelope sender and each envelope recipient against the database in the filter_sender and filter_recipient routines. If a match is found, add an archive Bcc: address.
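In the MIMEDefang variant the sender is checked once and each recipient is checked as it arrives, so the filter only has to remember whether anything matched and add the archive recipient at the end. The sketch below models that flow in Python rather than in MIMEDefang's own Perl and calls no MIMEDefang API; ArchiveDecision, ARCHIVE_TABLE and ARCHIVE_BCC are hypothetical names.

```python
ARCHIVE_TABLE = {"cfo@example.com", "controller@example.com"}  # hypothetical database
ARCHIVE_BCC = "archive@example.com"                             # hypothetical archive mailbox


def normalize(addr: str) -> str:
    """Strip angle brackets/whitespace and lowercase the address."""
    return addr.strip().strip("<>").strip().lower()


class ArchiveDecision:
    """Per-message state: on_sender runs once, on_recipient once per recipient."""

    def __init__(self, table: set[str] = ARCHIVE_TABLE) -> None:
        self.table = table
        self.matched = False

    def on_sender(self, sender: str) -> None:
        # Corresponds to the filter_sender check.
        self.matched |= normalize(sender) in self.table

    def on_recipient(self, rcpt: str) -> None:
        # Corresponds to the filter_recipient check.
        self.matched |= normalize(rcpt) in self.table

    def final_recipients(self, recipients: list[str]) -> list[str]:
        """Append the archive Bcc recipient once if anything matched."""
        result = list(recipients)
        if self.matched and ARCHIVE_BCC not in (normalize(r) for r in result):
            result.append(ARCHIVE_BCC)
        return result
```

Keeping the state per message means a single matching address is enough to archive the whole message, which is the behaviour the post describes.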

Another issue you brought up was capturing Bcc: addresses. First off, in terms of SMTP there is no such thing as a "Bcc:" address. The Bcc: header is a way for a sender to add envelope recipient addresses to a message without those addresses showing up in the message itself. It is normally handled by the user's MUA (Mail User Agent). When the MUA passes the message to the MTA (Mail Transfer Agent), it passes all the recipients in the To:, Cc: and Bcc: headers as envelope recipient addresses. But while it generates To: and Cc: headers, it does not generate a Bcc: header, so those recipients do not show up in the message itself.
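The MUA behaviour can be illustrated with Python's standard email package. This is a sketch of what a typical MUA does at submission, not any particular MUA's code; the function name prepare_submission is made up for illustration. Python's own smtplib.SMTP.send_message behaves the same way: when no recipient list is given it builds one from To:, Cc: and Bcc:, and it never transmits a Bcc: header.

```python
from email import message_from_string
from email.message import Message
from email.utils import getaddresses


def prepare_submission(raw: str) -> tuple[list[str], Message]:
    """Every To:, Cc: and Bcc: address becomes an envelope recipient,
    and the Bcc: header is removed from the message actually sent."""
    msg = message_from_string(raw)
    fields = msg.get_all("To", []) + msg.get_all("Cc", []) + msg.get_all("Bcc", [])
    envelope = [addr for _name, addr in getaddresses(fields) if addr]
    del msg["Bcc"]  # removes every Bcc: header; no error if there is none
    return envelope, msg


raw = (
    "From: alice@example.com\n"
    "To: Bob <bob@example.com>\n"
    "Cc: carol@example.com\n"
    "Bcc: dave@example.com\n"
    "Subject: Q3 numbers\n"
    "\n"
    "See attached.\n"
)
rcpts, sent = prepare_submission(raw)
print(rcpts)         # ['bob@example.com', 'carol@example.com', 'dave@example.com']
print(sent["Bcc"])   # None
```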

Now, capturing Bcc: information is tricky because the envelope recipient list changes as the message is processed and delivered. You can derive Bcc: addresses by comparing the list of envelope recipients to the list of header recipients (To: and Cc: addresses): recipients who appear in the envelope but not in the headers can be assumed to be Bcc: addresses. But this is rather simplistic because of aliasing and forwarding.
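The simple comparison is a set difference once both sides are reduced to bare, normalized addresses. A minimal sketch, assuming the envelope recipients and the raw To:/Cc: header values are already available; derive_bcc is a hypothetical helper name and lowercasing is an assumption.

```python
from email.utils import getaddresses


def normalize(addr: str) -> str:
    """Strip angle brackets/whitespace and lowercase the address."""
    return addr.strip().strip("<>").strip().lower()


def derive_bcc(envelope_rcpts: list[str], header_values: list[str]) -> list[str]:
    """Envelope recipients not found in the To:/Cc: header values are
    assumed to be Bcc: recipients."""
    in_headers = {normalize(addr) for _name, addr in getaddresses(header_values) if addr}
    return [normalize(r) for r in envelope_rcpts if normalize(r) not in in_headers]


print(derive_bcc(["<bob@example.com>", "<dave@example.com>"],
                 ["Bob <Bob@Example.com>", "carol@example.com"]))
# ['dave@example.com']
```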

If one of the header addresses is an alias and the alias is expanded before the check is done, then every address the alias points to will show up as a Bcc: address. Off the top of my head I do not know whether sendmail calls the Milter's xxfi_envrcpt callback before or after aliasing. David, do you know?
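The aliasing problem is easy to reproduce with the naive comparison. In the sketch below the envelope has already been expanded through a hypothetical alias table (ALIASES, standing in for sendmail's aliases file), so both members of the alias look like Bcc: recipients. Expanding the header addresses through the same table before comparing is one possible mitigation; it is an assumption added here, and it only works if the checking code can see the same alias data sendmail uses.

```python
from typing import Optional

from email.utils import getaddresses

# Hypothetical alias table standing in for sendmail's aliases file.
ALIASES = {"sales@example.com": ["ann@example.com", "raj@example.com"]}


def normalize(addr: str) -> str:
    """Strip angle brackets/whitespace and lowercase the address."""
    return addr.strip().strip("<>").strip().lower()


def expand(addr: str, aliases: dict[str, list[str]],
           seen: Optional[set[str]] = None) -> set[str]:
    """Recursively expand an address through the alias table, guarding against loops."""
    seen = set() if seen is None else seen
    addr = normalize(addr)
    if addr in seen:
        return set()
    seen.add(addr)
    if addr not in aliases:
        return {addr}
    result: set[str] = set()
    for member in aliases[addr]:
        result |= expand(member, aliases, seen)
    return result


def derive_bcc(envelope_rcpts: list[str], header_values: list[str],
               aliases: Optional[dict[str, list[str]]] = None) -> list[str]:
    """Naive envelope-minus-headers rule; optionally expand header aliases first."""
    in_headers = {normalize(a) for _n, a in getaddresses(header_values) if a}
    if aliases is not None:
        in_headers = set().union(*(expand(a, aliases) for a in in_headers))
    return sorted(normalize(r) for r in envelope_rcpts if normalize(r) not in in_headers)


expanded_envelope = ["ann@example.com", "raj@example.com"]  # after alias expansion
print(derive_bcc(expanded_envelope, ["Sales <sales@example.com>"]))           # false positives
print(derive_bcc(expanded_envelope, ["Sales <sales@example.com>"], ALIASES))  # []
```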

A second issue is forwarding. Is the message forwarded to an internal relay that splits internal and external mail, i.e. internal mail to the mailbox server and external mail to the firewall SMTP relay? If so, and you are trying to capture the Bcc: addresses on the firewall SMTP relay, you will miss any internal Bcc: addresses. So the most reliable place to capture Bcc: addresses is the first SMTP host that receives the message.

BTW, you can also derive Bcc: addresses from the syslog files if you also log the To: and Cc: headers. This could be done with a pair of header-specific rulesets for the To: and Cc: headers and sendmail's syslog map.
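Deriving Bcc: addresses from syslog then means grouping lines by queue ID and subtracting the logged header addresses from the delivered recipients. The line shapes here are assumptions: the to= lines are modelled loosely on sendmail's delivery log lines and should be checked against your own logs, and the hdr-to= / hdr-cc= lines are a hypothetical format that your own header rulesets would have to write.

```python
import re
from collections import defaultdict
from email.utils import getaddresses

# Assumed formats: "to=" modelled loosely on sendmail delivery lines;
# "hdr-to=" / "hdr-cc=" are hypothetical lines written by custom header rulesets.
LINE_RE = re.compile(
    r"sendmail\[\d+\]: (?P<qid>\w+): (?P<kind>hdr-to|hdr-cc|to)=(?P<rest>.*)$"
)


def bcc_from_syslog(lines: list[str]) -> dict[str, list[str]]:
    """Map queue ID -> envelope recipients that never appeared in To:/Cc:."""
    envelope: dict[str, set[str]] = defaultdict(set)
    headers: dict[str, set[str]] = defaultdict(set)
    for line in lines:
        m = LINE_RE.search(line)
        if not m:
            continue
        qid, kind, rest = m.group("qid", "kind", "rest")
        if kind == "to":
            # The recipient list ends where the next "name=" field begins.
            rest = re.split(r", (?=\w+=)", rest, maxsplit=1)[0]
        addrs = {a.lower() for _n, a in getaddresses([rest]) if a}
        (envelope if kind == "to" else headers)[qid].update(addrs)
    return {qid: sorted(rcpts - headers[qid]) for qid, rcpts in envelope.items()}


log = [
    "Mar  1 12:00:00 mx sendmail[101]: q21C0001: hdr-to=Bob <bob@example.com>",
    "Mar  1 12:00:00 mx sendmail[101]: q21C0001: hdr-cc=carol@example.com",
    "Mar  1 12:00:01 mx sendmail[102]: q21C0001: to=<bob@example.com>,<dave@example.com>, delay=00:00:01, mailer=esmtp, stat=Sent",
    "Mar  1 12:00:01 mx sendmail[102]: q21C0001: to=<carol@example.com>, delay=00:00:01, mailer=esmtp, stat=Sent",
]
print(bcc_from_syslog(log))  # {'q21C0001': ['dave@example.com']}
```

The same aliasing and forwarding caveats apply here, since the to= lines record recipients after expansion on this host only.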

Now, what to do with these Bcc: addresses? You could have MIMEDefang log them. You could also have MIMEDefang generate an X-Bcc: header listing them in the message header. But simply adding an X-Bcc: header may not be the best idea, since it would expose the Bcc: addresses to all of the recipients, not just the archive address.

The best way to do this would be to make it a conditional header (H?F?X-Bcc: or H?${Macro}?X-Bcc:). I don't think H?${Macro}?X-Bcc: would work, since Milter does not have any mechanism to pass a macro back to sendmail (is this correct?). I also do not know whether Milter allows a header to be returned with the ?F?X-Bcc: conditional mailer flag syntax. If it were allowed, you could use an unused mailer flag and define a custom mailer that sets that flag. You would then deliver mail to the archive address with this mailer using the mailertable.
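The end result the conditional header is meant to produce, an X-Bcc: header that only the archive copy carries, can be modelled by rendering one copy of the message per envelope recipient. This is a sketch of the intended outcome only: it does not reproduce the ?F? header condition or the custom mailer, and ARCHIVE_ADDR and render_per_recipient are hypothetical names.

```python
from email import message_from_string

ARCHIVE_ADDR = "archive@example.com"  # hypothetical archive mailbox


def render_per_recipient(raw: str, envelope_rcpts: list[str], bcc: list[str],
                         archive: str = ARCHIVE_ADDR) -> dict[str, str]:
    """Return one rendered message per recipient; only the archive copy
    gets an X-Bcc: header listing the derived Bcc: addresses."""
    copies = {}
    for rcpt in envelope_rcpts:
        msg = message_from_string(raw)  # fresh parse so edits never leak between copies
        if rcpt.lower() == archive.lower() and bcc:
            msg["X-Bcc"] = ", ".join(bcc)
        copies[rcpt] = msg.as_string()
    return copies
```

For example, render_per_recipient(raw, ["bob@example.com", "dave@example.com", "archive@example.com"], ["dave@example.com"]) leaves Bob's and Dave's copies untouched and adds "X-Bcc: dave@example.com" only to the archive copy.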

Not sure if this would work or not, but these are my thoughts.

Hope this helps

RLH