Mailinglist Archive URL Redirection from Pipermail to Hyperkitty

2021-03-17, 01:00 en

I recently migrated a mailing list setup running on Mailman 2 to a new Mailman 3-based setup on a new server. The migration itself is pretty straightforward, especially thanks to the official upgrade guide. The migration from Pipermail (the default archiver used in Mailman 2) to Hyperkitty (the Mailman 3 archiver) is easy as well, since Hyperkitty provides a command to import the per-list mbox files maintained by Mailman 2.

However, there is one thing that will break: hyperlinks to the old setup cannot all be trivially translated to the new URLs. This is why I came up with the following solution to provide a low-effort redirection service that will get most users to the correct new destination.

URLs to Redirect

Let's first have a look at what kind of URLs are used in the old setup so we can figure out where to redirect them.

Before we start, note that all list-specific Mailman 3 URLs include the full address of the list, not just the local part. You need to pay special attention here if you're operating mailing lists for multiple domains.

/cgi-bin/mailman/listinfo shows a list of all publicly visible lists. This can simply be redirected to the Mailman 3 root URL, i.e. /.
/cgi-bin/mailman/listinfo/test is the welcome page of the test mailing list. It shows the list info and the subscription interface. This can trivially be redirected to the Mailman 3 list welcome page, /postorius/lists/test.example.org/.

I chose to simply redirect every "verb" (the listinfo part) to the welcome page, as most, if not all, other verbs are only to be used by list admins, and they should ideally already know the new location.
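The mapping for the /cgi-bin/mailman URLs can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the configuration used on the server; the function name `redirect_cgi_url` and the default domain `example.org` are assumptions made for the example.

```python
import re

# Matches /cgi-bin/mailman/<verb>[/<list>[/...]]; the verb itself is ignored.
_CGI_URL = re.compile(r"^/cgi-bin/mailman/[^/]+(?:/([^/]+))?(?:/.*)?$")


def redirect_cgi_url(path, domain="example.org"):
    """Map an old Mailman 2 CGI path to a Mailman 3 path (hypothetical helper)."""
    match = _CGI_URL.match(path)
    if match is None:
        return None  # not a Mailman 2 CGI URL
    list_name = match.group(1)
    if list_name is None:
        return "/"  # no list given: go to the global list overview
    # Postorius addresses lists by list ID: local part and domain joined by a dot.
    return f"/postorius/lists/{list_name}.{domain}/"
```

Lists on a different domain would need either a different `domain` argument or a dedicated rule, which is the multi-domain caveat mentioned earlier.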

/pipermail is the archive landing page listing all public mailing list archives. This can simply be redirected to /hyperkitty.
/pipermail/test is the archive overview of the test mailing list. Redirect to /hyperkitty/list/test@example.org.
There are several per-month overviews, such as /pipermail/test/2021-March/thread.html for a threaded overview of the posts from March 2021, or /pipermail/test/2021-March/author.html for an overview grouped by author. As Hyperkitty only has a monthly overview by thread, I redirect all of these overviews to the same Hyperkitty monthly overview page, e.g. /hyperkitty/list/test@example.org/2021/3.

The timestamp in the URL is definitely human-readable, but it is the first obstacle that requires a bit more effort. Of course this could be solved with 12 individual rewrites, one for each month. Instead, I chose to incorporate the redirection for this into my solution for the next item.
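For illustration, the month conversion on its own is small. The following Python sketch is not taken from the actual script; the function name `month_overview_path` is made up, and the parsing assumes an English (or C) locale so that `%B` recognizes names like "March".

```python
from datetime import datetime


def month_overview_path(list_address, ym):
    """Translate a Pipermail month such as '2021-March' into a Hyperkitty path."""
    # %B parses the full month name; this relies on an English (or C) locale.
    month = datetime.strptime(ym, "%Y-%B")
    # Hyperkitty uses the numeric month without zero padding, e.g. 2021/3.
    return f"/hyperkitty/list/{list_address}/{month.year}/{month.month}"
```

A malformed timestamp makes `strptime` raise `ValueError`, which a redirect service could answer by falling back to the list's archive overview.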

Finally, there are links to individual messages, such as /pipermail/test/2021-March/001337.html. These turned out to be the greatest challenge: Pipermail uses sequential numbers for addressing individual messages, while Hyperkitty uses a hash of the message's Message-Id header, e.g. /hyperkitty/list/test@example.org/message/7HPT35IASDHNIW6MGBDEDR4LENSIR5F4. Unfortunately, there exists no trivial mapping between these two methods, so I wrote a small PHP script to perform the redirection.
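To see why the two schemes cannot be mapped onto each other, it helps to look at how such a hash comes about. The sketch below assumes the hash is computed the way Mailman 3 documents its X-Message-ID-Hash header: the Base32-encoded SHA-1 digest of the Message-Id without its angle brackets. Treat that as an assumption to verify against your own installation; the function name is hypothetical.

```python
import base64
import hashlib
from email.utils import unquote


def message_id_hash(message_id):
    """Base32-encoded SHA-1 digest of a Message-Id without its angle brackets."""
    bare_id = unquote(message_id.strip())  # drops the surrounding < >
    digest = hashlib.sha1(bare_id.encode("utf-8")).digest()
    # A 20-byte SHA-1 digest always encodes to 32 Base32 characters.
    return base64.b32encode(digest).decode("ascii")
```

The result depends only on the Message-Id, whereas the Pipermail number depends only on the position of the message in the archive, so neither value can be derived from the other without a lookup.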

Redirecting Individual Message URLs

As mentioned before, Pipermail uses sequential numbers to identify single messages, while Hyperkitty bases its IDs on the Message-Id header. Unfortunately, while generating the archives, Pipermail removes all information that would be required to match the sequential ID to a Message-Id:

The HTML files know about the sequential ID, but not the Message-Id.
The mbox files and textfile archives contain the Message-Id, but not the sequential ID assigned by Pipermail.

And even if they did, I didn't want to keep the old archives around anyway, and only provide redirection based on the URLs. So how can we achieve this?

As it turns out, when importing the mbox file into Hyperkitty, the order in which the messages are inserted into the database backend of Hyperkitty is the same order in which Pipermail assigns the IDs. Of course, this works flawlessly only if all messages in the mbox file were successfully processed by both Pipermail and Hyperkitty. Otherwise, a skew is introduced. However, I only encountered this issue a few times, so I chose to simply correct it manually.

I came up with a little PHP script that takes the list name and the sequential ID as arguments, and looks up the (hopefully) matching message ID in the Hyperkitty database. The SQL query used for this simply fetches a single message ID hash from the row that contains the message with the specified offset:

SELECT e.message_id_hash
FROM hyperkitty_email e
LEFT JOIN hyperkitty_mailinglist m
ON e.mailinglist_id = m.id
WHERE m.name = :listid
LIMIT :postid, 1
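The lookup can be tried out without a Hyperkitty installation. The Python sketch below runs an equivalent query against an in-memory SQLite database. The two tables are stripped-down stand-ins containing only the columns the query touches, and the `id` column on `hyperkitty_email` plus the `ORDER BY e.id` clause are assumptions added here to make the insertion order explicit; the query above relies on the database returning rows in that order by itself. `LIMIT 1 OFFSET :postid` is the portable spelling of `LIMIT :postid, 1`.

```python
import sqlite3

QUERY = """
SELECT e.message_id_hash
FROM hyperkitty_email e
JOIN hyperkitty_mailinglist m ON e.mailinglist_id = m.id
WHERE m.name = :listid
ORDER BY e.id              -- assumption: id reflects the insertion order
LIMIT 1 OFFSET :postid
"""


def lookup_hash(conn, listid, postid):
    """Return the hash of the postid-th message of a list, or None."""
    row = conn.execute(QUERY, {"listid": listid, "postid": postid}).fetchone()
    return row[0] if row else None


def demo_database():
    """Build a tiny stand-in for the Hyperkitty tables (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE hyperkitty_mailinglist (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE hyperkitty_email (
            id INTEGER PRIMARY KEY, mailinglist_id INTEGER, message_id_hash TEXT);
        INSERT INTO hyperkitty_mailinglist VALUES
            (1, 'test@example.org'), (2, 'other@example.org');
        -- Messages of both lists are interleaved, as they would be on import.
        INSERT INTO hyperkitty_email (mailinglist_id, message_id_hash) VALUES
            (1, 'A0'), (2, 'B0'), (1, 'A1'), (1, 'A2');
    """)
    return conn
```

With this data, `lookup_hash(demo_database(), "test@example.org", 1)` skips the message of the other list and yields the second message of the test list.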

The script takes some additional parameters for manual offset correction. It also takes care of Pipermail's 2021-March timestamps by parsing them with the appropriate strptime format. You can find the script along with a sample Apache rewrite configuration on GitLab.
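How such an offset correction might work is sketched below in Python. The parameter names `idoffset` and `missing` are borrowed from the rewrite rules in the next section, but their exact semantics are a guess made for this example: `idoffset` is taken to be a number of rows to skip, and `missing=23_42` is taken to mean that Pipermail IDs 23 through 42 have no counterpart in the Hyperkitty database. The actual script may behave differently.

```python
def corrected_offset(postid, idoffset=0, missing=None):
    """Return the database offset for a Pipermail ID, or None if it was lost.

    Hypothetical helper; the semantics of both parameters are assumptions.
    """
    offset = postid
    if missing:
        first, last = (int(part) for part in missing.split("_"))
        if first <= postid <= last:
            return None  # this message never made it into the database
        if postid > last:
            # Later messages moved up by the size of the gap.
            offset -= last - first + 1
    return offset + idoffset
```

Under these assumptions, Pipermail ID 43 of a list with `missing=23_42` ends up at database offset 23, directly after ID 22.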

Apache Rewrite Rules

To perform all the static redirects explained above, as well as invoke the PHP script, I configured Apache 2 somewhat like this:

# Exempt the conversion script from proxying to mailman3-web
ProxyPass /pipermail2hyperkitty.php !
ProxyPass / unix:/run/mailman3-web/uwsgi.sock|uwsgi://localhost/


RewriteEngine on

#
# List info pages get rewritten directly through mod_rewrite
#

# Global list overview page
RewriteRule ^/cgi-bin/mailman/listinfo/?$ /

# Individual list overview pages
# some lists may need special handling, e.g. due to renaming or a non-default domain
RewriteRule ^/cgi-bin/mailman/[^/]+/oldlist(/.*)?$ /postorius/lists/newlist.example.net
# Catch-all for all other lists
RewriteRule ^/cgi-bin/mailman/[^/]+/([^/]+)(/.*)?$ /postorius/lists/$1.example.org

#
# All archive URLs get handed over to pipermail2hyperkitty.php
#

# Special handling for some lists, e.g. renamed lists or non-default domains
RewriteRule "^/pipermail/oldlist/([^/]+)/([^./]+)((\.|/).*)?$" /pipermail2hyperkitty.php?listid=newlist@example.net&ym=$1&postid=$2 [L,PT,QSD]
RewriteRule "^/pipermail/oldlist/([^./]+)((\.|/).*)?$" /pipermail2hyperkitty.php?listid=newlist@example.net&ym=$1 [L,PT,QSD]
RewriteRule "^/pipermail/oldlist((\.|/).*)?$" /pipermail2hyperkitty.php?listid=newlist@example.net [L,PT,QSD]
# The newlist@example.net archive was merged with the oldlist archives, need to know the db offset (= number of posts in oldlist)
RewriteRule "^/pipermail/newlist/([^/]+)/([^./]+)((\.|/).*)?$" /pipermail2hyperkitty.php?listid=newlist@example.net&ym=$1&postid=$2&idoffset=1337 [L,PT,QSD]
# The messages 23 through 42 are missing from the test mailinglist archives, skew needs to be compensated
RewriteRule "^/pipermail/test/([^/]+)/([^./]+)((\.|/).*)?$" /pipermail2hyperkitty.php?listid=test@example.org&ym=$1&postid=$2&missing=23_42 [L,PT,QSD]

# Catch-all for all other lists
# Everything containing a list name, timestamp and sequential message ID -> to the single message
RewriteRule "^/pipermail/([^/]+)/([^/]+)/([^./]+)((\.|/).*)?$" /pipermail2hyperkitty.php?listid=$1@example.org&ym=$2&postid=$3 [L,PT,QSD]
# Everything containing a list name and timestamp, but no sequential message ID -> to the month's overview page
RewriteRule "^/pipermail/([^/]+)/([^./]+)((\.|/).*)?$" /pipermail2hyperkitty.php?listid=$1@example.org&ym=$2 [L,PT,QSD]
# Everything containing a list name, but no timestamp or sequential message ID -> to the list's overview page
RewriteRule "^/pipermail/([^/]+)((\.|/).*)?$" /pipermail2hyperkitty.php?listid=$1@example.org [L,PT,QSD]
# Everything else -> to the global archive overview
RewriteRule "^/pipermail((\.|/).*)?$" /hyperkitty [L,PT,QSD]
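Because the catch-all rules overlap, their order matters. The following Python sketch replays the four catch-all patterns in the same order to show which rule picks up which URL. It is a test aid written for this explanation, not part of the configuration; the function name `rewrite` is made up, and Python's `re` module is assumed to treat these particular patterns the same way Apache's PCRE does.

```python
import re
from urllib.parse import urlencode

# Same patterns and order as the catch-all RewriteRules above.
RULES = [
    (re.compile(r"^/pipermail/([^/]+)/([^/]+)/([^./]+)((\.|/).*)?$"),
     ("listid", "ym", "postid")),
    (re.compile(r"^/pipermail/([^/]+)/([^./]+)((\.|/).*)?$"), ("listid", "ym")),
    (re.compile(r"^/pipermail/([^/]+)((\.|/).*)?$"), ("listid",)),
    (re.compile(r"^/pipermail((\.|/).*)?$"), ()),
]


def rewrite(path, domain="example.org"):
    """Return the rewrite target for a Pipermail path, or None if no rule matches."""
    for pattern, names in RULES:
        match = pattern.match(path)
        if match is None:
            continue  # fall through to the next, less specific rule
        if not names:
            return "/hyperkitty"
        # zip() keeps only the leading groups that carry a parameter name.
        params = dict(zip(names, match.groups()))
        params["listid"] = f"{params['listid']}@{domain}"
        return "/pipermail2hyperkitty.php?" + urlencode(params, safe="@")
    return None
```

Running it on /pipermail/test/2021-March/thread.html shows that the first rule also matches the per-month overview pages, passing `postid=thread`. The script therefore presumably has to treat a non-numeric `postid` as a request for the month's overview.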

Conclusion

This is by no means a perfect, bulletproof, or even good solution, nor is it intended to be. I only wrote this script to ease the migration phase for users, and it performs that job well enough.

s3lph made
