Mailing List Archive URL Redirection from Pipermail to Hyperkitty
I recently migrated a mailing list setup running on Mailman 2 to a new Mailman 3-based setup on a new server. The migration itself is pretty straightforward, especially thanks to the official upgrade guide. The migration from Pipermail (the default archiver used in Mailman 2) to Hyperkitty (the Mailman 3 archiver) is easy as well, as Hyperkitty provides a command to import the per-list mbox files maintained by Mailman 2.
However, there is one thing that will break: hyperlinks to the old setup cannot all be trivially translated to the new URLs. This is why I came up with the following solution to provide a low-effort redirection service that will get most users to the correct new destination.
URLs to Redirect
Let's first have a look at what kind of URLs are used in the old setup so we can figure out where to redirect them.
Before we start, note that all list-specific Mailman 3 URLs include the full address of the list, not just the local part. You need to pay special attention here if you're operating mailing lists for multiple domains.
- /cgi-bin/mailman/listinfo shows a list of all publicly visible lists. This can simply be redirected to the Mailman 3 root URL.
- /cgi-bin/mailman/listinfo/test is the welcome page of the test mailing list. It shows the list info and the subscription interface. This can trivially be redirected to the Mailman 3 list welcome page.
  I chose to simply redirect every "verb" (the listinfo part) to the welcome page, as most, if not all, other verbs are only meant for list admins, who should ideally already know their way around the new setup.
- /pipermail is the archive landing page listing all public mailing list archives. This can simply be redirected to the Hyperkitty landing page.
- /pipermail/test is the archive overview of the test mailing list. Redirect to the list's overview page in Hyperkitty.
- There are several per-month overviews, such as /pipermail/test/2021-March/thread.html for a threaded overview of the posts from March 2021, or /pipermail/test/2021-March/author.html for an overview grouped by author. As Hyperkitty only has a monthly overview by thread, I redirect all of these overviews to the same Hyperkitty monthly overview page.
  The timestamp in the URL is human-readable, but translating it is the first obstacle that requires a bit more effort. This could of course be solved with 12 individual rewrite rules, one per month, but I chose to handle it as part of my solution for the next item.
- Finally, there are links to individual messages, such as /pipermail/test/2021-March/001337.html. These turned out to be the greatest challenge: Pipermail uses sequential numbers for addressing individual messages, while Hyperkitty uses a hash of the message's Message-Id header, e.g. /firstname.lastname@example.org/message/7HPT35IASDHNIW6MGBDEDR4LENSIR5F4. Unfortunately, there is no trivial mapping between these two methods, so I wrote a small PHP script to perform the redirection.
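The month conversion mentioned in the list above can be sketched as follows. This is an illustrative Python sketch, not the PHP script itself; the function names are made up for this example, and the target path layout (/hyperkitty/list/ADDRESS/YEAR/MONTH/) is an assumption about Hyperkitty's URL scheme that should be checked against your own installation.

```python
# English month names as they appear in Pipermail's monthly directory names.
MONTHS = ["January", "February", "March", "April", "May", "June",
          "July", "August", "September", "October", "November", "December"]


def parse_pipermail_month(ym):
    """Turn a Pipermail directory name like '2021-March' into (2021, 3)."""
    year_text, month_name = ym.split("-", 1)
    return int(year_text), MONTHS.index(month_name) + 1


def hyperkitty_month_path(list_address, ym):
    """Build the monthly overview path for a list (assumed URL layout)."""
    year, month = parse_pipermail_month(ym)
    return f"/hyperkitty/list/{list_address}/{year}/{month}/"
```

A fixed list of month names is used instead of locale-dependent date parsing, so the result does not change with the server's locale. An unknown month name raises a ValueError, which a real redirector would want to turn into a fallback redirect.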
Redirecting Individual Message URLs
As mentioned before, Pipermail uses sequential numbers to identify single messages, while Hyperkitty bases its IDs on the Message-Id header. Unfortunately, while generating the archives, Pipermail removes all information that would be required to match the sequential ID to a Message-Id:
- The HTML files know about the sequential ID, but not the Message-Id.
- The mbox files and textfile archives contain the Message-Id, but not the sequential ID assigned by Pipermail.
And even if they did, I didn't want to keep the old archives around anyway; I wanted to provide redirection based on the URLs alone. So how can we achieve this?
As it turns out, when importing the mbox file into Hyperkitty, the order in which the messages are inserted into Hyperkitty's database backend is the same order in which Pipermail assigns the IDs. Of course, this works flawlessly only if all messages in the mbox file were successfully processed by both Pipermail and Hyperkitty. Otherwise a skew is introduced. However, I encountered this issue only a few times, so I chose to simply correct it manually.
I came up with a little PHP script that takes the list name and the sequential ID as arguments, and looks up the (hopefully) matching message ID in the Hyperkitty database. The SQL query used for this simply fetches a single message ID hash from the row that contains the message with the specified offset:
    SELECT e.message_id_hash
    FROM hyperkitty_email e
    LEFT JOIN hyperkitty_mailinglist m ON e.mailinglist_id = m.id
    WHERE m.name = :listid
    LIMIT :postid, 1
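To make the offset lookup concrete, here is a small Python sketch that runs an equivalent query against an in-memory SQLite database. It is only an illustration: the toy schema contains just the columns named in the query above, the sample rows and the function name lookup_hash are invented, and the ORDER BY on the id column is my own addition, assuming that id reflects the import order. Without an explicit ordering, the database is free to return rows in any order.

```python
import sqlite3

# Toy stand-in for the two Hyperkitty tables used in the query;
# the real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE hyperkitty_mailinglist (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE hyperkitty_email (
        id INTEGER PRIMARY KEY,
        mailinglist_id INTEGER,
        message_id_hash TEXT
    );
    INSERT INTO hyperkitty_mailinglist (id, name) VALUES (1, 'test@example.org');
    INSERT INTO hyperkitty_email (mailinglist_id, message_id_hash)
        VALUES (1, 'HASH0'), (1, 'HASH1'), (1, 'HASH2');
""")


def lookup_hash(conn, list_address, post_id):
    """Return the hash of the message at offset post_id, or None."""
    row = conn.execute(
        """
        SELECT e.message_id_hash
        FROM hyperkitty_email e
        JOIN hyperkitty_mailinglist m ON e.mailinglist_id = m.id
        WHERE m.name = ?
        ORDER BY e.id  -- assumption: id reflects import order
        LIMIT 1 OFFSET ?
        """,
        (list_address, post_id),
    ).fetchone()
    return row[0] if row else None
```

Returning None for an offset past the end of the table gives the caller a chance to fall back to the list's overview page instead of producing a broken link.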
The script takes some additional parameters for manual offset correction. It also takes care of Pipermail's timestamps by parsing them with the appropriate formatter. You can find the script along with a sample Apache rewrite configuration on GitLab.
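The manual offset correction could work roughly like the following Python sketch. The semantics are my reading of the idoffset and missing parameters that appear in the rewrite rules, not a description of the actual script: id_offset is taken to be the number of rows that precede the list's first message in the database, and missing is taken to be an inclusive range of Pipermail numbers without a counterpart in Hyperkitty. The function names are hypothetical.

```python
def parse_missing(value):
    """Parse a range written like '23_42' into (23, 42)."""
    first, last = value.split("_", 1)
    return int(first), int(last)


def corrected_offset(post_id, id_offset=0, missing=None):
    """Map a Pipermail sequence number to a row offset in Hyperkitty.

    Returns None when the requested message itself lies in the
    missing range, because there is nothing to redirect to.
    """
    if missing is not None:
        first, last = missing
        if first <= post_id <= last:
            return None
        if post_id > last:
            # Later messages moved up by the size of the gap.
            post_id -= last - first + 1
    return post_id + id_offset
```

With missing=(23, 42), Pipermail message 43 maps to offset 23, while messages up to 22 are unaffected. A list with several gaps would need a list of ranges instead of a single one.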
Apache Rewrite Rules
To perform all the static redirects explained above, as well as invoke the PHP script, I configured Apache 2 somewhat like this:
    # Exempt the conversion script from proxying to mailman3-web
    ProxyPass /pipermail2hyperkitty.php !
    ProxyPass / unix:/run/mailman3-web/uwsgi.sock|uwsgi://localhost/

    RewriteEngine on

    #
    # List info pages get rewritten directly through mod_rewrite
    #

    # Global list overview page
    RewriteRule ^/cgi-bin/mailman/listinfo/?$ /

    # Individual list overview pages
    # some lists may need special handling, e.g. due to renaming or a non-default domain
    RewriteRule ^/cgi-bin/mailman/[^/]+/oldlist(/.*)?$ /postorius/lists/newlist.example.net
    # Catch-all for all other lists
    RewriteRule ^/cgi-bin/mailman/[^/]+/([^/]+)(/.*)?$ /postorius/lists/$1.example.org

    #
    # All archive URLs get handed over to pipermail2hyperkitty.php
    #

    # Special handling for some lists, e.g. renamed lists or non-default domains
    RewriteRule "^/pipermail/oldlist/([^/]+)/([^./]+)((\.|/).*)?$" /email@example.com&ym=$1&postid=$2 [L,PT,QSD]
    RewriteRule "^/pipermail/oldlist/([^./]+)((\.|/).*)?$" /firstname.lastname@example.org&ym=$1 [L,PT,QSD]
    RewriteRule "^/pipermail/oldlist((\.|/).*)?$" /email@example.com [L,PT,QSD]

    # The firstname.lastname@example.org archive was merged with the oldlist archives, need to know the db offset (= number of posts in oldlist)
    RewriteRule "^/pipermail/newlist/([^/]+)/([^./]+)((\.|/).*)?$" /email@example.com&ym=$1&postid=$2&idoffset=1337 [L,PT,QSD]

    # The messages 23 through 42 are missing from the test mailinglist archives, skew needs to be compensated
    RewriteRule "^/pipermail/test/([^/]+)/([^./]+)((\.|/).*)?$" /firstname.lastname@example.org&ym=$1&postid=$2&missing=23_42 [L,PT,QSD]

    # Catch-all for all other lists
    # Everything containing a list name, timestamp and sequential message ID -> to the single message
    RewriteRule "^/pipermail/([^/]+)/([^/]+)/([^./]+)((\.|/).*)?$" /email@example.com&ym=$2&postid=$3 [L,PT,QSD]
    # Everything containing a list name and timestamp, but no sequential message ID -> to the month's overview page
    RewriteRule "^/pipermail/([^/]+)/([^./]+)((\.|/).*)?$" /firstname.lastname@example.org&ym=$2 [L,PT,QSD]
    # Everything containing a list name, but no timestamp or sequential message ID -> to the list's overview page
    RewriteRule "^/pipermail/([^/]+)((\.|/).*)?$" /email@example.com [L,PT,QSD]
    # Everything else -> to the global archive overview
    RewriteRule "^/pipermail((\.|/).*)?$" /hyperkitty [L,PT,QSD]
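Before deploying rules like these, it can help to check which catch-all pattern a given old URL ends up matching. The following Python sketch copies the four catch-all patterns from the configuration above and tries them in the same order; the labels and the classify function are invented for this example, and Python's re module is only assumed to behave like Apache's regex engine for these simple patterns.

```python
import re

# (label, number of captured fields of interest, pattern), most specific first.
RULES = [
    ("message", 3, re.compile(r"^/pipermail/([^/]+)/([^/]+)/([^./]+)((\.|/).*)?$")),
    ("month", 2, re.compile(r"^/pipermail/([^/]+)/([^./]+)((\.|/).*)?$")),
    ("list", 1, re.compile(r"^/pipermail/([^/]+)((\.|/).*)?$")),
    ("index", 0, re.compile(r"^/pipermail((\.|/).*)?$")),
]


def classify(path):
    """Return (label, captured fields) for the first matching rule."""
    for label, fields, pattern in RULES:
        match = pattern.match(path)
        if match:
            return label, match.groups()[:fields]
    return None


for url in ["/pipermail/test/2021-March/001337.html",
            "/pipermail/test/2021-March/thread.html",
            "/pipermail/test/2021-March.txt",
            "/pipermail/test/",
            "/pipermail"]:
    print(url, "->", classify(url))
```

One thing this makes visible: overview pages such as thread.html also match the first pattern, with "thread" captured where the sequential ID would be. Whatever handles these requests therefore has to recognise a non-numeric ID and fall back to the month's overview.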
This is by no means a perfect, bulletproof, or even good solution, nor is it meant to be. I only wrote this script to ease the migration phase for users, and it performs that job well enough.