
title = "Portable Identity for ActivityPub"
tags = ["activitypub"]
date = "2023-05-01 13:00:42 -0400"
short_desc = "It is actually possible. And it's not as complicated as you might think."
slug = "activitypub-portable-identity"

Bluesky has been making waves recently, both on its own merits and for how its implementation contrasts with ActivityPub and the fediverse. There are many ways it differs from other implementations of decentralized social media, but there's one in particular I want to focus on: portable identity. The idea is that I should be able to take all of my data, everything my identity consists of, and move wholesale to a new instance/server/PDS/whatever you call it. Bluesky permits this¹. Mastodon and most (all?) other ActivityPub implementations do not. But that doesn't have to be the case: nothing about ActivityPub, architecturally speaking, is incompatible with this vision.

One of the more common feature requests for Mastodon (third only, I'd guess, to quote posts and search) is the ability to migrate accounts with their posts. Without this ability, people get trapped on the first instance they try². Maybe they'd prefer to be on a different instance, closer to a particular community, or somewhere with different moderation standards. But all their posts are stuck right where they are, and it ends up taking a huge, calamitous event (like an instance shutting down) to overcome that inertia and actually move.

So what's standing in the way, and why aren't posts portable already? Well, here's an (abridged) example of the ActivityPub representation of a Mastodon post:

```json
{
  "@context": ["https://www.w3.org/ns/activitystreams"],
  "id": "https://mastodon.social/users/shadowfacts/statuses/15287",
  "type": "Note",
  "content": "<p>Hello, world!</p>"
}
```

Notice anything about it? The post's ID doesn't just identify the document and tell you where to fetch it; it also names the server where it's hosted. This is true of all object identifiers in Mastodon (and just about every other service that implements ActivityPub).

It's as if the canonical URL of this post were https://5.161.136.163/2023/activitypub-portable-identity/. All of a sudden, this post isn't controlled by me, the person who holds the domain shadowfacts.net, but by my hosting provider, who controls that IP address. I couldn't change hosting providers without losing all of my content and my social graph³ (in this example, the readers of my blog). Sound familiar?

It doesn't have to be like this. My posts don't belong to the service I'm using to host them; they belong to me. The identity of the posts I publish should be tied to me, not to my host. That is, the IDs need to be resolvable through something I control (hint: a domain) before ultimately reaching my host.

Nothing about the ActivityPub spec, as I read it, prohibits this. §3.1 says that object identifiers need to be publicly dereferenceable, and notes that their authority (i.e., domain and port) has to belong to the originating server. But there's nothing to suggest that "server" is distinct from a domain, or that the mapping from servers to hosts can't be many-to-one.

For this vision of portable identity to work, there needs to be an additional layer that maps identities to hosts.

For people who own domains, this is easy: they just point the DNS records at their host and tell their host that their identity should be served from their domain. Migrating your identity to another host is then a matter of updating DNS records.

But this shouldn't be accessible only to those with the resources and technical know-how to administer a domain⁴. There needs to be some public identity resolver that anyone can use. The simplest way to implement this, I think, is as a service (or services; there's no reason this has to be a centralizing force) that lets anyone point a subdomain at their host. The identity resolver needs a way of mapping any URL back to its host, and I think subdomains are both the easiest method (they avoid the need to agree on URL formats) and the least resource-intensive one (because you're not hosting actual content). Having a separate service built on top of DNS lets you abstract away the technical details for people who don't care, and would allow a more straightforward signup experience: the service could provide an API that lets people create an identity while signing up for an instance, rather than forcing it to be a separate process.
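A toy version of such a resolver fits in a few lines of Python. Everything here is hypothetical (the `IdentityResolver` class, the `resolver.example` parent domain, the host names), and a real service would publish DNS records rather than fill a dictionary. What it illustrates is that mapping a URL back to its host only requires the URL's hostname, so the resolver never needs to understand any server's path format.

```python
from urllib.parse import urlsplit


class IdentityResolver:
    """Hypothetical resolver: hands out subdomains of one parent domain and
    records which host currently serves each of them."""

    def __init__(self, parent_domain: str) -> None:
        self.parent_domain = parent_domain
        self._hosts: dict[str, str] = {}  # identity domain -> host domain

    def register(self, name: str, host: str) -> str:
        """Create an identity (e.g. from an instance's signup flow)."""
        identity = f"{name}.{self.parent_domain}"
        if identity in self._hosts:
            raise ValueError(f"{identity} is already taken")
        self._hosts[identity] = host
        return identity

    def repoint(self, identity: str, new_host: str) -> None:
        """Migration: the identity keeps its name; only its target changes."""
        if identity not in self._hosts:
            raise KeyError(identity)
        self._hosts[identity] = new_host

    def host_for(self, url: str) -> str:
        """Map any URL under an identity to its current host. Only the
        hostname is inspected, so no URL format has to be agreed on."""
        hostname = urlsplit(url).hostname
        if hostname is None or hostname not in self._hosts:
            raise KeyError(url)
        return self._hosts[hostname]


resolver = IdentityResolver("resolver.example")
me = resolver.register("shadowfacts", "host-a.example")
post = f"https://{me}/statuses/15287"
print(resolver.host_for(post))  # host-a.example
resolver.repoint(me, "host-b.example")
print(resolver.host_for(post))  # host-b.example
```

The post's URL never changes; only the record behind its hostname does.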

The hard part, however, is the social one: we collectively need to agree that the identity resolution layer is infrastructure and not somewhere moderation actions should take place. To use the web analogy again: people publishing objectionable content get kicked off web hosts, but it's far rarer that their domain name is taken away. The same has to be true of any shared identity provider/resolver; otherwise we end up in the exact same situation we're in now.

Adding to the social problem is the risk that vanity URLs become popular, at which point the desire shifts to migrating between identity resolvers. Solving this is possible, although it requires a larger architectural change to existing ActivityPub implementations. One possible approach, using DIDs, is discussed below. Another is splitting the identity resolver from the apparent username. A username written as @someone@example.com could be looked up using WebFinger on example.com, which would respond with the user's actual AP ID, https://someone.net, which would in turn be hosted by example.com (i.e., the DNS records for someone.net point to example.com). This breaks a number of assumptions Mastodon and clients make about the mention format, but it is architecturally possible. I do think this is a problem worth solving, but in the interest of supporting account portability in the cases people usually care about (moderation decisions, servers going down), it isn't a requirement.
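The WebFinger half of that split is small. WebFinger (RFC 7033) answers with a JSON Resource Descriptor containing a `subject` and a list of `links`, and Mastodon typically locates an actor through the `self` link whose type is `application/activity+json`. This sketch assumes example.com serves such a document for `acct:someone@example.com` pointing at `https://someone.net`, as in the scenario above; the helper names are made up.

```python
import json


def webfinger_response(acct: str, actor_id: str) -> str:
    """Hypothetical body example.com might serve at
    /.well-known/webfinger?resource=acct:<acct>."""
    return json.dumps({
        "subject": f"acct:{acct}",
        "links": [
            {"rel": "self", "type": "application/activity+json", "href": actor_id},
        ],
    })


def actor_id_from_webfinger(body: str) -> str:
    """Pick the ActivityPub actor ID out of a WebFinger JRD."""
    jrd = json.loads(body)
    for link in jrd.get("links", []):
        if link.get("rel") == "self" and link.get("type") == "application/activity+json":
            return link["href"]
    raise LookupError(f"no ActivityPub actor for {jrd.get('subject')}")


body = webfinger_response("someone@example.com", "https://someone.net")
print(actor_id_from_webfinger(body))  # https://someone.net
```

A client following this scheme would store `https://someone.net` as the actor's identity and treat `@someone@example.com` purely as a handle for finding it.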

The key advantage of this approach is that resolving a portable identity or object ID into a concrete ActivityPub object works exactly as it does now. You just make an HTTP request and get back an AP object. Where it's actually hosted, and therefore whether the host has changed, does not matter to you. None of your references to migrated people and posts change.

UX

So under this vision, what does the complete migration process actually look like?

  1. You export an archive of your account from your current instance.
  2. You import that archive into your new account.
  3. You point your identity at the new server (either by changing your DNS records, or by updating your information in the resolver's directory described before).
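Those three steps can be modeled end to end with plain dictionaries. This is a toy under obvious assumptions: `hosts` stands in for two instances' databases, `directory` for either your own DNS records or the resolver's directory, and `fetch` for an HTTP GET. What it shows is that an object ID another server saved before the move still dereferences afterwards.

```python
from urllib.parse import urlsplit

# Each host's database, keyed by full ActivityPub ID.
hosts: dict[str, dict[str, dict]] = {"old.example": {}, "new.example": {}}
# Identity domain -> current host (your DNS records, or the resolver's directory).
directory = {"alice.example": "old.example"}

post_id = "https://alice.example/notes/1"
hosts["old.example"][post_id] = {"id": post_id, "type": "Note", "content": "<p>Hello, world!</p>"}


def fetch(object_id: str) -> dict:
    """Stand-in for an HTTP GET: resolve the identity domain, then ask that host."""
    return hosts[directory[urlsplit(object_id).hostname]][object_id]


saved_reference = post_id  # what some other instance stored when it first saw the post

# 1. Export an archive of the account from the current instance.
archive = list(hosts["old.example"].values())
# 2. Import the archive into the new account, keeping every ID unchanged.
for obj in archive:
    hosts["new.example"][obj["id"]] = obj
# 3. Point the identity at the new server; the old one can then go away.
directory["alice.example"] = "new.example"
hosts["old.example"].clear()

print(fetch(saved_reference)["content"])  # <p>Hello, world!</p>
```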

In some ways, that's actually simpler than the current migration process. You don't have to remember to make your new account point back at your old account before initiating the migration. You don't have to deal with the distinction between data that's part of the direct server-to-server migration (your followers) and what's not (who you follow). You don't have to retry the migration process because some of your followers' servers missed the notice and didn't follow your new account. From everyone else's point of view, the migration just seamlessly happens.

Getting There

If you look at Mastodon issue #12432 "Support Post Migration", you will find an incredibly long thread spanning three and a half years of discussion. The discussion primarily comes down to the fact that Mastodon's architectural decisions are not conducive to this approach.

As noted before, all of Mastodon's AP identifiers are scoped to the host, not to the user's identity. Transferring posts to another server would mean changing those identifiers and breaking every existing reference to a post, both on the old instance and, more problematically, on every other instance in the network that's aware of them. The former is possible, if potentially resource-intensive. The latter simply is not: unless an instance has had authorized fetch (off by default) enabled since day 0 and tracks every instance that requests every post (Mastodon does not), there is no way to know who has a copy of a post that's moved. After an account migration, there will be stale links. As a result, the discussion around adding post migration has centered on moving posts and simply accepting that links will break. Maybe just having the posts on your new account is enough?

I am not familiar enough with the Mastodon codebase to say whether moving to the vision I've outlined is feasible. If it's not, I think Mastodon should continue to pursue alternate methods—if only because of the sheer number of users and the absolute clarity that this is a feature people want. But I think it's apparent that the approach I've outlined here would be a more complete and transparent migration process.

There is, to the best of my knowledge, only one ActivityPub project that supports multiple domains: Takahē. Multiple accounts across different domains being backed by the same host doesn't get us all the way to portable identity, but the architectural decisions required to support it go a long way towards that vision. I have not taken the time to trawl through the code and work out whether it actually uses the domain to look up AP objects in its database or whether, like Mastodon and others, it still just extracts the database ID from a path component and uses that for the lookup. Either way, by virtue of already supporting multiple domains, I think Takahē is much closer to reaching this vision.

Update: There is an AP implementation, Bovine, which stores and identifies AP objects by their AP identifier, which goes a long way towards making this implementation of portable identity possible.

Migrating existing accounts and content to portable identities, though, is an open and much harder question. The identifier for just about every AP object that's already out there specifies the host. Making those objects portable would mean either making existing domains identity resolvers rather than hosts (not feasible) or getting every instance to update all of its references to the moved objects (basically where we're at now, and it's not feasible either). This is the biggest question regarding this approach, and I readily admit that I do not have a good answer to it.

Technical Miscellanea

What follows isn't anything cohesive, just some thoughts that occurred while writing this post and thinking about the architecture I've described.

So what's the primary key in my database?

Well, it can still be whatever you want. And the ActivityPub identifier URL can still be derived therefrom. The key is just that looking up an object uses the entire URL, treating it as an opaque identifier rather than trying to parse it and pull out pieces.
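To make the difference concrete, here is a sketch with an in-memory SQLite table (the schema and function names are hypothetical). The first lookup is the path-parsing style; the second treats the whole URL as the key. After an import the two disagree, because an imported object's ID carries the old host's primary key rather than the new one's.

```python
import sqlite3
from typing import Optional

db = sqlite3.connect(":memory:")
# The primary key can be anything; the full AP ID is stored (and indexed) alongside it.
db.execute(
    "CREATE TABLE objects (pk INTEGER PRIMARY KEY, ap_id TEXT UNIQUE NOT NULL, body TEXT)"
)
db.executemany(
    "INSERT INTO objects (pk, ap_id, body) VALUES (?, ?, ?)",
    [
        (1, "https://bob.example/statuses/1", "bob's post"),
        # Imported from alice's old host, which had also given it pk 1 there.
        (2, "https://alice.example/statuses/1", "alice's imported post"),
    ],
)


def lookup_by_path(url: str) -> Optional[str]:
    """Path-parsing style: assume the last path segment is our own primary key."""
    pk = int(url.rstrip("/").rsplit("/", 1)[-1])
    row = db.execute("SELECT body FROM objects WHERE pk = ?", (pk,)).fetchone()
    return row[0] if row else None


def lookup_by_id(url: str) -> Optional[str]:
    """Opaque style: the entire URL is the key; nothing is parsed out of it."""
    row = db.execute("SELECT body FROM objects WHERE ap_id = ?", (url,)).fetchone()
    return row[0] if row else None


print(lookup_by_path("https://alice.example/statuses/1"))  # bob's post (wrong object)
print(lookup_by_id("https://alice.example/statuses/1"))    # alice's imported post
```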

What data actually gets migrated?

Essentially every Activity that you generated on your old host.

Ideally, you'd also want to migrate any relevant activities from other people (think likes/reblogs/replies) in order to preserve the complete context. This can't be done just by transferring the activities themselves, since that would let importers forge activities from other people. So what's actually transferred would need to be a list of the IDs of all the relevant activities. The new instance can then dereference them and add them to its database. This could be a massive collection, so this part of the import should probably be throttled and done in the background.
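One shape that background step could take is an asyncio task that dereferences the listed IDs a batch at a time, pausing between batches. This is a sketch under assumptions: `fetch_remote` stands in for a real (signed) HTTP GET, `store` for the database, and the batch size and delay are arbitrary. Because each activity comes from its origin rather than from the archive, an archive can at worst list IDs that fail to resolve; it can't smuggle in forged content.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Fetcher = Callable[[str], Awaitable[Optional[dict]]]


async def import_foreign_activities(
    ids: list[str],
    fetch_remote: Fetcher,
    store: dict[str, dict],
    batch_size: int = 10,
    delay: float = 1.0,
) -> list[str]:
    """Dereference other people's activities listed in an archive, in small
    batches with a pause between them. Returns the IDs that failed."""
    failed: list[str] = []
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        results = await asyncio.gather(*(fetch_remote(i) for i in batch))
        for activity_id, activity in zip(batch, results):
            # Only keep what the origin served, and only under the ID we asked for.
            if activity is None or activity.get("id") != activity_id:
                failed.append(activity_id)
            else:
                store[activity_id] = activity
        if start + batch_size < len(ids):
            await asyncio.sleep(delay)  # throttle between batches
    return failed


# Hypothetical remote data: one like exists, the other ID points at nothing.
remote = {"https://bob.example/likes/1": {"id": "https://bob.example/likes/1", "type": "Like"}}


async def fake_fetch(activity_id: str) -> Optional[dict]:
    return remote.get(activity_id)


store: dict[str, dict] = {}
failed = asyncio.run(import_foreign_activities(
    ["https://bob.example/likes/1", "https://carol.example/announces/9"],
    fake_fetch, store, batch_size=1, delay=0,
))
print(failed)  # ['https://carol.example/announces/9']
```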

Attachments are another wrinkle, given Mastodon's approach of transcoding everything that's uploaded. The simplest approach, I think, is that the import process shouldn't transcode anything unless it exceeds the usual instance-defined limits. In the common case (migrating from one instance to another using the same software) no transcoding or conversion should be necessary.
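The attachment rule amounts to one check per file. The limit names and values below are made up for illustration (they are not Mastodon's settings), and only images are covered; the idea is simply that media already within the instance's limits is stored as-is.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    media_type: str
    size_bytes: int
    width: int
    height: int


@dataclass
class InstanceLimits:
    # Hypothetical per-instance limits; real software has its own configuration.
    max_image_bytes: int = 16 * 1024 * 1024
    max_image_pixels: int = 4096 * 4096
    accepted_types: frozenset = frozenset({"image/jpeg", "image/png", "image/webp"})


def needs_transcode(att: Attachment, limits: InstanceLimits) -> bool:
    """Transcode on import only when the original breaks one of the limits."""
    if att.media_type not in limits.accepted_types:
        return True
    if att.size_bytes > limits.max_image_bytes:
        return True
    return att.width * att.height > limits.max_image_pixels


limits = InstanceLimits()
print(needs_transcode(Attachment("image/jpeg", 900_000, 1920, 1080), limits))  # False
print(needs_transcode(Attachment("image/heic", 900_000, 1920, 1080), limits))  # True
```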

How do you keep my old instance from impersonating me?

ActivityPub actors already have public/private keypairs and any activities delivered to other servers have to be signed with your private key, which is then validated by the recipients. As part of the migration, the new host can regenerate your keys, so anything the old instance forges and tries to publish as you will fail validation.
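Regenerating keys is enough because recipients verify signatures against whatever public key the actor document currently publishes, and after the move that document is served by the new host. The sketch below shows this with textbook RSA over Mersenne primes, which is wildly insecure and only a stand-in for the HTTP Signatures Mastodon actually uses; the actor document's `publicKey` is simplified to a bare `(n, e)` pair, and all names are hypothetical.

```python
import hashlib

E = 65537


def make_keypair(p: int, q: int) -> tuple[tuple[int, int], int]:
    """Textbook RSA: returns ((n, e), d). A toy; never use this for real."""
    n = p * q
    d = pow(E, -1, (p - 1) * (q - 1))
    return (n, E), d


def digest(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def sign(message: bytes, n: int, d: int) -> int:
    return pow(digest(message) % n, d, n)


def verify(message: bytes, signature: int, public_key: tuple[int, int]) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest(message) % n


# Mersenne primes, chosen only because they are easy to write down.
old_pub, old_priv = make_keypair(2**61 - 1, 2**89 - 1)
new_pub, new_priv = make_keypair(2**107 - 1, 2**127 - 1)

# The actor document served at the unchanged actor ID. After the migration it
# comes from the new host and advertises the regenerated key.
actor_doc = {"id": "https://alice.example/actor", "publicKey": new_pub}

activity = b'{"type": "Create", "actor": "https://alice.example/actor"}'
forged = sign(activity, old_pub[0], old_priv)   # the old instance still holds the old key
genuine = sign(activity, new_pub[0], new_priv)  # the new host signs with the new key

print(verify(activity, genuine, actor_doc["publicKey"]))  # True
print(verify(activity, forged, actor_doc["publicKey"]))   # False
```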

Why not DIDs?

ATProto uses DIDs, rather than HTTP URLs, for identifiers. DIDs seem interesting, if quite complicated. The ActivityPub spec's requirement that identifier URIs' authorities belong to "their originating server" does not seem to preclude using DIDs as AP identifiers. The primary advantage DIDs confer is that they let you migrate not just between hosts/PDSes but also between usernames: the same underlying DID can be updated to refer to @my.domain instead of @someone.bsky.social.
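For comparison, here is a trimmed-down approximation of the DID document ATProto resolves a `did:plc` identifier to. The DID value, handles, and PDS URLs are made up, and fields such as `@context` and `verificationMethod` are omitted. Changing handle or host means publishing an updated document for the same DID; anything that references the DID is untouched.

```python
import copy

did_doc = {
    "id": "did:plc:exampleexampleexample00",  # made-up DID
    "alsoKnownAs": ["at://someone.bsky.social"],
    "service": [{
        "id": "#atproto_pds",
        "type": "AtprotoPersonalDataServer",
        "serviceEndpoint": "https://pds.example",
    }],
}


def handle(doc: dict) -> str:
    """The username the DID currently claims."""
    return doc["alsoKnownAs"][0].removeprefix("at://")


def pds(doc: dict) -> str:
    """The host (PDS) the DID currently points at."""
    return next(s["serviceEndpoint"] for s in doc["service"] if s["id"] == "#atproto_pds")


# Move to a new handle and a new host in one update; the DID itself is unchanged.
updated = copy.deepcopy(did_doc)
updated["alsoKnownAs"] = ["at://my.domain"]
updated["service"][0]["serviceEndpoint"] = "https://other-pds.example"

print(did_doc["id"] == updated["id"], handle(updated), pds(updated))
# True my.domain https://other-pds.example
```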

This does address the caveat mentioned earlier, that the shared identity resolver has to be treated as infrastructure and be above moderation decisions. But if the goal is to move the existing ecosystem towards portable identity in a reasonably expedient manner (and I believe that is the goal), adopting DIDs in the short term is unnecessary.

Moderating Actions Against Hosts

A very good point brought up in reply to this post was that, right now, the domain, host, and instance are all one and the same, which makes them a very useful target for moderation actions, and portable identity seems to interfere with that. If the moderators of a certain instance condone bad behavior from one person, another instance can take action against that entire instance rather than just the individual, on the reasonable assumption that the moderators will permit similar behavior from other people. But the layer of indirection I described makes such actions much harder. Whereas now it's clear that @alice@example.com and @bob@example.com are hosted at the same place, if they used their own domains (say, @alice@alices.place and @bob@bob.online), it would no longer be self-evident that they're hosted, and thus moderated, at the same place.

First off, I think that portable identity inherently makes this sort of moderation against the host less useful. Making it easier for everyone to up and move hosts definitionally makes it easier for bad actors to do the same. Taking moderation action against example.com has less utility if it's easy for everyone hosted there to relocate.

But that said, I think this model is still possible, and for one method we need only look to email, where identities (email addresses) are already separate from hosts (SMTP servers). Hosts could require that inbound activities specify the host they originate from, not just the actor, and then reject activities from blocked domains. To prevent the originating host from being spoofed, it would sign the message with a private key, and the recipient would validate the signature against the originating host's public key, looked up from DNS.
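Here is a sketch of the receiving side, modeled loosely on how DKIM works for email. The `origin_host` parameter, the `_apkey.<host>` record name, and the `dns_txt` lookup are all hypothetical, and the signature check is passed in as a function rather than implemented, since a real scheme would reuse an existing signature format. The order is what matters: first confirm the claimed host, then apply host-level blocks to it.

```python
from typing import Callable, Optional

Verifier = Callable[[bytes, bytes, str], bool]  # (body, signature, public_key) -> valid?


def accept_activity(
    body: bytes,
    signature: bytes,
    origin_host: str,
    blocked_hosts: set[str],
    dns_txt: Callable[[str], Optional[str]],
    verify: Verifier,
) -> bool:
    """Decide whether an inbound activity may be processed."""
    # Look up the originating host's key under a hypothetical DNS record name.
    public_key = dns_txt(f"_apkey.{origin_host}")
    if public_key is None or not verify(body, signature, public_key):
        return False  # the claimed origin can't be confirmed
    # The origin is authentic, so a host-level block applies no matter
    # which identity domain (alices.place, bob.online, ...) the actor uses.
    return origin_host not in blocked_hosts


records = {"_apkey.example.com": "key-a", "_apkey.elsewhere.net": "key-b"}


def stub_verify(body: bytes, signature: bytes, public_key: str) -> bool:
    # Placeholder only: a real verifier would check a cryptographic signature.
    return signature == f"signed-with-{public_key}".encode()


blocked = {"example.com"}
print(accept_activity(b"{}", b"signed-with-key-a", "example.com", blocked, records.get, stub_verify))    # False (blocked)
print(accept_activity(b"{}", b"signed-with-key-b", "elsewhere.net", blocked, records.get, stub_verify))  # True
print(accept_activity(b"{}", b"signed-with-key-a", "elsewhere.net", blocked, records.get, stub_verify))  # False (spoofed)
```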


  1. In principle, at least. As of writing this, there are no other PDSes that one could move to.

  2. You can argue that social media actually is, or should be, ephemeral and that people shouldn't be so attached to their posts. And you might be right. But they are, so arguing about it or trying to convince them otherwise is a waste of time.

  3. Mastodon & co. can move a social graph using the Move activity, which indicates that the actor has moved to a new location. This technique, however, is not especially reliable. It requires the active participation of everyone in your social graph (or rather, their servers), and it foists onto them what should be an implementation detail of your identity.

  4. You can argue until the cows come home that everyone should have their own domain name. But, again, that's not the world we live in. If you want to advance the goals of decentralized social networking, rather than concentrating on ideological purity, you have to be pragmatic.