```
title = "Portable Identity for ActivityPub"
tags = ["activitypub"]
date = "2023-05-01 13:00:42 -0400"
short_desc = "It is actually possible. And it's not as complicated as you might think."
slug = "activitypub-portable-identity"
```
Bluesky has been making waves recently, both on its own merits and for how it contrasts, implementation-wise, with ActivityPub and the fediverse. There are a bunch of ways it differs from other implementations of decentralized social media, but there's one in particular I want to focus on: portable identity. The idea is that I should be able to take all of my data, everything my identity consists of, and move wholesale to a new instance/server/PDS/whatever you call it. Bluesky permits this[^1]. Mastodon and most (all?) other ActivityPub implementations do not. But that doesn't have to be the case: nothing about ActivityPub, architecturally speaking, is incompatible with this vision.
[^1]: In principle, at least. As of writing this, there are no other PDS's that one could move to.
<!-- excerpt-end -->
One of the more common feature requests for Mastodon (third only, I'd guess, to quote posts and search) is the ability to migrate accounts _with their posts_. Without this ability, people get trapped on the first instance they try[^2]. Maybe they'd prefer to be on a different instance, closer to a particular community, or somewhere with different moderation standards. But all their posts are stuck right where they are, and it ends up taking a huge, calamitous event—like an instance shutting down—to overcome that inertia and actually move.
[^2]: You can argue that actually social media is or should be ephemeral and people shouldn't be so attached to their posts. And you might be right. But they are, so arguing about it or trying to convince them otherwise is a waste of time.
So what's standing in the way, and why aren't posts portable already? Well, here's an (abridged) example of the ActivityPub representation of a Mastodon post:
```json
{
"@context": ["https://www.w3.org/ns/activitystreams"],
"id": "https://mastodon.social/users/shadowfacts/statuses/15287",
"type": "Note",
"content": "<p>Hello, world!</p>"
}
```
Notice anything about it? The ID of the post tells you where it is. However, it doesn't just identify where the document can be found; it also identifies where it's _hosted_. This property is true of all object identifiers in Mastodon (and just about every other service that implements ActivityPub).
<aside>
An interesting question is why this is the case. Personally, I think the most likely answer is that folks building AP backends have prior web development experience. And outside of decentralized systems, the simplest way of identifying an object from its URL is using a path parameter. So you get paths that look like `/users/:username/statuses/:status_id`. And then when you need something to use as an AP identifier, you use the whole URL. So dereferencing it ends up being trivial: your server just looks up the object in its database, same as ever. But that's exporting an implementation detail: your primary key (the attribute by which a post is identified in _your_ database) means nothing to me. It unnecessarily ties the post to the server where it originated.
</aside>
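Here's a minimal Python sketch of the path-parameter approach described in the aside, to make the coupling concrete. It's my assumption of how such a handler tends to look, not Mastodon's actual code: the `STATUSES` dict stands in for a database table, and `dereference_by_path` is a hypothetical handler.

```python
import re
from typing import Optional
from urllib.parse import urlparse

# Hypothetical stand-in for the server's database, keyed by its own primary key.
STATUSES = {15287: {"type": "Note", "content": "<p>Hello, world!</p>"}}

# Mirrors the /users/:username/statuses/:status_id route.
STATUS_PATH = re.compile(r"^/users/(?P<username>[^/]+)/statuses/(?P<status_id>\d+)$")


def dereference_by_path(ap_id: str) -> Optional[dict]:
    """Look up an object by pulling the database ID out of the URL's path."""
    match = STATUS_PATH.match(urlparse(ap_id).path)
    if match is None:
        return None
    # The domain is never checked: only the local primary key matters, so the
    # identifier is only meaningful to the server that minted it.
    return STATUSES.get(int(match["status_id"]))
```

Note that the domain is never consulted: `https://other.example/users/someone/statuses/15287` dereferences to the same row, because the only part of the URL that means anything is a primary key that only this server understands.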
It's as if the canonical URL of this post were `https://5.161.136.163/2023/activitypub-portable-identity/`. All of a sudden, this post isn't controlled by me, the person who holds the domain `shadowfacts.net`, but instead it's controlled by my server host, who controls that IP address. I couldn't change hosting providers without losing all of my content and my social graph[^3] (in this example, the readers of my blog). Sound familiar?
[^3]: Mastodon & co. have the ability to move a social graph using the `Move` activity, which indicates that the actor has moved to a new location. This technique, however, is not especially reliable. It requires the active participation of _everyone_ in your social graph (or rather, their servers). It foists onto them what should be an implementation detail of your identity.
It doesn't have to be like this. My posts don't belong to the service I'm using to host them. They belong to _me_. The identity of the posts I publish should be tied to me, not to my host. That is, the IDs need to be resolvable through something (hint: a domain) that _I_ control before ultimately reaching my host.
Nothing about the ActivityPub spec, as I read it, prohibits this. §3.1 says that object identifiers need to be publicly dereferenceable. It notes that their authority (i.e., domain and port) has to belong to the originating server, but there's nothing to suggest that a "server" is distinct from a domain, or that the server-to-host mapping can't be many-to-one.
To make this vision of portable identity work, there needs to be an additional layer that maps identities to hosts.
For people who own domains, this is easy: they just point the DNS records at their host and tell their host that their identity should be served from their domain. Migrating your identity to another host is then a matter of updating DNS records.
But this shouldn't be accessible only to those who have the resources and technical know-how to administer a domain[^4]. There needs to be some public identity resolver that anyone can use. The simplest way to implement this, I think, is as a service (or services; there's no reason this has to be a centralizing force) that lets anyone point a subdomain at their host. The identity resolver needs a way of mapping any URL back to its host, and subdomains are, I think, both the easiest method (they avoid the need to agree on URL formats) and the least resource-intensive one (the resolver isn't hosting actual content). Having a separate service that's built on top of DNS lets you abstract away the technical details from people who don't care, and would allow a more straightforward signup experience by providing an API to let people create an identity while signing up for an instance, rather than forcing it to be a separate process.
[^4]: You can argue until the cows come home that everyone _should_ have their own domain name. But, again, that's not the world we live in. If you want to advance the goals of decentralized social networking, rather than concentrating on ideological purity, you have to be pragmatic.
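A rough model of what the resolver layer has to do, as a Python sketch. `IdentityResolver`, `point`, `host_for`, and the `*.example` hostnames are all hypothetical; in practice `point` would be a DNS update (or an API call to a resolver service) rather than a dictionary write.

```python
from typing import Dict
from urllib.parse import urlparse


class IdentityResolver:
    """Hypothetical identity resolver: maps identity hostnames to the hosts serving them."""

    def __init__(self) -> None:
        self._hosts: Dict[str, str] = {}

    def point(self, identity: str, host: str) -> None:
        # Conceptually the same as updating a DNS record for the identity's (sub)domain.
        self._hosts[identity.lower()] = host

    def host_for(self, ap_id: str) -> str:
        # Only the hostname matters; the path format is entirely up to the host.
        hostname = urlparse(ap_id).hostname
        if hostname is None or hostname not in self._hosts:
            raise LookupError(f"no host registered for {ap_id!r}")
        return self._hosts[hostname]


resolver = IdentityResolver()
resolver.point("someone.resolver.example", "instance-a.example")
post_id = "https://someone.resolver.example/posts/1"
first_host = resolver.host_for(post_id)

# Migrating: repoint the identity; the post's ID doesn't change.
resolver.point("someone.resolver.example", "instance-b.example")
second_host = resolver.host_for(post_id)
```

The post's ID is the same before and after the move; only the mapping behind it changed.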
The hard part, however, is the social one: we collectively need to agree that the identity resolution layer is _infrastructure_ and not somewhere moderation actions should take place. To use the web analogy again: people publishing objectionable content get kicked off web hosts, but it's far rarer that their domain name is taken away. The same has to be true of any shared identity provider/resolver, otherwise we end up in the exact same situation we're in now.
Adding to the social problem is the risk that vanity URLs become popular, and then the desire shifts to migrating between identity resolvers. Solving this problem is possible, although it ends up requiring a larger architectural change to existing ActivityPub implementations. One possible approach, using DIDs, is discussed below. Another approach is splitting the identity resolver from the apparent username. A username that's written as `@someone@example.com` could be looked up by using WebFinger on `example.com`, which would then respond with the actual AP ID of the user as `https://someone.net`, which would in turn be hosted by `example.com` (i.e., DNS records for `someone.net` point to `example.com`). This breaks a number of assumptions Mastodon and clients make about the mention format, but it is architecturally possible. I do think this is a problem worth solving, but in the interest of supporting account portability in terms of what people usually care about (moderation decisions, servers going down), it isn't a requirement.
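The WebFinger side of that split can be sketched in Python. WebFinger (RFC 7033) queries `/.well-known/webfinger` with a `resource` parameter and returns a JSON Resource Descriptor whose `links` can point anywhere, so nothing stops `example.com` from answering with an actor on `someone.net`. The helper names and the sample response are hypothetical; the media types are the ones ActivityPub servers commonly advertise, and treating them as the complete set is an assumption.

```python
from typing import Optional
from urllib.parse import urlencode

# Media types commonly used to advertise an ActivityPub actor in WebFinger links.
AP_MEDIA_TYPES = {
    "application/activity+json",
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
}


def webfinger_url(handle: str) -> str:
    """Build the WebFinger query for a handle like '@someone@example.com'."""
    user, domain = handle.lstrip("@").split("@", 1)
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


def actor_id_from_jrd(jrd: dict) -> Optional[str]:
    """Pull the actor's AP ID out of a WebFinger (JRD) response."""
    for link in jrd.get("links", []):
        if link.get("rel") == "self" and link.get("type") in AP_MEDIA_TYPES:
            return link.get("href")
    return None


# A hypothetical response from example.com: the handle lives on example.com,
# but the actor ID is on a domain the user controls.
response = {
    "subject": "acct:someone@example.com",
    "links": [
        {"rel": "self", "type": "application/activity+json", "href": "https://someone.net"},
    ],
}
```

The client then fetches `https://someone.net` like any other actor; the fact that its DNS happens to point at `example.com` is invisible at this layer.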
## UX
So under this vision, what does the complete migration process actually look like?
1. You export an archive of your account from your current instance.
2. You import that archive into your new account.
3. You point your identity at the new server (either by changing your DNS records, or by updating your information in the resolver's directory described before).
In some ways, that's actually simpler than the current migration process. You don't have to remember to make your new account point back at your old account before initiating the migration. You don't have to deal with the distinction between data that's part of the direct server-to-server migration (your followers) and what's not (who you follow). You don't have to retry the migration process because some of your followers' servers missed the notice and didn't follow your new account. From everyone else's point of view, the migration just seamlessly happens.
## Getting There
If you look at Mastodon issue [#12423](https://github.com/mastodon/mastodon/issues/12423) "Support Post Migration", you will find an incredibly long thread spanning three and a half years of discussion. The discussion primarily comes down to the fact that Mastodon's architectural decisions are not conducive to this approach.
As noted before, all of Mastodon's AP identifiers are scoped to the host, not to the user's identity. Transferring posts to another server would mean _changing_ those identifiers and breaking every existing reference to a post, both on the old instance and, more problematically, on every other instance in the network that's aware of them. The former is possible, if potentially resource-intensive. But the latter is simply not: unless an instance has had authorized fetch (off by default) enabled from day 0 and tracks every instance that requests every post (Mastodon does not), there is no way to know who has a copy of a post that's moved. After an account migration, there will be stale links. As a result, the discussion around adding post migration has centered on moving posts and just accepting that links will be broken. Maybe just having the posts there on your new account is enough?
I am not familiar enough with the Mastodon codebase to say whether moving to the vision I've outlined is feasible. If it's not, I think Mastodon should continue to pursue alternate methods—if only because of the sheer number of users and the absolute clarity that this is a feature people want. But I think it's apparent that the approach I've outlined here would be a more complete and transparent migration process.
There is, to the best of my knowledge, only one ActivityPub project that supports multiple domains: [Takahē](https://github.com/jointakahe/takahe). Multiple accounts across different domains being backed by the same host doesn't get us all the way to portable identity. But the architectural decisions required to support it go a long way towards that vision. I have not taken the time to trawl through the code and work out if it's actually using the domain to look up AP objects in its database or if it, like Mastodon and others, is still just extracting the database ID from a path component and using that for the lookup. Either way, by virtue of supporting multiple domains already, I think Takahē is much closer to reaching this vision.
### Technical Miscellanea
What follows isn't anything cohesive, just some thoughts that occurred while writing this post and thinking about the architecture I've described.
#### So what's the primary key in my database?
Well, it can still be whatever you want. And the ActivityPub identifier URL can still be derived therefrom. The key is just that looking up an object uses the _entire_ URL, treating it as an opaque identifier rather than trying to parse it and pull out pieces.
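A minimal sketch of that idea in Python, assuming an in-memory store; `ObjectStore` and its methods are hypothetical names. Local primary keys are still assigned however the server likes, but the only way in is the complete identifier.

```python
from typing import Dict, Optional


class ObjectStore:
    """Hypothetical object store that dereferences by the full AP ID, never by parsing it."""

    def __init__(self) -> None:
        self._rows: Dict[int, dict] = {}     # local primary key -> object
        self._by_ap_id: Dict[str, int] = {}  # full AP ID (opaque string) -> primary key
        self._next_pk = 1

    def insert(self, ap_id: str, obj: dict) -> int:
        # The primary key is still whatever this server likes (here, a counter)...
        pk = self._next_pk
        self._next_pk += 1
        self._rows[pk] = obj
        # ...but it's only reachable through the complete identifier.
        self._by_ap_id[ap_id] = pk
        return pk

    def dereference(self, ap_id: str) -> Optional[dict]:
        pk = self._by_ap_id.get(ap_id)
        return None if pk is None else self._rows[pk]


store = ObjectStore()
# An imported post keeps the ID it was minted with, even though it gets a fresh local key.
store.insert("https://someone.example/notes/hello-world", {"type": "Note"})
```

In a relational database, this would roughly correspond to a unique index on a column holding the full AP ID, queried directly instead of parsing the ID back apart.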
#### What data actually gets migrated?
Essentially every Activity that you generated on your old host.
Ideally, you'd also want to migrate any relevant activities from other people (think likes/reblogs/replies), in order to preserve the complete context. This can't be done just by transferring the activities themselves, since that could let importers forge activities from other people. So, what's actually transferred would need to be a list of IDs of all the relevant activities. The new instance can then dereference them and add them to its database. This could be a massive collection, and so this part of the import should probably be throttled and done in the background.
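Here's a sketch of that background step in Python using `asyncio`. The archive contributes only IDs; each one is dereferenced from its origin, with a semaphore throttling concurrent requests. `fetch` is a hypothetical coroutine standing in for an HTTP GET (signed, if the origin requires it), and the default limit of 4 concurrent requests is arbitrary.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable

# Hypothetical fetcher: in practice, an HTTP GET with an ActivityPub Accept header.
Fetch = Callable[[str], Awaitable[dict]]


async def import_referenced_activities(
    ids: Iterable[str], fetch: Fetch, max_concurrent: int = 4
) -> Dict[str, dict]:
    """Re-fetch other people's activities from their origin rather than trusting the archive."""
    semaphore = asyncio.Semaphore(max_concurrent)  # caps requests in flight
    imported: Dict[str, dict] = {}

    async def import_one(ap_id: str) -> None:
        async with semaphore:
            try:
                obj = await fetch(ap_id)
            except Exception:
                return  # deleted or unreachable: skip it instead of failing the import
        # Keep it only if the origin returns the object we actually asked for.
        if obj.get("id") == ap_id:
            imported[ap_id] = obj

    # dict.fromkeys de-duplicates the IDs while preserving their order.
    await asyncio.gather(*(import_one(ap_id) for ap_id in dict.fromkeys(ids)))
    return imported
```

Because the content comes from each origin server rather than from the archive, the archive can't put words in someone else's mouth. A real implementation would still apply whatever checks it normally runs on fetched objects (for example, that the `actor` belongs to the same origin).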
Attachments are another wrinkle, given Mastodon's approach of transcoding everything that's uploaded. The simplest approach, I think, is that the import process shouldn't transcode anything unless it exceeds the usual instance-defined limits. In the common case (migrating from one instance to another using the same software) no transcoding or conversion should be necessary.
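That rule is simple enough to sketch in Python. The `Attachment` and `InstanceLimits` fields and the example limits are hypothetical; real instances configure their own accepted types and size/dimension limits.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Attachment:
    mime_type: str
    size_bytes: int
    width: int
    height: int


@dataclass(frozen=True)
class InstanceLimits:
    # Hypothetical limits; every instance configures its own.
    accepted_types: FrozenSet[str]
    max_size_bytes: int
    max_pixels: int


def needs_transcode(att: Attachment, limits: InstanceLimits) -> bool:
    """On import, keep the original file unless it violates this instance's limits."""
    return (
        att.mime_type not in limits.accepted_types
        or att.size_bytes > limits.max_size_bytes
        or att.width * att.height > limits.max_pixels
    )


limits = InstanceLimits(
    accepted_types=frozenset({"image/jpeg", "image/png", "video/mp4"}),
    max_size_bytes=16 * 1024 * 1024,
    max_pixels=4096 * 4096,
)
```

In the common case the function returns `False` and the file is stored byte-for-byte, which also keeps imports cheap.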
#### How do you keep my old instance from impersonating me?
ActivityPub actors already have public/private keypairs and any activities delivered to other servers have to be signed with your private key, which is then validated by the recipients. As part of the migration, the new host can regenerate your keys, so anything the old instance forges and tries to publish as you will fail validation.
#### Why not DIDs?
ATProto uses [DIDs](https://www.w3.org/TR/did-core/), rather than URIs, for identifiers. DIDs seem interesting, if quite complicated. The requirement of the ActivityPub spec that identifier URIs' authorities belong to "their originating server" does not _seem_ to preclude using DIDs as AP identifiers. The primary advantage DIDs confer is that they let you migrate between not just hosts/PDS's but usernames: the same underlying DID can be updated to refer to `@my.domain` from `@someone.bsky.social`.
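To illustrate the property that makes DIDs attractive, here's a Python sketch: the DID (`id`) stays fixed while the handle (`alsoKnownAs`) and the host (`service` → `serviceEndpoint`) change. Those property names come from DID Core; the `parse_did` and `migrate` helpers and every value in the sample document are made up (the `at://` handle form is loosely modeled on ATProto), and the sketch ignores how a real DID method authorizes and publishes updates.

```python
import copy
from typing import Tuple


def parse_did(did: str) -> Tuple[str, str]:
    """Split a DID of the form did:<method>:<method-specific-id>."""
    parts = did.split(":", 2)
    if len(parts) != 3 or parts[0] != "did" or not parts[1] or not parts[2]:
        raise ValueError(f"not a DID: {did!r}")
    return parts[1], parts[2]


def migrate(doc: dict, new_handle: str, new_endpoint: str) -> dict:
    """Return an updated DID document: new handle and new host, same DID."""
    updated = copy.deepcopy(doc)
    updated["alsoKnownAs"] = [new_handle]
    for service in updated.get("service", []):
        service["serviceEndpoint"] = new_endpoint
    return updated


# Illustrative document; the DID, handles, and endpoints are made up.
doc = {
    "id": "did:plc:abc123",
    "alsoKnownAs": ["at://someone.bsky.social"],
    "service": [{"id": "#host", "type": "ExampleHost", "serviceEndpoint": "https://host-a.example"}],
}
moved = migrate(doc, "at://my.domain", "https://host-b.example")
```

Anything that stored `did:plc:abc123` as the identifier keeps working after both the handle and the host have changed.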
<aside>
"Doesn't this just move the centralization point from the identity resolver to the DID resolver?" you may ask. Yes, I think it does, but that matters a lot less if your DID is `did:plc:<giant random string>`. Moreover, the DID resolution process is deliberately left unspecified, and the spec does consider the possibility of multiple resolvers.
</aside>
Using DIDs does solve the caveat mentioned earlier, that the shared identity resolver has to be treated as infrastructure and be above moderation decisions. But if the goal is to move the existing ecosystem towards portable identity in a reasonably expedient manner (and I believe that is the goal), adopting DIDs in the short term is unnecessary.