Update Portable Identity for ActivityPub

Shadowfacts 2023-05-01 13:02:40 -04:00
parent 53931c064d
commit 3599de6ce1

@ -1,7 +1,7 @@
```
title = "Portable Identity for ActivityPub"
tags = ["activitypub"]
date = "2023-05-01 12:00:42 -0400"
date = "2023-05-01 13:00:42 -0400"
short_desc = "It is actually possible. And it's not as complicated as you might think."
slug = "activitypub-portable-identity"
```
@ -31,14 +31,10 @@ Notice anything about it? The ID of post tells you where it is. However it doesn
<aside>
An interesting question is why is this the case? Personally, I think the most likely answer is that folks building AP backends have prior web development experience. And outside of decentralized systems, the simplest way of identifying an object from its URL is using a path parameter. So you get paths that look like `/users/:username/statuses/:status_id`. And then when you need something to use as an AP identifier, you use the whole URL. So dereferencing it ends up being trivial: your server just looks up the object in its database, same as ever. But that's exporting an implementation detail: your pkey means nothing to me. It ties the post to the server where it originated.
An interesting question is why is this the case? Personally, I think the most likely answer is that folks building AP backends have prior web development experience. And outside of decentralized systems, the simplest way of identifying an object from its URL is using a path parameter. So you get paths that look like `/users/:username/statuses/:status_id`. And then when you need something to use as an AP identifier, you use the whole URL. So dereferencing it ends up being trivial: your server just looks up the object in its database, same as ever. But that's exporting an implementation detail: your primary key (the attribute by which a post is identified in _your_ database) means nothing to me. It unnecessarily ties the post to the server where it originated.
</aside>
It's as if the canonical URL of this post were `https://5.161.136.163/2023/activitypub-portable-identity/`. All of a sudden, this post isn't controlled by me, the person who holds the domain `shadowfacts.net`, but instead it's controlled by my server host, who controls that IP address. I couldn't change hosting providers without losing all of my content and my social graph[^3] (in this example, the readers of my blog). Sound familiar?
[^3]: Mastodon & co. have the ability to move a social graph using the `Move` activity, which indicates that the actor has moved to a new location. This technique, however, is not especially reliable. It requires the active participation of _everyone_ in your social graph (or rather, their servers). It foists onto them what should be an implementation detail of your identity.
@ -51,12 +47,14 @@ To make this vision of portable identities to work, there needs to be an additio
For people who own domains, this is easy: they just point the DNS records at their host and tell their host that their identity should be served from their domain. Migrating your identity to another host is then a matter of updating DNS records.
But this shouldn't be accessible only to those who have the resources and technical know-how to administer a domain[^4]. There needs to be some public identity resolver that anyone can use. The simplest way to implement this, I think, is as a service that lets anyone point a subdomain at their host. The identity resolver needs to have a way of mapping any URL back to its host, and I think this is the easiest (avoiding the need to agree on URL formats) and least resource-intensive (because it's just DNS) method.
But this shouldn't be accessible only to those who have the resources and technical know-how to administer a domain[^4]. There needs to be some public identity resolver that anyone can use. The simplest way to implement this, I think, is as a service (or services; there's no reason this has to be a centralizing force) that lets anyone point a subdomain at their host. The identity resolver needs to have a way of mapping any URL back to its host, and I think this is the easiest (avoiding the need to agree on URL formats) and least resource-intensive (because you're not hosting actual content) method. Having a separate service that's built on top of DNS lets you abstract away the technical details from people who don't care, and would allow a more straightforward signup experience by providing an API to let people create an identity while signing up for an instance, rather than forcing it to be a separate process.
[^4]: You can argue until the cows come home that everyone _should_ have their own domain name. But, again, that's not the world we live in. If you want to advance the goals of decentralized social networking, rather than concentrating on ideological purity, you have to be pragmatic.
The hard part, however, is the social one: we collectively need to agree that the identity resolution layer is _infrastructure_ and not somewhere moderation actions should take place. To use the web analogy again: people publishing objectionable content get kicked off web hosts, but it's far rarer that their domain name is taken away. The same has to be true of any shared identity provider/resolver, otherwise we end up in the exact same situation we're in now.
Adding to the social problem is the risk that vanity URLs become popular, and then the desire shifts to migrating between identity resolvers. Solving this problem is possible, although it ends up requiring a larger architectural change to existing ActivityPub implementations. One possible approach, using DIDs, is discussed below. Another approach is splitting the identity resolver from the apparent username. A username that's written as `@someone@example.com` could be looked up by using WebFinger on `example.com` which would then respond with the actual AP ID of the user as `https://someone.net` which would in turn be hosted by `example.com` (i.e., DNS records for `someone.net` point to `example.com`). This breaks a number of assumptions Mastodon and clients make about the mention format but it is architecturally possible. I do think this is a problem worth solving, but in the interest of supporting account portability in terms of what people usually care about—moderation decisions, servers going down—it isn't a requirement.
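To make the WebFinger indirection a bit more concrete, here's a rough sketch of the lookup a server might do. The query URL shape and the `self` link come from how WebFinger is used for ActivityPub today; the function names are made up for illustration, and the sample response is invented to match the `someone.net` example above. There's no network code here, just the two pure pieces on either side of the HTTP request.

```python
from typing import Optional
from urllib.parse import urlencode

# Media types commonly used to mark the ActivityPub actor link in a WebFinger response.
ACTOR_TYPES = {
    "application/activity+json",
    'application/ld+json; profile="https://www.w3.org/ns/activitystreams"',
}


def webfinger_url(handle: str) -> str:
    """Build the WebFinger query URL for a handle like '@someone@example.com'."""
    user, _, domain = handle.lstrip("@").partition("@")
    if not user or not domain:
        raise ValueError(f"not a user@domain handle: {handle!r}")
    query = urlencode({"resource": f"acct:{user}@{domain}"})
    return f"https://{domain}/.well-known/webfinger?{query}"


def actor_id_from_jrd(jrd: dict) -> Optional[str]:
    """Pick the actor's AP ID out of a parsed WebFinger (JRD) response."""
    for link in jrd.get("links", []):
        if link.get("rel") == "self" and link.get("type") in ACTOR_TYPES:
            return link.get("href")
    return None


# Hypothetical response from example.com: the handle lives on example.com,
# but the AP ID it resolves to is on the user's own domain.
sample_jrd = {
    "subject": "acct:someone@example.com",
    "links": [
        {"rel": "self", "type": "application/activity+json", "href": "https://someone.net"},
    ],
}
```

The point is that the handle and the AP ID don't need to share a domain: the handle is only the starting point for the lookup, and everything after that keys off whatever ID comes back.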
## UX
So under this vision, what does the complete migration process actually look like?
@ -91,11 +89,13 @@ Essentially every Activity that you generated on your old host.
Ideally, you'd also want to migrate any relevant activities from other people (think likes/reblogs/replies), in order to preserve the complete context. This can't be done just by transferring the activities themselves, since that could let importers forge activities from other people. So, what's actually transferred would need to be a list of IDs of all the relevant activities. The new instance can then dereference them and add them to its database. This could be a massive collection, and so this part of the import should probably be throttled and done in the background.
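As a rough sketch of what that import loop might look like: the new instance walks the list of IDs, asks each origin server for the activity, and only keeps what the origin actually returns for that exact ID. Everything here is hypothetical (the function name, the `fetch` callback standing in for an HTTP dereference, the dict standing in for the database), and a real implementation would want a proper job queue rather than a sleep.

```python
import time
from typing import Callable, Dict, Iterable, Optional


def import_referenced_activities(
    ids: Iterable[str],
    fetch: Callable[[str], Optional[dict]],
    store: Dict[str, dict],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Dereference each activity ID and store the result; returns how many were imported."""
    imported = 0
    for activity_id in ids:
        activity = fetch(activity_id)  # stand-in for an HTTP GET to the origin server
        # Only trust what the origin returns for this exact ID, never the
        # importer's own copy. Missing or mismatched objects are skipped.
        if activity is not None and activity.get("id") == activity_id:
            store[activity_id] = activity
            imported += 1
        sleep(delay_seconds)  # crude throttle so a huge import doesn't hammer other servers
    return imported
```

Because the importer only ever supplies IDs, the worst it can do is point the new instance at activities that don't exist, which just get dropped.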
Attachments are another wrinkle, given Mastodon's approach of transcoding everything that's uploaded. The simplest approach, I think, is that the import process shouldn't transcode anything unless it exceeds the usual instance-defined limits. In the common case (migrating from one instance to another using the same software) no transcoding or conversion should be necessary.
#### How do you keep my old instance from impersonating me?
ActivityPub actors already have public/private keypairs and any activities delivered to other servers have to be signed with your private key, which is then validated by the recipients. As part of the migration, the new host can regenerate your keys, so anything the old instance forges and tries to publish as you will fail validation.
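Here's a toy illustration of why regenerating the keys is enough. It uses textbook RSA with hard-coded primes purely so the example fits in the standard library; it is not how real servers sign requests (they use proper RSA with padding via HTTP signatures) and must not be used for anything real. The actor ID and helper names are invented for the example.

```python
import hashlib


def make_keypair(p: int, q: int, e: int = 65537):
    """Build a textbook RSA keypair from two primes (toy only: no padding)."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # private exponent; needs Python 3.8+
    return (n, e), (n, d)  # (public key, private key)


def digest(message: bytes, n: int) -> int:
    """Hash the message and reduce it into the key's modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % n


def sign(message: bytes, private_key) -> int:
    n, d = private_key
    return pow(digest(message, n), d, n)


def verify(message: bytes, signature: int, public_key) -> bool:
    n, e = public_key
    return pow(signature, e, n) == digest(message, n)


# Mersenne primes, picked only to keep the example short.
old_public, old_private = make_keypair(2**61 - 1, 2**89 - 1)      # held by the old host
new_public, new_private = make_keypair(2**107 - 1, 2**127 - 1)    # generated on migration

# Recipients look up the actor's *current* public key by actor ID.
published_keys = {"https://someone.example": new_public}

activity = b'{"type": "Create", "actor": "https://someone.example"}'
forged = sign(activity, old_private)    # old instance trying to post as you
genuine = sign(activity, new_private)   # new instance posting as you

forged_ok = verify(activity, forged, published_keys["https://someone.example"])
genuine_ok = verify(activity, genuine, published_keys["https://someone.example"])
```

The old host still has the old private key, but nobody is validating against the old public key anymore, so its signatures are useless.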
#### What about DIDs?
#### Why not DIDs?
ATProto uses [DIDs](https://www.w3.org/TR/did-core/), rather than URIs, for identifiers. DIDs seem interesting, if quite complicated. The requirement of the ActivityPub spec that identifier URIs' authorities belong to "their originating server" does not _seem_ to preclude using DIDs as AP identifiers. The primary advantage DIDs confer is that they let you migrate between not just hosts/PDS's but usernames: the same underlying DID can be updated to refer to `@my.domain` from `@someone.bsky.social`.