```
title = "Implement a Gemini Protocol Client Using Network.framework"
tags = ["swift", "gemini"]
date = "2020-07-22 21:57:42 -0400"
slug = "gemini-network-framework"
```
[Gemini](https://gemini.circumlunar.space/) is a small protocol built on top of TCP and TLS that's designed to serve as a transport mechanism primarily for text documents while lacking a great deal of the complexity of HTTP. [Network.framework](https://developer.apple.com/documentation/network) was introduced to Apple's platforms in 2018 as a modern framework for dealing with network connections and building custom network protocols. So, let's use it to build a Gemini client implementation.
<!-- excerpt-end -->
## The Protocol
First, an overview of the Gemini protocol. This is going to be fairly brief, as there are some more details that I'm not going to go into, since this post is meant to focus on using Network.framework to build a TCP-based protocol client, rather than the protocol itself[^1]. If you're interested, you can read more about the details of the protocol in its [specification](https://gemini.circumlunar.space/docs/specification.html).
[^1]: That said, the rest of the Gemini protocol, as well as the text format and the community that's sprung up around it, is super interesting, and you should definitely check it out. An easy way to start is by using a Gemini-to-web proxy. Check out the [homepage](https://proxy.vulpes.one/gemini/gemini.circumlunar.space) and explore from there.
At the highest level, Gemini is fairly similar to HTTP: every connection is made to request a single resource at a specific URL. After the connection is opened and the TLS handshake completed, the client sends the request. The request is the CRLF-terminated absolute URL of the resource being requested. The URL string is encoded as UTF-8 and has a maximum length of 1024 bytes. The URL scheme doesn't have to be specified; the default is `gemini://` when using the Gemini protocol for transport. The port is also optional, and defaults to `1965`[^2].
[^2]: Because the first [crewed mission](https://en.wikipedia.org/wiki/Gemini_3) of the Gemini Program launched on March 23, 1965.
```plaintext
gemini://example.com:1965/some/resource?foo<CR><LF>
```
Likewise, the response starts with a CRLF-terminated, UTF-8 encoded string. It begins with a two-digit status code, where the most significant digit defines the overall response type and the least significant digit provides more specificity. The status code is followed by a space character, then a "meta" string up to 1024 bytes in length, and finally the carriage return and line feed characters. The meaning of the meta string in the response is defined by the various status codes (for example, `20` is the status code for success and defines the meta string to be the MIME type of the response body).
```plaintext
20 text/gemini<CR><LF>
```
Finally, if the response was successful (i.e., the server returned a status code in the `2x` range), there may be a response body, which is arbitrary binary data.
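To make the wire format concrete, here is a minimal sketch that splits a complete, already-received response into its three parts. The names `ParsedResponse` and `parseResponse` are made up for illustration and are not part of the implementation built later in this post, which has to cope with data arriving in pieces.

```swift
/// The three parts of a Gemini response described above (hypothetical type).
struct ParsedResponse {
    let status: Int
    let meta: String
    let body: [UInt8]
}

/// Splits raw response bytes into status, meta, and body.
/// Returns nil if the header line is malformed or not yet complete.
func parseResponse(_ bytes: [UInt8]) -> ParsedResponse? {
    // Scan forwards for the first <CR><LF>, which terminates the header line.
    guard bytes.count >= 2,
          let crIndex = (0..<(bytes.count - 1)).first(where: { bytes[$0] == 13 && bytes[$0 + 1] == 10 }) else {
        return nil
    }
    // The header is two ASCII digits, a space, then at most 1024 bytes of meta.
    guard crIndex >= 3, crIndex - 3 <= 1024,
          (48...57).contains(bytes[0]), (48...57).contains(bytes[1]),
          bytes[2] == 32 else {
        return nil
    }
    let status = Int(bytes[0] - 48) * 10 + Int(bytes[1] - 48)
    // String(decoding:as:) replaces invalid UTF-8 instead of failing.
    let meta = String(decoding: bytes[3..<crIndex], as: UTF8.self)
    // Everything after the CRLF is the (possibly empty) body.
    let body = Array(bytes[(crIndex + 2)...])
    return ParsedResponse(status: status, meta: meta, body: body)
}
```

Calling `parseResponse(Array("20 text/gemini\r\n# Hello\r\n".utf8))` yields status `20`, meta `text/gemini`, and the remaining bytes as the body.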
## The Implementation
With Network.framework, everything starts with an `NWProtocol`. The framework provides a bunch of concrete subclasses for dealing with protocols like TCP, UDP, and TLS. New in 2019 is the `NWProtocolFramer` class, which provides an interface for defining your own protocols on top of the built-in stack. Using it starts with a class that conforms to the `NWProtocolFramerImplementation` protocol:
```swift
import Network

class GeminiProtocol: NWProtocolFramerImplementation {
    static let label = "Gemini"

    required init(framer: NWProtocolFramer.Instance) {}
}
```
The protocol has a bunch of requirements that need to be satisfied. Starting off with the simple ones, it needs a static read-only `String` variable called `label`, which will be used in log messages to identify which framer implementation is being used. It also needs an initializer which takes an `NWProtocolFramer.Instance`. Nothing needs to be done in this initializer. The framer instance doesn't even need to be stored, since all of the other methods that have to be implemented directly receive it.
There's also a static `definition` property which stores the `NWProtocolDefinition` that's configured to use this class as the framer's implementation. This needs to be a singleton, not constructed for every request, because it will later be used as a key to get some implementation-specific data out of other framework classes.
```swift
class GeminiProtocol: NWProtocolFramerImplementation {
    static let definition = NWProtocolFramer.Definition(implementation: GeminiProtocol.self)
    // ...
}
```
Next, there are a few other simple methods to implement:
```swift
class GeminiProtocol: NWProtocolFramerImplementation {
    // ...
    func start(framer: NWProtocolFramer.Instance) -> NWProtocolFramer.StartResult {
        return .ready
    }

    func wakeup(framer: NWProtocolFramer.Instance) {
    }

    func stop(framer: NWProtocolFramer.Instance) -> Bool {
        return true
    }

    func cleanup(framer: NWProtocolFramer.Instance) {
    }
}
```
Since the Gemini protocol doesn't use long-running/persistent connections, there's no work that needs to be done to start, wake up, stop, or clean up an individual connection. And, since each connection only handles a single request, there isn't even any handshake that needs to be performed to start a Gemini connection. We can just send the request and we're off to the races. Similarly, stopping a Gemini connection doesn't mean anything; the connection is just closed.
Actually sending a request is nice and simple. The `NWProtocolFramerImplementation` protocol has a `handleOutput` method (output, in this case, meaning output _from_ the client, i.e., the request). This method receives an instance of the protocol's message type, which in this case is `NWProtocolFramer.Message`. Since `NWProtocolFramer` is designed to be used to implement application-level protocols, its message type functions as a key-value store that can contain arbitrary application protocol information.
For the Gemini protocol, a simple struct encapsulates all the data we need to make a request. All it does is ensure that the URL is no longer than 1024 bytes upon initialization (a limit defined by the protocol spec) and define a small helper property that creates a `Data` object containing the URL string encoded as UTF-8 with the carriage return and line feed characters appended.
```swift
struct GeminiRequest {
    let url: URL

    init(url: URL) throws {
        guard url.absoluteString.utf8.count <= 1024 else { throw Error.urlTooLong }
        self.url = url
    }

    var data: Data {
        var data = url.absoluteString.data(using: .utf8)!
        data.append(contentsOf: [13, 10]) // <CR><LF>
        return data
    }

    enum Error: Swift.Error {
        case urlTooLong
    }
}
```
Also, a simple extension on `NWProtocolFramer.Message` provides access to the stored `GeminiRequest`, instead of dealing with string keys directly. There's also a convenience initializer to create a message instance from a request that's set up to use the protocol definition from earlier.
```swift
private let requestKey = "gemini_request"

extension NWProtocolFramer.Message {
    convenience init(geminiRequest request: GeminiRequest) {
        self.init(definition: GeminiProtocol.definition)
        self[requestKey] = request
    }

    var geminiRequest: GeminiRequest? {
        self[requestKey] as? GeminiRequest
    }
}
```
With those both in place, the protocol implementation can simply grab the request out of the message and send its data through to the framer instance:
```swift
class GeminiProtocol: NWProtocolFramerImplementation {
    // ...
    func handleOutput(framer: NWProtocolFramer.Instance, message: NWProtocolFramer.Message, messageLength: Int, isComplete: Bool) {
        guard let request = message.geminiRequest else {
            fatalError("GeminiProtocol can't send message that doesn't have an associated GeminiRequest")
        }
        framer.writeOutput(data: request.data)
    }
}
```
Parsing input (i.e., the response from the server) is somewhat more complicated. Parsing the status code and the meta string will both follow a similar pattern. The `parseInput` method of `NWProtocolFramer.Instance` is used to get some input from the connection, given a valid range of lengths for the input. This method also takes a closure, which receives an optional `UnsafeMutableRawBufferPointer` containing the input data that was received as well as a boolean flag indicating if the connection has closed. It returns an integer representing the number of bytes that it consumed (meaning data that was fully parsed and should not be provided on subsequent `parseInput` calls). This closure is responsible for parsing the data, storing the result in a local variable, and returning how much, if any, of the data was consumed.
First off is the status code (and the following space character). In the protocol implementation, there's an optional `Int` property used as temporary storage for the status code. If the `tempStatusCode` property is `nil`, the `parseInput` method is called on the framer. The length is always going to be 3 bytes (1 for each character of the status code, and 1 for the space). Inside the `parseInput` closure, if the buffer is not present or it's not of the expected length, the closure returns zero to indicate that no bytes were consumed. Otherwise, the contents of the buffer are converted to a String and then parsed into an integer[^3] and stored in the temporary property (this is okay because the closure passed to `parseInput` is non-escaping, meaning it will be called before `parseInput` returns). Finally, the closure returns `3` to indicate that three bytes were consumed and should not be provided again as input.
[^3]: If you were really building an implementation of the Gemini protocol, you would probably want to wrap the raw integer status code in something else to avoid dealing with magic numbers throughout your codebase. An enum backed by integer values, perhaps.
Outside the `if`, there's a `guard` that checks that there is a status code present, either from immediately prior or potentially from a previous invocation of the method. If not, it returns `3` from the `handleInput` method, telling the framework that it expects there to be at least 3 bytes available before it's called again. The reason the status code is stored in a class property, and why the code ensures that it's `nil` before trying to parse, is so that if some subsequent parse step fails and the method returns and has to be invoked again in the future, it doesn't try to re-parse the status code because the actual data for it has already been consumed.
```swift
class GeminiProtocol: NWProtocolFramerImplementation {
    // ...
    private var tempStatusCode: Int?

    func handleInput(framer: NWProtocolFramer.Instance) -> Int {
        if tempStatusCode == nil {
            _ = framer.parseInput(minimumIncompleteLength: 3, maximumLength: 3) { (buffer, isComplete) -> Int in
                guard let buffer = buffer, buffer.count == 3 else { return 0 }
                // The first two bytes are the status digits; the third is the space.
                let secondIndex = buffer.index(after: buffer.startIndex)
                if let str = String(bytes: buffer[...secondIndex], encoding: .utf8),
                   let value = Int(str, radix: 10) {
                    self.tempStatusCode = value
                }
                return 3
            }
        }
        guard let statusCode = tempStatusCode else {
            return 3
        }
        // ...
    }
}
```
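As the footnote above suggests, the raw integer could be wrapped in an enum. The following is one possible sketch, not what the code in this post uses: `GeminiStatusCategory` is a hypothetical name, and it only models the first digit of the status code, using the six categories the Gemini specification defines.

```swift
/// The response categories defined by the first digit of a Gemini status code.
/// (Hypothetical type for illustration.)
enum GeminiStatusCategory: Int {
    case input = 1
    case success = 2
    case redirect = 3
    case temporaryFailure = 4
    case permanentFailure = 5
    case clientCertificateRequired = 6

    /// Derives the category from a two-digit status code such as 20 or 51.
    /// Returns nil for anything outside the 10...69 range.
    init?(statusCode: Int) {
        guard (10...69).contains(statusCode) else { return nil }
        self.init(rawValue: statusCode / 10)
    }
}
```

With this in place, a check like `GeminiStatusCategory(statusCode: statusCode) == .success` reads more clearly than comparing against literal ranges.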
Next up: the meta string. Following the same pattern as with the status code, there's a temporary property to store the result of parsing the meta string and a call to `parseInput`. This time, the minimum length is 2 bytes (since the Gemini spec doesn't specify a minimum length for the meta string, it could be omitted entirely, which would leave just two bytes for the carriage return and line feed) and the maximum length is 1026 bytes (up to 1024 bytes for the meta string, and again, the trailing CRLF).
This time, the closure once again validates that there is enough data to at least attempt to parse it, but then it loops through the data looking for the CRLF sequence which defines the end of the meta string[^4]. Afterwards, if the marker sequence was not found, the closure returns zero because no data was consumed. Otherwise, it constructs a string from the bytes up to the index of the carriage return, stores it in the temporary property, and returns the number of bytes consumed (`index` here represents the end index of the string, so without the additional `+ 2` the trailing CRLF would be considered part of the body). After the call to `parseInput`, it similarly checks that the meta was parsed successfully and returns if not.
[^4]: You can't scan through the data backwards, because the response body immediately follows the CRLF after the meta string, so you could end up finding a CRLF sequence inside the body and incorrectly basing the length of the meta string off that.
One key difference between parsing the meta string and parsing the status code is that if the status code couldn't be parsed, the exact number of bytes that must be available before it can be attempted again is always the same: 3. That's not true when parsing the meta text: the number of bytes necessary for a retry is dependent on the number of bytes that were unsuccessfully attempted to be parsed. For that reason, there's also an optional `Int` variable which stores the length of the buffer that the closure attempted to parse. When the closure executes, the variable is set to the length of the buffer. If, inside the closure, the code fails to find the carriage return and line feed characters anywhere, one of two things happens: If the buffer is shorter than 1026 bytes, the closure returns zero to indicate that nothing was consumed. Then, since there's no string, `handleInput` will return 1 plus the attempted meta length, indicating to the framework that it should wait until there is at least 1 additional byte of data available before calling `handleInput` again. If no CRLF was found, and the buffer count is greater than or equal to 1026, the closure simply aborts with a `fatalError` because the protocol specifies that the meta string cannot be longer than 1024 bytes (it would be better to set some sort of 'invalid' flag on the response object and then pass that along to be handled by higher-level code, but for the purposes of this blog post, that's not interesting code). In the final case, if parsing the meta failed and the `attemptedMetaLength` variable is `nil`, that means there wasn't enough data available, so we simply return 2.
**Update July 7, 2021:** The eagle-eyed among you may notice that there's a flaw in the following implementation involving what happens when meta parsing has to be retried. I discovered this myself and discussed it in [this follow-up post](/2021/gemini-client-debugging/).
```swift
class GeminiProtocol: NWProtocolFramerImplementation {
    // ...
    private var tempMeta: String?

    func handleInput(framer: NWProtocolFramer.Instance) -> Int {
        // ...
        var attemptedMetaLength: Int?
        if tempMeta == nil {
            _ = framer.parseInput(minimumIncompleteLength: 2, maximumLength: 1026) { (buffer, isComplete) -> Int in
                guard let buffer = buffer, buffer.count >= 2 else { return 0 }
                attemptedMetaLength = buffer.count

                // Scan forwards for the first <CR><LF> pair.
                let lastPossibleCRIndex = buffer.index(before: buffer.index(before: buffer.endIndex))
                var index = buffer.startIndex
                var found = false
                while index <= lastPossibleCRIndex {
                    if buffer[index] == 13 /* CR */ && buffer[buffer.index(after: index)] == 10 /* LF */ {
                        found = true
                        break
                    }
                    index = buffer.index(after: index)
                }

                if !found {
                    if buffer.count < 1026 {
                        return 0
                    } else {
                        fatalError("Expected to find <CR><LF> in buffer. Meta string may not be longer than 1024 bytes.")
                    }
                }

                self.tempMeta = String(bytes: buffer[..<index], encoding: .utf8)
                // Consume the meta string plus the trailing <CR><LF>.
                return buffer.startIndex.distance(to: index) + 2
            }
        }
        guard let meta = tempMeta else {
            if let attempted = attemptedMetaLength {
                return attempted + 1
            } else {
                return 2
            }
        }
        // ...
    }
}
```
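The paragraph above mentions that reporting an 'invalid' result would be better than calling `fatalError`. As a rough sketch of what that could look like, here is the same forward scan pulled out into a pure function over a byte array. `MetaParseResult` and `parseMeta` are hypothetical names, and this is not wired into the framer; it only illustrates the three outcomes and the consumed-byte count.

```swift
/// Possible outcomes of trying to parse the meta string (hypothetical type).
enum MetaParseResult: Equatable {
    /// The meta string was found; `consumed` includes the trailing <CR><LF>.
    case parsed(meta: String, consumed: Int)
    /// No <CR><LF> yet, but the buffer is still short enough that one may arrive.
    case needMoreData
    /// No <CR><LF> within the maximum allowed length.
    case invalid
}

func parseMeta(_ buffer: [UInt8], maximumLength: Int = 1026) -> MetaParseResult {
    guard buffer.count >= 2 else { return .needMoreData }
    // Only look as far as the longest legal meta string plus its <CR><LF>.
    let searchEnd = min(buffer.count, maximumLength)
    var index = 0
    // Scan forwards so that a <CR><LF> inside the body is never mistaken for the end of the meta.
    while index <= searchEnd - 2 {
        if buffer[index] == 13 /* CR */ && buffer[index + 1] == 10 /* LF */ {
            let meta = String(decoding: buffer[..<index], as: UTF8.self)
            return .parsed(meta: meta, consumed: index + 2)
        }
        index += 1
    }
    return buffer.count < maximumLength ? .needMoreData : .invalid
}
```

A caller could map `.needMoreData` to a non-zero return from `handleInput` and surface `.invalid` as an error to higher-level code instead of crashing.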
With the entire header parsed, an object can be constructed to represent the response metadata and an `NWProtocolFramer.Message` created to contain it.
```swift
class GeminiProtocol: NWProtocolFramerImplementation {
    // ...
    func handleInput(framer: NWProtocolFramer.Instance) -> Int {
        // ...
        let header = GeminiResponseHeader(status: statusCode, meta: meta)
        let message = NWProtocolFramer.Message(geminiResponseHeader: header)
        // ...
    }
}
```
`GeminiResponseHeader` is a simple struct to contain the status code and the meta string in a type-safe manner:
```swift
struct GeminiResponseHeader {
    let status: Int
    let meta: String
}
```
As with the request object, there's a small extension on `NWProtocolFramer.Message` so that all the string keys are contained to a single place.
```swift
private let responseHeaderKey = "gemini_response_header"

extension NWProtocolFramer.Message {
    convenience init(geminiResponseHeader header: GeminiResponseHeader) {
        self.init(definition: GeminiProtocol.definition)
        self[responseHeaderKey] = header
    }

    var geminiResponseHeader: GeminiResponseHeader? {
        self[responseHeaderKey] as? GeminiResponseHeader
    }
}
```
To actually pass the message off to the client of the protocol implementation, the `deliverInputNoCopy` method is used. Since the `handleInput` method has already parsed all of the data it needs to, and the response body is defined by the protocol to just be the rest of the response data, the `deliverInputNoCopy` method is a useful way of passing the data straight through to the protocol client, avoiding an extra memory copy. If the protocol had to transform the body of the response somehow, it could be read as above and then delivered to the protocol client with the `deliverInput(data:message:isComplete:)` method.
If the request was successful (i.e., the status code was in the 2x range), we try to receive as many bytes as possible, because the protocol doesn't specify a way of determining the length of a response. All other response codes are defined to never have response bodies, so we don't need to deliver any data. Using `.max` is a little bit weird, since we don't actually _need_ to receive that many bytes. But it seems to work perfectly fine in practice: once all the input is received and the other side closes the connection, the input is delivered without error.
Annoyingly, the return value of the Swift function is entirely undocumented (even in the generated headers, where the parameters are). Fortunately, the C equivalent (`nw_framer_deliver_input_no_copy`) is more thoroughly documented and provides an answer: the function returns a boolean indicating whether the input was delivered immediately or whether the framework will wait for more bytes before delivering it. We don't care at all about this, so we just discard the return value.
Finally, we return 0 from `handleInput`. Ordinarily, this would mean that there must be zero or more bytes available before the framework calls us again. But, because we've delivered all the available input, that will never happen.
```swift
class GeminiProtocol: NWProtocolFramerImplementation {
    // ...
    func handleInput(framer: NWProtocolFramer.Instance) -> Int {
        // ...
        // Only 2x (success) responses carry a body.
        let isSuccess = (20..<30).contains(statusCode)
        _ = framer.deliverInputNoCopy(length: isSuccess ? .max : 0, message: message, isComplete: true)
        return 0
    }
}
```
Actually using the Gemini protocol implementation will require creating an `NWConnection` object, which takes an endpoint and connection parameters. The parameters define which protocols to use and the various options for them. The `NWParameters` class already defines a number of static `NWParameters` variables for commonly used protocols, so adding our own for Gemini fits right in.
```swift
extension NWParameters {
    static var gemini: NWParameters {
        let tcpOptions = NWProtocolTCP.Options()
        let parameters = NWParameters(tls: geminiTLSOptions, tcp: tcpOptions)

        let geminiOptions = NWProtocolFramer.Options(definition: GeminiProtocol.definition)
        parameters.defaultProtocolStack.applicationProtocols.insert(geminiOptions, at: 0)

        return parameters
    }

    private static var geminiTLSOptions: NWProtocolTLS.Options {
        let options = NWProtocolTLS.Options()
        sec_protocol_options_set_min_tls_protocol_version(options.securityProtocolOptions, .TLSv12)
        return options
    }
}
```
Here the only thing we customize about the TLS options is setting the minimum required version to TLS 1.2, as required by the Gemini spec. However, the Gemini spec further recommends that clients implement a trust-on-first-use scheme to allow people to host content on the Gemini network using self-signed certificates, but implementing that is out of the scope of this post. If you're interested, a good starting point is the `sec_protocol_options_set_verify_block` function which lets you provide a closure that the framework uses to verify server certificates during the TLS handshake process.
Now, to make an API for all this that's actually pleasant to use, I pretty closely followed the `URLSessionDataTask` approach from Foundation, since it models something fairly similar to Gemini.
`GeminiDataTask` is a class which will store the request being sent, a completion handler, as well as an internal state and the underlying `NWConnection`. The initializer stores a few things, and then sets up the network connection. It uses the URL port, if it has one, otherwise the default of 1965. The host is simply the host of the requested URL. These are used to construct an `NWEndpoint` object and, combined with the Gemini `NWParameters` set up previously, create the connection. The convenience initializer also provides a slightly nicer API, so the user doesn't have to directly deal with the `GeminiRequest` object (which, from their perspective, is useless since there's nothing to customize about it beyond the plain old URL).
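For a rough idea of the decision a trust-on-first-use scheme has to make, here is a small sketch of just the bookkeeping, independent of any TLS APIs. Everything in it is an assumption for illustration: `TrustStore` and `TrustDecision` are hypothetical names, the fingerprint is taken to be some digest of the server's certificate computed elsewhere, and a real client would persist the store and call it from inside the verify block.

```swift
/// The outcome of checking a server's certificate fingerprint (hypothetical type).
enum TrustDecision: Equatable {
    case trustedFirstUse   // host never seen before; fingerprint is now remembered
    case trustedKnown      // fingerprint matches the one remembered earlier
    case rejectedMismatch  // fingerprint differs from the remembered one
}

/// An in-memory trust-on-first-use store keyed by host name (hypothetical type).
struct TrustStore {
    private var fingerprints: [String: [UInt8]] = [:]

    init() {}

    /// Remembers the fingerprint on first contact and compares it on later visits.
    mutating func evaluate(host: String, fingerprint: [UInt8]) -> TrustDecision {
        guard let known = fingerprints[host] else {
            fingerprints[host] = fingerprint
            return .trustedFirstUse
        }
        return known == fingerprint ? .trustedKnown : .rejectedMismatch
    }
}
```

The first call for a host returns `.trustedFirstUse`, later calls with the same fingerprint return `.trustedKnown`, and a changed fingerprint returns `.rejectedMismatch`, at which point a client would typically warn the user.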
```swift
class GeminiDataTask {
    typealias Completion = (Result<GeminiResponse, Error>) -> Void

    let request: GeminiRequest
    private let completion: Completion
    private(set) var state: State
    private let connection: NWConnection

    init(request: GeminiRequest, completion: @escaping Completion) {
        self.request = request
        self.completion = completion
        self.state = .unstarted

        // Fall back to the default Gemini port if the URL doesn't specify one.
        let port = UInt16(request.url.port ?? 1965)
        let endpoint = NWEndpoint.hostPort(host: NWEndpoint.Host(request.url.host!), port: NWEndpoint.Port(rawValue: port)!)
        self.connection = NWConnection(to: endpoint, using: .gemini)
    }

    convenience init(url: URL, completion: @escaping Completion) throws {
        self.init(request: try GeminiRequest(url: url), completion: completion)
    }
}
```
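The initializer above force-unwraps the URL's host and assumes the port fits in a `UInt16`. If you'd rather fail gracefully, one option is to pull that logic into a small helper. This is only a sketch, and `connectionTarget(for:defaultPort:)` is a hypothetical function that isn't part of the code in this post.

```swift
import Foundation

/// Extracts the host and port to connect to, falling back to Gemini's default port.
/// Returns nil instead of crashing if the URL has no host or an out-of-range port.
func connectionTarget(for url: URL, defaultPort: UInt16 = 1965) -> (host: String, port: UInt16)? {
    guard let host = url.host, !host.isEmpty else { return nil }
    // No explicit port in the URL: use the default.
    guard let rawPort = url.port else { return (host, defaultPort) }
    // url.port is an Int, so make sure it actually fits in 16 bits.
    guard let port = UInt16(exactly: rawPort) else { return nil }
    return (host, port)
}
```

The task's initializer could then throw an error when this returns `nil` rather than trapping on a malformed URL.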
The `State` enum is quite simple, just a few cases. It isn't used for much, just keeping track of the internal state so that the task doesn't try to perform any invalid operations on the connection.
```swift
extension GeminiDataTask {
    enum State {
        case unstarted, started, completed
    }
}
```
There's also a small helper struct to combine the response body and metadata into a single object:
```swift
struct GeminiResponse {
    let header: GeminiResponseHeader
    let body: Data?

    var status: Int { header.status }
    var meta: String { header.meta }
}
```
There are also some small methods to start and stop the request. I also copied the behavior from `URLSessionTask` where the task is automatically cancelled when all references to it are released.
```swift
class GeminiDataTask {
    // ...
    deinit {
        self.cancel()
    }

    func resume() {
        guard self.state == .unstarted else { return }
        self.connection.start(queue: GeminiDataTask.queue)
        self.state = .started
    }

    func cancel() {
        guard state != .completed else { return }
        self.connection.cancel()
        self.state = .completed
    }
}
```
When the connection starts, it needs to know which `DispatchQueue` to call its handler blocks on. For simplicity, here there's just a single queue used for all Gemini tasks.
```swift
class GeminiDataTask {
    static let queue = DispatchQueue(label: "GeminiDataTask", qos: .default)
    // ...
}
```
Also in the initializer, the `stateUpdateHandler` property of the connection is set to a closure which receives the connection's new state. If the connection has become ready, it sends the request. If the connection has errored for some reason, it ensures that it's closed and reports the error to the task's completion handler.
```swift
class GeminiDataTask {
    // ...
    init(request: GeminiRequest, completion: @escaping Completion) {
        // ...
        self.connection.stateUpdateHandler = { (newState) in
            switch newState {
            case .ready:
                self.sendRequest()
            case let .failed(error):
                self.state = .completed
                self.connection.cancel()
                self.completion(.failure(error))
            default:
                break
            }
        }
    }
}
```
To actually send the request, an `NWProtocolFramer.Message` is constructed for the request using the convenience initializer added earlier. Then, a custom connection context is instantiated, using the message as its metadata. The message isn't sent directly, so the connection context is how `NWProtocolFramer` will later get access to it. There's no data sent because Gemini requests can't have any body and the only data required is already encoded by the `GeminiRequest` object. Since the spec states that every connection corresponds to exactly one request, the request is completed immediately. The only thing the send completion handler needs to do is check if an error occurred while sending the request, and if so, cancel the connection and report the error.
```swift
class GeminiDataTask {
    // ...
    private func sendRequest() {
        let message = NWProtocolFramer.Message(geminiRequest: self.request)
        let context = NWConnection.ContentContext(identifier: "GeminiRequest", metadata: [message])
        self.connection.send(content: nil, contentContext: context, isComplete: true, completion: .contentProcessed({ (error) in
            // Only failures need handling here; the response arrives via receive().
            if let error = error {
                self.state = .completed
                self.connection.cancel()
                self.completion(.failure(error))
            }
        }))
        self.receive()
    }
}
```
Once the request has been sent, the `receive` method is called on the task to set up the receive handler for the connection. The receive closure takes the data that was received, another content context, whether the request is completed, and any error that may have occurred. In all cases, it closes the connection and sets the task's internal state to completed. If there was an error, it's reported via the task's completion handler. As when sending the request, the `NWConnection` has no direct knowledge of the `NWProtocolFramer` and its messages, so those have to be pulled out via the context. If the message and header were found, then the header is bundled up with the rest of the data that was received into a response object which is given to the completion handler.
```swift
class GeminiDataTask {
    // ...
    private func receive() {
        self.connection.receiveMessage { (data, context, isComplete, error) in
            if let error = error {
                self.completion(.failure(error))
            } else if let message = context?.protocolMetadata(definition: GeminiProtocol.definition) as? NWProtocolFramer.Message,
                      let header = message.geminiResponseHeader {
                let response = GeminiResponse(header: header, body: data)
                self.completion(.success(response))
            }

            self.connection.cancel()
            self.state = .completed
        }
    }
}
```
To recap, here's how it all fits together: First, the user constructs a `GeminiDataTask` representing the request. Next, to kick off the request, the user calls the `resume` method on it. This starts the underlying `NWConnection` which establishes a TCP connection and performs the TLS handshake. Once the network connection is ready, its `stateUpdateHandler` closure is notified, causing the `sendRequest` method to be called on the task. That method then creates the actual message object, gives it to the connection to send, and then sets up a handler to be called when a response is received. Using the request message and the `GeminiProtocol` implementation, `Network.framework` gets the raw bytes to send over the network. The framework then waits in the background to receive a response from the server. Once data is received from the server and has been decrypted, it returns to the `GeminiProtocol` which parses the metadata and then sends the rest of the data on to the protocol client. Upon receipt of the full metadata and message, the receive closure is called. The closure then passes the result of the request (either an error or the Gemini response) to the completion handler and closes the connection.
At the end of all this, the API we've got is a nice simple abstraction over a network protocol that should be fairly familiar to most Apple-platform developers:
```swift
let task = try GeminiDataTask(url: URL(string: "gemini://gemini.circumlunar.space/")!) { (result) in
    switch result {
    case let .success(response):
        print("Status: \(response.status)")
        print("Meta: '\(response.meta)'")
        if let data = response.body, let str = String(data: data, encoding: .utf8) {
            print(str)
        }
    case let .failure(error):
        print("Error: \(error)")
    }
}
task.resume()
```
Network.framework is a super useful tool for writing custom networking code and building abstractions on top of relatively low level protocols. The example I gave here isn't a hypothetical: I'm using Network.framework and almost this exact code to build a [Gemini browser](https://git.shadowfacts.net/shadowfacts/Gemini) app for Mac and iOS.
This post has barely scratched the surface. There's even more interesting stuff the framework is capable of, such as building peer-to-peer protocols. The documentation, in particular the [Tic-Tac-Toe sample project](https://developer.apple.com/documentation/network/building_a_custom_peer-to-peer_protocol), is a great resource for seeing more of what's possible.