
metadata.title = "Implement a Gemini Protocol Client Using Network.framework"
metadata.tags = ["swift", "gemini"]
metadata.date = "2020-07-22 21:57:42 -0400"
metadata.shortDesc = ""
metadata.slug = "gemini-network-framework"

Gemini is a small protocol built on top of TCP and TLS that's designed to serve as a transport mechanism primarily for text documents while lacking a great deal of the complexity of HTTP. Network.framework was introduced to Apple's platforms in 2018 as a modern framework for dealing with network connections and building custom network protocols. So, let's use it to build a Gemini client implementation.

The Protocol

First, an overview of the Gemini protocol. This is going to be fairly brief, as there are some more details that I'm not going to go into, since this post is meant to focus on using Network.framework to build a TCP-based protocol client, rather than the protocol itself^[That said, the rest of the Gemini protocol, as well as the text format, and the community that's sprung up around it is super interesting, and you should definitely check it out. An easy way to start is by using a Gemini-to-web proxy. Check out the homepage and explore from there.]. If you're interested, you can read more about the details of the protocol in its specification.

At the highest level, Gemini is fairly similar to HTTP: every connection is made to request a single resource at a specific URL. After the connection is opened and the TLS handshake is completed, the client sends the request. The request is the CRLF-terminated absolute URL of the resource being requested. The URL string is encoded as UTF-8 and has a maximum length of 1024 bytes. The URL scheme doesn't have to be specified; the default is gemini:// when using the Gemini protocol for transport. The port is also optional and defaults to 1965^[Because the first crewed mission of the Gemini Program launched on March 23, 1965.].

gemini://example.com:1965/some/resource?foo<CR><LF>
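To make the byte layout concrete, here's a minimal sketch in plain Swift (no Network.framework involved) that turns a URL string into request bytes and enforces the 1024-byte limit. The function `geminiRequestLine(for:)` and the `RequestLineError` type are hypothetical names used only for illustration.

```swift
// Hypothetical error type, for illustration only.
enum RequestLineError: Error {
    case urlTooLong(byteCount: Int)
}

/// Builds the bytes of a Gemini request: the absolute URL encoded as UTF-8,
/// followed by <CR><LF>. The URL itself may be at most 1024 bytes long.
func geminiRequestLine(for url: String) throws -> [UInt8] {
    let urlBytes = Array(url.utf8)
    guard urlBytes.count <= 1024 else {
        throw RequestLineError.urlTooLong(byteCount: urlBytes.count)
    }
    return urlBytes + [13, 10] // <CR><LF>
}

let line = try geminiRequestLine(for: "gemini://example.com:1965/some/resource?foo")
```

Note that the limit applies to the UTF-8 byte count, not the number of characters, so a URL with non-ASCII characters reaches the limit sooner.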

Likewise, the response starts with a CRLF-terminated, UTF-8 encoded string. It begins with a two-digit status code, where the most significant digit defines the overall response type and the least significant digit provides more specificity. The status code is followed by a space character, then a string up to 1024 bytes in length, and finally the carriage return and line feed characters. The meaning of the meta string in the response is defined by the various status codes (for example, 20 is the status code for success and defines the meta string to be the MIME type of the response body).

20 text/gemini<CR><LF>

Finally, if the response was successful (i.e. the server returned a status code in the 2x range), there may be a response body, which is arbitrary binary data.
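Before getting into Network.framework, here's a small, hypothetical helper in plain Swift that splits a complete header line into its status code and meta string. Unlike the framer implementation later in this post, which has to cope with data arriving in pieces, this sketch assumes the entire header line, including its trailing CRLF, is already in memory.

```swift
/// Hypothetical helper: splits a complete header line (trailing <CR><LF> included)
/// into the two-digit status code and the meta string. Returns nil if the bytes
/// don't look like "<digit><digit><space><meta><CR><LF>".
func parseHeaderLine(_ bytes: [UInt8]) -> (status: Int, meta: String)? {
    // Two digits, one space, and <CR><LF> is the shortest valid header: 5 bytes.
    guard bytes.count >= 5,
          bytes[0...1].allSatisfy({ (48...57).contains($0) }), // ASCII '0'...'9'
          bytes[2] == 0x20,                                    // space
          bytes[bytes.count - 2] == 13, bytes[bytes.count - 1] == 10 // <CR><LF>
    else { return nil }

    // Combine the two ASCII digits into the integer status code.
    let status = (Int(bytes[0]) - 48) * 10 + (Int(bytes[1]) - 48)
    // Everything between the space and the <CR><LF> is the meta string.
    let metaBytes = bytes[3..<(bytes.count - 2)]
    guard metaBytes.count <= 1024 else { return nil }
    return (status, String(decoding: metaBytes, as: UTF8.self))
}

let header = parseHeaderLine(Array("20 text/gemini\r\n".utf8))
```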

The Implementation

With Network.framework, everything starts with an NWProtocol. The framework provides a bunch of concrete subclasses for dealing with protocols like TCP, UDP, and TLS. New in 2019 is the NWProtocolFramer class, which provides an interface for defining your own protocols on top of the built-in stack. Using it starts with a class that conforms to the NWProtocolFramerImplementation protocol:

import Network

class GeminiProtocol: NWProtocolFramerImplementation {
	static let label = "Gemini"

	required init(framer: NWProtocolFramer.Instance) {}
}

The protocol has a bunch of requirements that need to be satisfied. Starting off with the simple ones, it needs a static read-only String variable called label, which will be used in log messages to identify which framer implementation is being used. It also needs an initializer which takes an NWProtocolFramer.Instance. Nothing needs to be done in this initializer—the framer instance doesn't even need to be stored, since all of the other methods that have to be implemented directly receive it.

There's also a static definition property which stores the NWProtocolDefinition that's configured to use this class as the framer's implementation. This needs to be a singleton, not constructed for every request, because it will later be used as a key to get some implementation-specific data out of other framework classes.

class GeminiProtocol: NWProtocolFramerImplementation {
	static let definition = NWProtocolFramer.Definition(implementation: GeminiProtocol.self)
	// ...
}

Next, there are a few other simple methods to implement:

class GeminiProtocol: NWProtocolFramerImplementation {
	// ...
	func start(framer: NWProtocolFramer.Instance) -> NWProtocolFramer.StartResult {
		return .ready
	}

	func wakeup(framer: NWProtocolFramer.Instance) {
	}

	func stop(framer: NWProtocolFramer.Instance) -> Bool {
		return true
	}

	func cleanup(framer: NWProtocolFramer.Instance) {
	}
}

Since the Gemini protocol doesn't use long-running/persistent connections, there's no work that needs to be done to start, wake up, stop, or clean up an individual connection. And, since each connection only handles a single request, there isn't even any handshake that needs to be performed to start a Gemini connection. We can just send the request and we're off to the races. Similarly, stopping a Gemini connection doesn't mean anything: the connection is just closed.

Actually sending a request is nice and simple. The NWProtocolFramerImplementation protocol has a handleOutput method (output, in this case, meaning output from the client, i.e., the request). This method receives an instance of the protocol's message type, which in this case is NWProtocolFramer.Message. Since NWProtocolFramer is designed to be used to implement application-level protocols, its message type functions as a key-value store that can contain arbitrary application protocol information.

For the Gemini protocol, a simple struct encapsulates all the data we need to make a request. All it does is ensure that the URL is no longer than 1024 bytes upon initialization (a limit defined by the protocol spec) and define a small helper property that creates a Data object containing the URL string encoded as UTF-8 with the carriage return and line feed characters appended.

struct GeminiRequest {
	let url: URL

	init(url: URL) throws {
		guard url.absoluteString.utf8.count <= 1024 else { throw Error.urlTooLong }
		self.url = url
	}

	var data: Data {
		var data = url.absoluteString.data(using: .utf8)!
		data.append(contentsOf: [13, 10]) // <CR><LF>
		return data
	}

	enum Error: Swift.Error {
		case urlTooLong
	}
}

Also, a simple extension on NWProtocolFramer.Message provides access to the stored GeminiRequest, instead of dealing with string keys directly. There's also a convenience initializer to create a message instance from a request that's set up to use the protocol definition from earlier.

private let requestKey = "gemini_request"

extension NWProtocolFramer.Message {
	convenience init(geminiRequest request: GeminiRequest) {
		self.init(definition: GeminiProtocol.definition)
		self[requestKey] = request
	}

	var geminiRequest: GeminiRequest? {
		self[requestKey] as? GeminiRequest
	}
}

With those both in place, the protocol implementation can simply grab the request out of the message and send its data through to the framer instance:

class GeminiProtocol: NWProtocolFramerImplementation {
	// ...
	func handleOutput(framer: NWProtocolFramer.Instance, message: NWProtocolFramer.Message, messageLength: Int, isComplete: Bool) {
		guard let request = message.geminiRequest else {
			fatalError("GeminiProtocol can't send message that doesn't have an associated GeminiRequest")
		}
		framer.writeOutput(data: request.data)
	}
}

Parsing input (i.e., the response from the server) is somewhat more complicated. Parsing the status code and the meta string will both follow a similar pattern. The parseInput method of NWProtocolFramer.Instance is used to get some input from the connection, given a valid range of lengths for the input. This method also takes a closure, which receives an optional UnsafeMutableRawBufferPointer containing the input data that was received as well as a boolean flag indicating if the connection has closed. The closure returns an integer representing the number of bytes that it consumed (meaning data that was fully parsed and should not be provided on subsequent parseInput calls). It is responsible for parsing the data, storing the result in a local variable, and returning how much, if any, of the data was consumed.

First off is the status code (and the following space character). In the protocol implementation, there's an optional Int property used as temporary storage for the status code. If the tempStatusCode property is nil, the parseInput method is called on the framer. The length is always going to be 3 bytes (1 for each character of the status code, and 1 for the space). Inside the parseInput closure, if the buffer is not present or it's not of the expected length, the closure returns zero to indicate that no bytes were consumed. Otherwise, the contents of the buffer are converted to a String and then parsed into an integer^[If you were really building an implementation of the Gemini protocol, you would probably want to wrap the raw integer status code in something else to avoid dealing with magic numbers throughout your codebase. An enum backed by integer values, perhaps.] and stored in the temporary property (this is okay because the closure passed to parseInput is non-escaping, meaning it will be called before parseInput returns). Finally, the closure returns 3 to indicate that three bytes were consumed and should not be provided again as input.

Outside the if, there's a guard that checks that there is a status code present, either from immediately prior or potentially from a previous invocation of the method. If not, it returns 3 from the handleInput method, telling the framework that it expects there to be at least 3 bytes available before it's called again. The reason the status code is stored in a class property, and why the code ensures that it's nil before trying to parse, is so that if some subsequent parse step fails and the method returns and has to be invoked again in the future, it doesn't try to re-parse the status code, because the actual data for it has already been consumed.

class GeminiProtocol: NWProtocolFramerImplementation {
	// ...
	private var tempStatusCode: Int?

	func handleInput(framer: NWProtocolFramer.Instance) -> Int {
		if tempStatusCode == nil {
			_ = framer.parseInput(minimumIncompleteLength: 3, maximumLength: 3) { (buffer, isComplete) -> Int in
				guard let buffer = buffer, buffer.count == 3 else { return 0 }
				let secondIndex = buffer.index(after: buffer.startIndex)
				if let str = String(bytes: buffer[...secondIndex], encoding: .utf8),
				   let value = Int(str, radix: 10) {
					self.tempStatusCode = value
				}
				return 3
			}
		}
		guard let statusCode = tempStatusCode else {
			return 3
		}
	}
}
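The "parse once, cache the result, never re-parse consumed bytes" pattern is easier to see when it can be run without a network connection. The sketch below models it in plain Swift. `MockFramerInput` and `StatusCodeStep` are hypothetical types: the mock is not NWProtocolFramer.Instance and only approximates the contract described above (the closure reports how many bytes it consumed, and those bytes are never offered again). In particular, passing nil when too few bytes are pending is a simplification of this model, not a statement about the real framework.

```swift
/// Hypothetical, simplified stand-in for the framer's input buffer.
/// It only models "the closure reports how many bytes it consumed".
final class MockFramerInput {
    private var pending: [UInt8] = []

    /// Number of received bytes that haven't been consumed yet.
    var pendingCount: Int { pending.count }

    /// Simulates more data arriving from the network.
    func receive(_ bytes: [UInt8]) {
        pending += bytes
    }

    /// Offers up to `maximumLength` pending bytes to `body` (or nil if fewer than
    /// `minimumIncompleteLength` are pending, in this simplified model).
    /// Whatever count `body` returns is removed from the front of the buffer.
    func parse(minimumIncompleteLength: Int, maximumLength: Int,
               _ body: ([UInt8]?) -> Int) {
        guard pending.count >= minimumIncompleteLength else {
            _ = body(nil)
            return
        }
        let offered = Array(pending.prefix(maximumLength))
        let consumed = body(offered)
        pending.removeFirst(min(consumed, offered.count))
    }
}

/// Hypothetical first step of handleInput: parse "<digit><digit><space>" once and
/// cache it (like tempStatusCode) so consumed bytes are never parsed twice.
final class StatusCodeStep {
    private(set) var statusCode: Int?

    /// Returns how many bytes to wait for before trying again, or 0 once parsed.
    func handleInput(_ input: MockFramerInput) -> Int {
        if statusCode == nil {
            input.parse(minimumIncompleteLength: 3, maximumLength: 3) { buffer in
                guard let buffer = buffer, buffer.count == 3 else { return 0 }
                self.statusCode = Int(String(decoding: buffer[0..<2], as: UTF8.self))
                return 3
            }
        }
        return statusCode == nil ? 3 : 0
    }
}

let input = MockFramerInput()
let step = StatusCodeStep()
input.receive(Array("2".utf8))
let firstAttempt = step.handleInput(input)   // only one byte so far: nothing consumed
input.receive(Array("0 text/gemini\r\n".utf8))
let secondAttempt = step.handleInput(input)  // "20 " consumed, status code cached
let thirdAttempt = step.handleInput(input)   // cached value reused, nothing re-parsed
```

After the second call, only the meta string and its CRLF remain in the buffer, which is exactly what the next parsing step expects to see.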

Next up: the meta string. Following the same pattern as with the status code, there's a temporary property to store the result of parsing the meta string and a call to parseInput. This time, the minimum length is 2 bytes (since the Gemini spec doesn't specify a minimum length for the meta string, it could be omitted entirely, which would leave just two bytes for the carriage return and line feed) and the maximum length is 1026 bytes (up to 1024 bytes for the meta string, and again, the trailing CRLF).

This time, the closure once again validates that there is enough data to at least attempt to parse it, but then it loops through the data looking for the CRLF sequence which defines the end of the meta string^[You can't scan through the data backwards, because the response body immediately follows the CRLF after the meta string, so you could end up finding a CRLF sequence inside the body and incorrectly basing the length of the meta string off that.]. Afterwards, if the marker sequence was not found, the closure returns zero because no data was consumed. Otherwise, it constructs a string from the bytes up to the index of the carriage return, stores it in the temporary property, and returns the number of bytes consumed (index here represents the end index of the string, so without the additional + 2 the trailing CRLF would be considered part of the body). After the call to parseInput, it similarly checks that the meta was parsed successfully and returns if not.

One key difference between parsing the meta string and parsing the status code is that if the status code couldn't be parsed, the exact number of bytes that must be available before it can be attempted again is always the same: 3. That's not true when parsing the meta text: the number of bytes necessary for a retry is dependent on the number of bytes that were unsuccessfully attempted to be parsed. For that reason, there's also an optional Int variable which stores the length of the buffer that the closure attempted to parse. When the closure executes, the variable is set to the length of the buffer.

If, inside the closure, the code fails to find the carriage return and line feed characters anywhere, one of two things happens. If the buffer is shorter than 1026 bytes, the closure returns zero to indicate that nothing was consumed. Then, since there's no string, handleInput will return 1 plus the attempted meta length, indicating to the framework that it should wait until there is at least 1 additional byte of data available before calling handleInput again. If no CRLF was found and the buffer count is greater than or equal to 1026, the closure simply aborts with a fatalError, because the protocol specifies that the meta string cannot be longer than 1024 bytes (it would be better to set some sort of 'invalid' flag on the response object and then pass that along to be handled by higher-level code, but for the purposes of this blog post, that's not interesting code). In the final case, if parsing the meta failed and the attemptedMetaLength variable is nil, that means there wasn't enough data available, so we simply return 2.

Update July 7, 2021: The eagle-eyed among you may notice that there's a flaw in the following implementation involving what happens when meta parsing has to be retried. I discovered this myself and discussed it in this follow-up post.

class GeminiProtocol: NWProtocolFramerImplementation {
	// ...
	private var tempMeta: String?

	func handleInput(framer: NWProtocolFramer.Instance) -> Int {
		// ...
		var attemptedMetaLength: Int?
		if tempMeta == nil {
			_ = framer.parseInput(minimumIncompleteLength: 2, maximumLength: 1026) { (buffer, isComplete) -> Int in
				guard let buffer = buffer, buffer.count >= 2 else { return 0 }
				attemptedMetaLength = buffer.count

				let lastPossibleCRIndex = buffer.index(before: buffer.index(before: buffer.endIndex))
				var index = buffer.startIndex
				var found = false
				while index <= lastPossibleCRIndex {
					if buffer[index] == 13 /* CR */ && buffer[buffer.index(after: index)] == 10 /* LF */ {
						found = true
						break
					}
					index = buffer.index(after: index)
				}

				if !found {
					if buffer.count < 1026 {
						return 0
					} else {
						fatalError("Expected to find <CR><LF> in buffer. Meta string may not be longer than 1024 bytes.")
					}
				}

				tempMeta = String(bytes: buffer[..<index], encoding: .utf8)
				return buffer.startIndex.distance(to: index) + 2
			}
		}
		guard let meta = tempMeta else {
			if let attempted = attemptedMetaLength {
				return attempted + 1
			} else {
				return 2
			}
		}
	}
}
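The three outcomes of the meta scan (found, wait for more data, or give up) can be expressed as a value instead of being spread across a closure, a local variable, and a fatalError. Here's a hypothetical, pure-Swift sketch of that logic operating on a byte array; `MetaScan` and `scanMeta(_:)` are illustrative names, not part of the implementation above, and the sketch does not address the retry flaw mentioned in the update note.

```swift
/// Hypothetical result of scanning the bytes that follow the status code.
enum MetaScan: Equatable {
    case found(meta: String, consumed: Int) // `consumed` includes the <CR><LF>
    case needMoreData(minimum: Int)         // no <CR><LF> yet; wait for at least this many bytes
    case tooLong                            // no <CR><LF> within the first 1026 bytes
}

/// Scans forward (never backward, since the body follows the header) for <CR><LF>.
func scanMeta(_ buffer: [UInt8]) -> MetaScan {
    guard buffer.count >= 2 else { return .needMoreData(minimum: 2) }
    // At most 1024 bytes of meta plus the two-byte terminator are allowed.
    let window = buffer.prefix(1026)
    var index = 0
    while index + 1 < window.count {
        if window[index] == 13 && window[index + 1] == 10 { // <CR><LF>
            let meta = String(decoding: window[0..<index], as: UTF8.self)
            return .found(meta: meta, consumed: index + 2)
        }
        index += 1
    }
    // Mirror the retry rule described above: wait for one more byte than was seen.
    return window.count < 1026 ? .needMoreData(minimum: window.count + 1) : .tooLong
}
```

Because the scan goes forward, a CRLF inside the body (for example `"text/gemini\r\nline one\r\n"`) never shortens or lengthens the meta string.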

With the entire header parsed, an object can be constructed to represent the response metadata and an NWProtocolFramer.Message created to contain it.

class GeminiProtocol: NWProtocolFramerImplementation {
	// ...
	func handleInput(framer: NWProtocolFramer.Instance) -> Int {
		// ...
		let header = GeminiResponseHeader(status: statusCode, meta: meta)
		let message = NWProtocolFramer.Message(geminiResponseHeader: header)
	}
}

GeminiResponseHeader is a simple struct to contain the status code and the meta string in a type-safe manner:

struct GeminiResponseHeader {
	let status: Int
	let meta: String
}

As with the request object, there's a small extension on NWProtocolFramer.Message so that all the string keys are contained to a single place.

private let responseHeaderKey = "gemini_response_header"

extension NWProtocolFramer.Message {
	convenience init(geminiResponseHeader header: GeminiResponseHeader) {
		self.init(definition: GeminiProtocol.definition)
		self[responseHeaderKey] = header
	}

	var geminiResponseHeader: GeminiResponseHeader? {
		self[responseHeaderKey] as? GeminiResponseHeader
	}
}

To actually pass the message off to the client of the protocol implementation, the deliverInputNoCopy method is used. Since the handleInput method has already parsed all of the data it needs to, and the response body is defined by the protocol to just be the rest of the response data, the deliverInputNoCopy method is a useful way of passing the data straight through to the protocol client, avoiding an extra memory copy. If the protocol had to transform the body of the response somehow, it could be read as above and then delivered to the protocol client with the deliverInput(data:message:isComplete:) method.

If the request was successful (i.e., the status code was in the 2x range), we try to receive as many bytes as possible, because the protocol doesn't specify a way of determining the length of a response. All other response codes are defined to never have response bodies, so we don't need to deliver any data. Using .max is a little bit weird, since we don't actually need to receive that many bytes. But it seems to work perfectly fine in practice: once all the input is received and the other side closes the connection, the input is delivered without error.

Annoyingly, the return value of the Swift function is entirely undocumented (even in the generated headers, where the parameters are). Fortunately, the C equivalent (nw_framer_deliver_input_no_copy) is more thoroughly documented and provides an answer: the function returns a boolean indicating whether the input was delivered immediately or whether the framework will wait for more bytes before delivering it. We don't care at all about this, so we just discard the return value.

Finally, we return 0 from handleInput. Ordinarily, this would mean that there must be zero or more bytes available before the framework calls us again. But, because we've delivered all the available input, that will never happen.

class GeminiProtocol: NWProtocolFramerImplementation {
	// ...
	func handleInput(framer: NWProtocolFramer.Instance) -> Int {
		// ...
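		// Only 2x (success) responses have a body; deliver everything that remains.
		let isSuccess = (20...29).contains(statusCode)
		_ = framer.deliverInputNoCopy(length: isSuccess ? .max : 0, message: message, isComplete: true)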
		return 0
	}
}
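The footnote earlier suggested wrapping the raw status code in an enum to avoid magic numbers like the `20...29` range check. A minimal sketch of that idea, assuming the six response categories (1x through 6x) from the Gemini specification; `GeminiStatusClass` and `hasBody` are hypothetical names:

```swift
/// Hypothetical wrapper around the first digit of a Gemini status code.
enum GeminiStatusClass: Int {
    case input = 1
    case success = 2
    case redirect = 3
    case temporaryFailure = 4
    case permanentFailure = 5
    case clientCertificateRequired = 6

    /// Classifies a two-digit status code; returns nil for anything outside 10...69.
    init?(statusCode: Int) {
        guard (10...69).contains(statusCode) else { return nil }
        self.init(rawValue: statusCode / 10)
    }

    /// Only successful (2x) responses carry a body.
    var hasBody: Bool { self == .success }
}
```

With something like this in place, the length passed to deliverInputNoCopy could be derived from `GeminiStatusClass(statusCode: statusCode)?.hasBody == true` instead of a bare numeric range.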

Actually using the Gemini protocol implementation will require creating an NWConnection object, which takes an endpoint and connection parameters. The parameters define which protocols to use and the various options for them. The NWParameters class already defines a number of static NWParameters variables for commonly used protocols, so adding our own for Gemini fits right in.

extension NWParameters {
	static var gemini: NWParameters {
		let tcpOptions = NWProtocolTCP.Options()
		let parameters = NWParameters(tls: geminiTLSOptions, tcp: tcpOptions)

		let geminiOptions = NWProtocolFramer.Options(definition: GeminiProtocol.definition)
		parameters.defaultProtocolStack.applicationProtocols.insert(geminiOptions, at: 0)

		return parameters
	}
	private static var geminiTLSOptions: NWProtocolTLS.Options {
		let options = NWProtocolTLS.Options()
		sec_protocol_options_set_min_tls_protocol_version(options.securityProtocolOptions, .TLSv12)
		return options
	}
}

Here the only thing we customize about the TLS options is setting the minimum required version to TLS 1.2, as required by the Gemini spec. However, the Gemini spec further recommends that clients implement a trust-on-first-use scheme to allow people to host content on the Gemini network using self-signed certificates, but implementing that is out of the scope of this post. If you're interested, a good starting point is the sec_protocol_options_set_verify_block function, which lets you provide a closure that the framework uses to verify server certificates during the TLS handshake process.
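For a rough idea of what trust-on-first-use involves, here's a sketch of just the bookkeeping, in plain Swift. It deliberately avoids the Security and Network APIs: the fingerprint is an opaque string that a real client would have to compute from the server certificate itself (for example, inside the verify block mentioned above). `TOFUStore` is a hypothetical name, and a real implementation would also need persistence and some policy for certificate expiry.

```swift
/// Hypothetical in-memory trust-on-first-use store keyed by host.
final class TOFUStore {
    enum Decision: Equatable {
        case trustedFirstUse // host never seen before: remember this certificate
        case trusted         // matches the certificate remembered for this host
        case mismatch        // differs from the remembered certificate
    }

    private var knownFingerprints: [String: String] = [:]

    func evaluate(host: String, fingerprint: String) -> Decision {
        guard let known = knownFingerprints[host] else {
            knownFingerprints[host] = fingerprint
            return .trustedFirstUse
        }
        return known == fingerprint ? .trusted : .mismatch
    }
}

let store = TOFUStore()
let first = store.evaluate(host: "example.com", fingerprint: "AA")
let second = store.evaluate(host: "example.com", fingerprint: "AA")
let changed = store.evaluate(host: "example.com", fingerprint: "BB")
```

On a mismatch, a client would typically refuse the connection or ask the user, rather than silently replacing the stored fingerprint.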

Now, to make an API for all this that's actually pleasant to use, I pretty closely followed the URLSessionDataTask approach from Foundation, since it models something fairly similar to Gemini.

GeminiDataTask is a class which will store the request being sent, a completion handler, as well as an internal state and the underlying NWConnection. The initializer stores a few things, and then sets up the network connection. It uses the URL port, if it has one, otherwise the default of 1965. The host is simply the host of the requested URL. These are used to construct an NWEndpoint object and, combined with the Gemini NWParameters setup previously, create the connection. The convenience initializer also provides a slightly nicer API, so the user doesn't have to directly deal with the GeminiRequest object (which, from their perspective, is useless since there's nothing to customize about it beyond the plain old URL).

class GeminiDataTask {
	typealias Completion = (Result<GeminiResponse, Error>) -> Void

	let request: GeminiRequest
	private let completion: Completion
	private(set) var state: State
	private let connection: NWConnection

	init(request: GeminiRequest, completion: @escaping Completion) {
		self.request = request
		self.completion = completion
		self.state = .unstarted

		let port = request.url.port != nil ? UInt16(request.url.port!) : 1965
		let endpoint = NWEndpoint.hostPort(host: NWEndpoint.Host(request.url.host!), port: NWEndpoint.Port(rawValue: port)!)
		self.connection = NWConnection(to: endpoint, using: .gemini)
	}

	convenience init(url: URL, completion: @escaping Completion) throws {
		self.init(request: try GeminiRequest(url: url), completion: completion)
	}
}
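The force unwraps in the initializer above will crash if the URL has no host. As a sketch of a more defensive alternative (not what the post's code does), the host/port resolution can be pulled into a hypothetical helper that returns nil instead, still falling back to the default port of 1965:

```swift
import Foundation

/// Hypothetical helper: takes the host from the URL and falls back to Gemini's
/// default port, 1965. Returns nil when the URL has no host or the port doesn't
/// fit in a UInt16, instead of force-unwrapping.
func geminiHostAndPort(for url: URL) -> (host: String, port: UInt16)? {
    guard let host = url.host else { return nil }
    guard let port = UInt16(exactly: url.port ?? 1965) else { return nil }
    return (host, port)
}

let withPort = geminiHostAndPort(for: URL(string: "gemini://example.com:1966/page")!)
let withoutPort = geminiHostAndPort(for: URL(string: "gemini://example.com/")!)
```

An initializer built on this would then need to become failable or throwing, which is a trade-off against the simpler API shown above.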

The State enum is quite simple, just a few cases. It isn't used for much, just keeping track of the internal state so that the task doesn't try to perform any invalid operations on the connection.

extension GeminiDataTask {
	enum State {
		case unstarted, started, completed
	}
}

There's also a small helper struct to combine the response body and metadata into a single object:

struct GeminiResponse {
	let header: GeminiResponseHeader
	let body: Data?

	var status: Int { header.status }
	var meta: String { header.meta }
}

There are also some small methods to start and stop the request. I also copied the behavior from URLSessionTask where the task is automatically cancelled when all references to it are released.

class GeminiDataTask {
	// ...
	deinit {
		self.cancel()
	}

	func resume() {
		guard self.state == .unstarted else { return }
		self.connection.start(queue: GeminiDataTask.queue)
		self.state = .started
	}

	func cancel() {
		guard state != .completed else { return }
		self.connection.cancel()
		self.state = .completed
	}
}
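To see how the State guards make resume and cancel safe to call repeatedly, here's a hypothetical model that swaps the NWConnection for counters so the transitions can be checked without any networking. `TaskStateModel` is an illustrative name; it mirrors only the state logic of GeminiDataTask.

```swift
/// Hypothetical stand-in for GeminiDataTask's state handling.
final class TaskStateModel {
    enum State { case unstarted, started, completed }

    private(set) var state: State = .unstarted
    private(set) var startCount = 0
    private(set) var cancelCount = 0

    func resume() {
        guard state == .unstarted else { return } // a task only ever starts once
        startCount += 1                            // stands in for connection.start(queue:)
        state = .started
    }

    func cancel() {
        guard state != .completed else { return } // cancelling twice is a no-op
        cancelCount += 1                           // stands in for connection.cancel()
        state = .completed
    }
}

let model = TaskStateModel()
model.resume()
model.resume()
model.cancel()
model.cancel()
model.resume() // ignored: a completed task can't be restarted
```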

When the connection starts, it needs to know which DispatchQueue to call its handler blocks on. For simplicity, here there's just a single queue used for all Gemini tasks.

class GeminiDataTask {
	static let queue = DispatchQueue(label: "GeminiDataTask", qos: .default)
	// ...
}

Also in the initializer, the stateUpdateHandler property of the connection is set to a closure which receives the connection's new state. If the connection has become ready, it sends the request. If the connection has errored for some reason, it ensures that it's closed and reports the error to the task's completion handler.

class GeminiDataTask {
	// ...
	init(request: GeminiRequest, completion: @escaping Completion) {
		// ...
		self.connection.stateUpdateHandler = { (newState) in
			switch newState {
			case .ready:
				self.sendRequest()
			case let .failed(error):
				self.state = .completed
				self.connection.cancel()
				self.completion(.failure(error))
			default:
				break
			}
		}
	}
}

To actually send the request, an NWProtocolFramer.Message is constructed for the request using the convenience initializer added earlier. Then, a custom connection context is instantiated, using the message as its metadata. The message isn't sent directly, so the connection context is how NWProtocolFramer will later get access to it. No content data is passed to send, because Gemini requests can't have a body and the only data required is already encoded by the GeminiRequest object. Since the spec states that every connection corresponds to exactly one request, the send is marked as complete immediately (isComplete: true). The only thing the send completion handler needs to do is check whether an error occurred while sending the request, and if so, cancel the connection and report the error.

class GeminiDataTask {
	// ...
	private func sendRequest() {
		let message = NWProtocolFramer.Message(geminiRequest: self.request)
		let context = NWConnection.ContentContext(identifier: "GeminiRequest", metadata: [message])
		self.connection.send(content: nil, contentContext: context, isComplete: true, completion: .contentProcessed({ (error) in
			if let error = error {
				self.state = .completed
				self.connection.cancel()
				self.completion(.failure(error))
			}
		}))
		self.receive()
	}
}

Once the request has been sent, the receive method is called on the task to set up the receive handler for the connection. The receive closure takes the data that was received, another content context, whether the request is completed, and any error that may have occurred. In all cases, it closes the connection and sets the task's internal state to completed. If there was an error, it's reported via the task's completion handler. As when sending the request, the NWConnection has no direct knowledge of the NWProtocolFramer and its messages, so those have to be pulled out via the context. If the message and header were found, then the header is bundled up with the rest of the data that was received into a response object which is given to the completion handler.

class GeminiDataTask {
	// ...
	private func receive() {
		self.connection.receiveMessage { (data, context, isComplete, error) in
			if let error = error {
				self.completion(.failure(error))
			} else if let message = context?.protocolMetadata(definition: GeminiProtocol.definition) as? NWProtocolFramer.Message,
			          let header = message.geminiResponseHeader {
				let response = GeminiResponse(header: header, body: data)
				self.completion(.success(response))
			}

			self.connection.cancel()
			self.state = .completed
		}
	}
}

To recap, here's how it all fits together: First, the user constructs a GeminiDataTask representing the request. Next, to kick off the request, the user calls the resume method on it. This starts the underlying NWConnection, which establishes a TCP connection and performs the TLS handshake. Once the network connection is ready, its stateUpdateHandler closure is notified, causing the sendRequest method to be called on the task. That method then creates the actual message object, gives it to the connection to send, and then sets up a handler to be called when a response is received. Using the request message and the GeminiProtocol implementation, Network.framework gets the raw bytes to send over the network. The framework then waits in the background to receive a response from the server. Once data is received from the server and has been decrypted, it is handed to the GeminiProtocol, which parses the metadata and then passes the rest of the data on to the protocol client. Upon receipt of the full metadata and message, the receive closure is called. The closure then passes the result of the request (either an error or the Gemini response) to the completion handler and closes the connection.

At the end of all this, the API we've got is a nice simple abstraction over a network protocol that should be fairly familiar to most Apple-platform developers:

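// GeminiDataTask(url:completion:) throws if the URL is too long, so it needs `try`.
let task = try GeminiDataTask(url: URL(string: "gemini://gemini.circumlunar.space/")!) { (result) in
	switch result {
	case let .success(response):
		print("Status: \(response.status)")
		print("Meta: '\(response.meta)'")
		if let data = response.body, let str = String(data: data, encoding: .utf8) {
			print(str)
		}
	case let .failure(error):
		print("Request failed: \(error)")
	}
}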
task.resume()

Network.framework is a super useful tool for writing custom networking code and building abstractions on top of relatively low-level protocols. The example I gave here isn't a hypothetical: I'm using Network.framework and almost this exact code to build a Gemini browser app for Mac and iOS.

This post has barely scratched the surface; there's even more interesting stuff the framework is capable of, such as building peer-to-peer protocols. The documentation, in particular the Tic-Tac-Toe sample project, is a great resource for seeing more of what's possible.