v6/site/posts/2021-07-07-gemini-client-debugging.md
2022-12-10 13:15:32 -05:00

7.4 KiB

title = "Debugging My Gemini NWProtocolFramer Implementation"
tags = ["swift", "gemini"]
date = "2021-07-07 23:32:42 -0400"
slug = "gemini-client-debugging"

I recently ran into an issue with the Network.framework Gemini client I'd previously implemented that turned out to be somewhat perplexing. So, I thought I'd write a brief post about it in case anyone finds it interesting or helpful.

The gist of the issue is that when connecting to a certain host, the connection would hang indefinitely. The issue was 100% reproducible in my app, both in the simulator and on an actual device, in all manner of network conditions. What led me to believe that it was an issue with my implementation was that the problem only happened with a single host (I suspect the incompatibility was with whatever server software was being used) and the fact that I could not reproduce the issue with any other client.

My initial attempts at debugging were fruitless. Every handler and callback I had was just never getting called. There was no error that was being swallowed or anything, just silence. Eventually I set a breakpoint in the handleInput method of my NWProtocolFramerImplementation.

Stepping through1, I saw the status code get parsed successfully. Then it moved on to parsing the response meta, a string of up to 1024 bytes followed by a carriage return and line feed.

Next, I paused inside the parseInput closure that's responsible for parsing the meta string. The buffer that I received was 11 bytes long. I was immediately wary because I'd seen that number before in this context—it's the length of text/gemini. Indeed, printing out String(bytes: buffer, encoding: .utf8) revealed that it was indeed the meta string I was expecting. But it was just the meta string, not including the CR/LF pair that's supposed to follow it.

Continuing to step through, I saw the closure return 0, indicating that no bytes were consumed, because the CRLF was missing. After that, the main body of the handleInput method would see that the meta wasn't found and would itself return the number of bytes it expected to receive before it should be invoked again. It does this by adding 1 to the length of the buffer from which meta parsing failed. So, the next time handleInput is called by the framework, there should be at least 12 bytes available.

After hitting resume, the debugger trapped again in the parseInput closure for the meta. I checked buffer.count and found... 11 bytes. I know handleInput returned 12, so why did I still only have 11 bytes?

The realization I came to, after puzzling over this for a couple days, is that parseInput behaves a little weirdly, though in a way that's technically correct. It seems that even if more data is available in some internal buffer of the framer's—which we know there must be because handleInput was invoked again after we returned 12 earlier—we won't receive all of it. It's not wrong, 11 bytes is indeed at least 2 and no more than 1026, but it's certainly unintuitive.

To verify this behavior, I tweaked my code to call parseInput with a minimum length of 13 instead of 2 the second attempt to parse meta. Lo and behold, it worked. The buffer now had 13 bytes of data: the full meta string and the CR/LF.

And that explains the user-facing issue that led to all this. A few attempts at parsing would fail, but then the server would stop sending data because there was nothing left, so handleInput would never be called, leaving the Gemini request in a waiting state forever.

So, to properly fix the issue what needs to happen is we have to be smarter about the minimumIncompleteLength on subsequent attempts to parse the metadata. To do this, I saved the length of the last buffer for which meta parsing was attempted as an instance variable in the framer implementation. With that, we can determine how many bytes we actually need before trying again.

class GeminiProtocol: NWProtocolFramerImplementation {
	// ...
	private var lastAttemptedMetaLength: Int? = nil
	// ...

	func handleInput(framer: NWProtocolFramer.Instance) -> Int {
		// ...
		if tempMeta == nil {
			let min: Int
			if let lastAttemptedMetaLength = lastAttemptedMetaLength {
				min = lastAttemptedMetaLength + 1
			} else {
				min = 2
			}
			_ = framer.parseInput(minimumIncompleteLength: min, maximumLength: 1024 + 2) { (buffer, isComplete) -> Int in
				guard let buffer = buffer else { return }
				self.lastAttemptedMetaLength = buffer.count
				// ...
			}
		}
		guard let meta = tempMeta else {
			if let attempted = self.lastAttemptedMetaLength {
				return attempted + 1
			} else {
				return 2
			}
		}
		// ...
	}

	// ...
}

The minimum incomplete length still defaults to 2 because on the first attempt to parse, we don't have previous attempt info and the only thing we know is that there must be a carriage return and line feed (as best as I can interpret it, the Gemini spec doesn't say that there must be meta text, so it could be zero bytes).

With that, the original issue is finally fixed and requests to the problematic server complete successfully.

My best guess for why this was happening in the first place and only in such specific circumstances is this: the specific server software being used was sending the meta text over the wire separately from the CRLF. This could mean they arrive at my device separately and are thus stored by in two separate "chunks". Then, when parseInput is called, Network.framework simply starts looking through the stored chunks in order. And, since the first chunk is longer than minimumIncompleteLength, it's the only one that's returned.

There's a note in the Network.framework header comments2 for nw_framer_parse_input that leads me to believe this. It says, with regard to the temp_buffer parameter:

If it is NULL, the buffer provided in the completion will not copy unless a copy is required to provide the minimum bytes as a contiguous buffer.

The possibility of a copy being needed to form a contiguous buffer implies that there could be discontiguous data, which lines up with my "chunks" hypothesis and would explain the behavior I observed.

Anyhow, if you're interested, you can find the current version of my Gemini client implementation (as of this post) here.


  1. Debugging this is kind of painful because much of the actual work is done inside the closure passed to NWProtocolFramer.Instance.parseInput. It's invoked synchronously, so it does affect control flow, but if you click Step Over at the wrong time, you can accidentally skip a whole critical chunk of code. ↩︎

  2. Annoyingly it's not present in the Swift docs nor the Objective-C docs. Seriously, this is not good. There's a wealth of information in the header comments that's not present in the regular docs. If you feel so inclined, you can dupe FB9163518: "Many NWProtocolFramer.Instance methods are missing docs that are present in the generated headers". ↩︎