shadowfacts.net/site/posts/2021-07-07-gemini-client-de...

92 lines
7.5 KiB
Markdown
Raw Permalink Normal View History

```
metadata.title = "Debugging My Gemini NWProtocolFramer Implementation"
metadata.tags = ["swift", "gemini"]
metadata.date = "2021-07-07 23:32:42 -0400"
metadata.shortDesc = ""
metadata.slug = "gemini-client-debugging"
```
I recently ran into an issue with the Network.framework Gemini client I'd [previously implemented](/2020/gemini-network-framework/) that turned out to be somewhat perplexing. So, I thought I'd write a brief post about it in case anyone finds it interesting or helpful.
<!-- excerpt-end -->
The gist of the issue is that when connecting to a certain host, the connection would hang indefinitely. The issue was 100% reproducible in my app, both in the simulator and on an actual device, in all manner of network conditions. What led me to believe that it was an issue with my implementation was that the problem only happened with a single host (I suspect the incompatibility was with whatever server software was being used) and the fact that I could not reproduce the issue with any other client.
My initial attempts at debugging were fruitless. Every handler and callback I had was just never getting called. There was no error that was being swallowed or anything, just silence. Eventually I set a breakpoint in the `handleInput` method of my `NWProtocolFramerImplementation`.
Stepping through[^1], I saw the status code get parsed successfully. Then it moved on to parsing the response meta, a string of up to 1024 bytes followed by a carriage return and line feed.
[^1]: Debugging this is kind of painful because much of the actual work is done inside the closure passed to `NWProtocolFramer.Instance.parseInput`. It's invoked synchronously, so it does affect control flow, but if you click Step Over at the wrong time, you can accidentally skip a whole critical chunk of code.
Next, I paused inside the `parseInput` closure that's responsible for parsing the meta string. The buffer that I received was 11 bytes long. I was immediately wary because I'd seen that number before in this context—it's the length of `text/gemini`. Indeed, printing out `String(bytes: buffer, encoding: .utf8)` revealed that it was indeed the meta string I was expecting. But it was just the meta string, not including the CR/LF pair that's supposed to follow it.
Continuing to step through, I saw the closure return 0, indicating that no bytes were consumed, because the CRLF was missing. After that, the main body of the `handleInput` method would see that the meta wasn't found and would itself return the number of bytes it expected to receive before it should be invoked again. It does this by adding 1 to the length of the buffer from which meta parsing failed. So, the next time `handleInput` is called by the framework, there should be at least 12 bytes available.
After hitting resume, the debugger trapped again in the `parseInput` closure for the meta. I checked `buffer.count` and found... 11 bytes. I know `handleInput` returned 12, so why did I still only have 11 bytes?
The realization I came to, after puzzling over this for a couple days, is that `parseInput` behaves a little weirdly, though in a way that's technically correct. It seems that even if more data is available in some internal buffer of the framer's—which we know there must be because `handleInput` was invoked again after we returned 12 earlier—we won't receive all of it. It's not _wrong_, 11 bytes is indeed at least 2 and no more than 1026, but it's certainly unintuitive.
To verify this behavior, I tweaked my code to call `parseInput` with a minimum length of 13 instead of 2 the second attempt to parse meta. Lo and behold, it worked. The buffer now had 13 bytes of data: the full meta string and the CR/LF.
And that explains the user-facing issue that led to all this. A few attempts at parsing would fail, but then the server would stop sending data because there was nothing left, so `handleInput` would never be called, leaving the Gemini request in a waiting state forever.
So, to properly fix the issue what needs to happen is we have to be smarter about the `minimumIncompleteLength` on subsequent attempts to parse the metadata. To do this, I saved the length of the last buffer for which meta parsing was attempted as an instance variable in the framer implementation. With that, we can determine how many bytes we _actually_ need before trying again.
```swift
class GeminiProtocol: NWProtocolFramerImplementation {
// ...
private var lastAttemptedMetaLength: Int? = nil
// ...
func handleInput(framer: NWProtocolFramer.Instance) -> Int {
// ...
if tempMeta == nil {
let min: Int
if let lastAttemptedMetaLength = lastAttemptedMetaLength {
min = lastAttemptedMetaLength + 1
} else {
min = 2
}
_ = framer.parseInput(minimumIncompleteLength: min, maximumLength: 1024 + 2) { (buffer, isComplete) -> Int in
guard let buffer = buffer else { return }
self.lastAttemptedMetaLength = buffer.count
// ...
}
}
guard let meta = tempMeta else {
if let attempted = self.lastAttemptedMetaLength {
return attempted + 1
} else {
return 2
}
}
// ...
}
// ...
}
```
The minimum incomplete length still defaults to 2 because on the first attempt to parse, we don't have previous attempt info and the only thing we know is that there must be a carriage return and line feed (as best as I can interpret it, the Gemini spec doesn't say that there must be meta text, so it could be zero bytes).
With that, the original issue is finally fixed and requests to the problematic server complete successfully.
My best guess for why this was happening in the first place and only in such specific circumstances is this: the specific server software being used was sending the meta text over the wire separately from the CRLF. This could mean they arrive at my device separately and are thus stored by in two separate "chunks". Then, when `parseInput` is called, Network.framework simply starts looking through the stored chunks in order. And, since the first chunk is longer than `minimumIncompleteLength`, it's the only one that's returned.
There's a note in the Network.framework header comments[^2] for `nw_framer_parse_input` that leads me to believe this. It says, with regard to the `temp_buffer` parameter:
[^2]: Annoyingly it's not present in the Swift docs nor the Objective-C docs. Seriously, this is not good. There's a wealth of information in the header comments that's not present in the regular docs. If you feel so inclined, you can dupe FB9163518: "Many NWProtocolFramer.Instance methods are missing docs that are present in the generated headers".
> If it is NULL, the buffer provided in the completion will not copy unless a copy is required to provide the minimum bytes as a contiguous buffer.
The possibility of a copy being needed to form a contiguous buffer implies that there could be discontiguous data, which lines up with my "chunks" hypothesis and would explain the behavior I observed.
<aside>
Fun fact, the C function corresponding to this Swift API, `nw_framer_parse_input`, takes a maximum length, but it also lets to pass in your own temporary buffer, in the form of a `uint8_t*`. It's therefore up to the caller to ensure that the buffer that's pointed to is at least as long as the maximum length. This seems like a place ripe for buffer overruns in sloppily written protocol framer implementations.
</aside>
Anyhow, if you're interested, you can find the current version of my Gemini client implementation (as of this post) [here](https://git.shadowfacts.net/shadowfacts/Gemini/src/commit/3055cc339fccad99ab064f2daccdb65efa8024c0/GeminiProtocol/GeminiProtocol.swift).