Survey of Platform-Native WebSocket Implementations

My game is multi-player, so naturally, it needs some way to communicate over the network. Building on the theme of starting with Web-platform technologies, I plan for most of the real-time communication to happen through WebSocket. But despite using Web-platform technologies, the primary deployment environment for my game is not a Web browser, so I need a native implementation I can plug into my game.

Why WebSocket?

It’s already available in Web browsers, so I can easily prototype in a Web browser, where iteration can be quick. It gives me a clear path towards Web deployment, should I decide I want to do that. It’s got built-in transport security through TLS with the wss:// scheme. And on the server side, there’s lots of flexibility in how to serve it – with options ranging from a traditional single machine serving it up, to fitting in nicely with load balancers, to “serverless” environments like Cloudflare Workers or AWS Lambda combined with AWS API Gateway. Not to mention, there are plenty of debugging tools readily available for it already, without the need to build bespoke tooling.

Why platform-native?

Traditional socket networking is not “hard”, strictly speaking. Opening a socket is not that hard, name resolution is usually (…usually) not that complicated, etc. But there are a lot of edge cases that are complicated – take proxies for example, or DNS-over-HTTPS, or Happy Eyeballs for IPv6. Building from sockets up, most people skip implementing this extra functionality; to be fair, it is probably not super important for a game, but if we can leverage the platform’s built-in functionality, we get all that for free, so our game can work in more obscure environments where other games may fall flat.

The TLS layer is where things get really hairy. There’s a lot that can go wrong in a TLS implementation, and it needs to be kept up to date. As games are artifacts that are often built once and seldom updated after that, it’s best not to bundle a TLS library, and instead to use one that’s already on the system and can be updated separately. Aside from the update problem, dealing with certificate trust is also hairy: some libraries ship their own roots and ignore the system roots, for example, which both (a) is not future-proof (the roots you ship will eventually expire), and (b) fails in oddball network scenarios (like a network administrator adding TLS interception, which is probably a bad idea, but that’s not always in the user’s control).

For the WebSocket protocol itself, built upon a platform-native TLS layer, there is not that much to be concerned about, but if you are also using general HTTP functionality, it would really be best if the same code were used for both HTTP and WebSocket, and you really don’t want to implement HTTP yourself.

So I seek a platform-native implementation of WebSocket that requires minimal intervention at lower layers.

Windows

WinINet is the standard platform interface to HTTP in Windows, but it doesn’t support WebSocket. However, there is a newer API, WinHTTP, which supports WebSocket on Windows 8 and up.

The general approach of using WinHTTP for WebSocket is:

  • Open a WinHTTP session handle with WinHttpOpen.
  • Use WinHttpCrackUrl to crack a URL string into pieces.
  • Use WinHttpConnect to connect to the server.
  • Use WinHttpOpenRequest on the connection handle to start a request.
  • Use WinHttpSetOption to set the WINHTTP_OPTION_UPGRADE_TO_WEB_SOCKET option.
  • Send the request with WinHttpSendRequest.
  • Receive a preliminary response with WinHttpReceiveResponse.
  • Use WinHttpQueryHeaders with WINHTTP_QUERY_STATUS_CODE to get the status code, and check that the status code returned was 101 (HTTP_STATUS_SWITCH_PROTOCOLS, “Switching Protocols”).
  • Call WinHttpWebSocketCompleteUpgrade to finish the protocol switch.
  • Use WinHttpWebSocketSend and WinHttpWebSocketReceive to exchange data.
  • Use WinHttpWebSocketShutdown to close the write end without closing the read end, or WinHttpWebSocketClose to shut down both.
  • Use WinHttpCloseHandle to close the WebSocket handle, request handle, connection handle, and session handle.
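
The steps above can be sketched in synchronous form roughly as follows. This is a minimal sketch, not a tested production client: the function name ws_echo_once and the agent string are made up for illustration, it takes the host, port and path already split apart (so the WinHttpCrackUrl step is skipped), and it reads only a single reply buffer, which may be just a fragment of a larger message. On platforms other than Windows it compiles to a stub that always fails.

```c
#include <stddef.h>   /* wchar_t */

#ifdef _WIN32
#include <stdio.h>
#include <string.h>
#include <windows.h>
#include <winhttp.h>  /* link with winhttp.lib */

/* Hypothetical helper: sends one UTF-8 text message and reads one reply
   buffer. Returns 0 on success, -1 on any failure. */
int ws_echo_once(const wchar_t *host, unsigned short port,
                 const wchar_t *path, int secure, const char *text)
{
    HINTERNET session = NULL, conn = NULL, req = NULL, ws = NULL;
    DWORD status = 0, size = sizeof status, got = 0;
    WINHTTP_WEB_SOCKET_BUFFER_TYPE type;
    char reply[1024];
    int rc = -1;

    session = WinHttpOpen(L"ws-sketch/0.1", WINHTTP_ACCESS_TYPE_DEFAULT_PROXY,
                          WINHTTP_NO_PROXY_NAME, WINHTTP_NO_PROXY_BYPASS, 0);
    if (!session) goto done;
    conn = WinHttpConnect(session, host, port, 0);
    if (!conn) goto done;
    req = WinHttpOpenRequest(conn, L"GET", path, NULL, WINHTTP_NO_REFERER,
                             WINHTTP_DEFAULT_ACCEPT_TYPES,
                             secure ? WINHTTP_FLAG_SECURE : 0);
    if (!req) goto done;

    /* Ask WinHTTP to perform the WebSocket upgrade handshake. */
    if (!WinHttpSetOption(req, WINHTTP_OPTION_UPGRADE_TO_WEB_SOCKET, NULL, 0))
        goto done;
    if (!WinHttpSendRequest(req, WINHTTP_NO_ADDITIONAL_HEADERS, 0,
                            WINHTTP_NO_REQUEST_DATA, 0, 0, 0))
        goto done;
    if (!WinHttpReceiveResponse(req, NULL)) goto done;

    /* The server must answer 101 Switching Protocols. */
    if (!WinHttpQueryHeaders(req,
                             WINHTTP_QUERY_STATUS_CODE | WINHTTP_QUERY_FLAG_NUMBER,
                             WINHTTP_HEADER_NAME_BY_INDEX, &status, &size,
                             WINHTTP_NO_HEADER_INDEX))
        goto done;
    if (status != HTTP_STATUS_SWITCH_PROTOCOLS) goto done;

    ws = WinHttpWebSocketCompleteUpgrade(req, 0);
    if (!ws) goto done;

    /* The WebSocket functions return an error code, not a BOOL. */
    if (WinHttpWebSocketSend(ws, WINHTTP_WEB_SOCKET_UTF8_MESSAGE_BUFFER_TYPE,
                             (PVOID)text, (DWORD)strlen(text)) != NO_ERROR)
        goto done;
    if (WinHttpWebSocketReceive(ws, reply, sizeof reply, &got, &type) != NO_ERROR)
        goto done;
    printf("received %lu bytes\n", (unsigned long)got);

    WinHttpWebSocketClose(ws, WINHTTP_WEB_SOCKET_SUCCESS_CLOSE_STATUS, NULL, 0);
    rc = 0;
done:
    if (ws) WinHttpCloseHandle(ws);
    if (req) WinHttpCloseHandle(req);
    if (conn) WinHttpCloseHandle(conn);
    if (session) WinHttpCloseHandle(session);
    return rc;
}
#else
/* WinHTTP only exists on Windows; elsewhere this sketch just reports failure. */
int ws_echo_once(const wchar_t *host, unsigned short port,
                 const wchar_t *path, int secure, const char *text)
{
    (void)host; (void)port; (void)path; (void)secure; (void)text;
    return -1;
}
#endif
```

A call might look like ws_echo_once(L"example.com", 443, L"/socket", 1, "hello"), where the host and path are placeholders. A real client would loop on WinHttpWebSocketReceive until the buffer type indicates a complete message rather than a fragment.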

To use WinHTTP asynchronously, WINHTTP_FLAG_ASYNC can be passed to WinHttpOpen. You then specify a callback with WinHttpSetStatusCallback. All functions will then operate asynchronously, starting the operation but not waiting for its completion. When the operation is complete, the callback will be called.

One complicated part is that per the documentation:

“The callback function must be threadsafe and reentrant because it can be called on another thread for a separate request, and reentered on the same thread for the current request. It must therefore be coded to handle reentrance safely while processing.”

This requires some care. I initially interpreted this quite pathologically, but the reality is not so harsh. I may explain in another post later.

Apple platforms

NSURLSession in the Foundation framework is the platform-native way to do HTTP operations on Apple platforms, including macOS, iOS, iPadOS, etc. Since macOS 10.15 and iOS 13, NSURLSession also supports WebSocket. NSURLSession is an Objective-C API (also exposed to Swift as URLSession) with no plain C interface, and it is always asynchronous.

To use it, first an NSURLSessionConfiguration object must be created. Then, an NSURLSession must be created with +[NSURLSession sessionWithConfiguration:delegate:delegateQueue:]. To avoid the threading difficulties present in WinHTTP, NSURLSession lets you specify an NSOperationQueue to run delegate callbacks on. The delegate you pass receives callbacks when operations are complete.

To open a WebSocket connection, -[NSURLSession webSocketTaskWithURL:] or one of its variants is used. The delegate should conform to NSURLSessionWebSocketDelegate, and will receive URLSession:webSocketTask:didOpenWithProtocol: and URLSession:webSocketTask:didCloseWithCode:reason: messages on the provided operation queue.

To communicate, methods on the NSURLSessionWebSocketTask object returned from webSocketTaskWithURL: can be used, such as sendMessage:completionHandler: and receiveMessageWithCompletionHandler:. Each of these takes a block that is executed on completion on the configured delegate queue. NSURLSessionWebSocketMessage is used to represent messages straightforwardly.

It’s worth noting that aside from using NSURLSession, there is also the possibility of using the Network framework directly, which also supports WebSocket, and wouldn’t require Objective-C, but NSURLSession is higher level and also supports standard HTTP, which is nice to have.

Linux and other open-source Unices

Linux doesn’t have a platform-native WebSocket implementation. Sike! There’s also no platform-native HTTP implementation, or TLS implementation, or… (at least there is a platform-native DNS resolver, but even that is a little sketchy for asynchronous use). But there are popular options, and sticking with the rest of the pack is likely to yield good results.

libcurl is arguably the most common HTTP library on Linux. Does it support WebSocket? Well… technically yes, but it’s experimental, and compiled out by default. Debian enabled it a few months ago, saying it will be stabilized in September. I can’t rely on users having a libcurl installed that has this functionality present. I could distribute my own libcurl build with WebSocket enabled, but I’d rather not conflict with the libcurl already present on the system. I’d prefer just not to go down this rabbit-hole.

libcurl’s docs/internals/WEBSOCKET.md provides a curious pointer:

“libWebSocket is said to be a solid, fast and efficient WebSocket library with a vast amount of users. My plan was originally to build upon it to skip having to implement the low level parts of WebSocket myself.”

(N.B., the libcurl document says “libWebSocket”, but as far as I can tell, “libwebsockets” with an S and without capitalization is what was meant.) It then lists a bunch of reasons why libcurl decided not to build upon it. Some of them really do make me wonder… it mentions that libwebsockets is bloated, and when I look, it has an HTML renderer, complete with layout logic, image decoding, and a display driver? A little bloat I could understand, but this is really outside of what I want to have as a dependency.

Unfortunately, I was not able to find a better solution. I will likely start off with a vendored libcurl with WebSocket enabled, but after that, I may transition to a more custom implementation. We can still crib off of some of libcurl’s work even when WebSocket is disabled – CURLOPT_CONNECT_ONLY set to 2L on a normal HTTP connection would let us talk the WebSocket protocol ourselves after handshaking. The 2L value is only available starting with 7.86.0, though (and the SteamOS sniper runtime only has 7.74.0). The 1L value is available way further back, if we’re willing to talk HTTP ourselves while still letting libcurl do all the proxy and TLS setup.
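
To give a sense of what “talking the WebSocket protocol ourselves” involves once the handshake is done, here is a minimal sketch of encoding a single client-to-server frame as I read RFC 6455 section 5.2. The function name ws_encode_client_frame is made up for illustration. It handles only unfragmented frames, and it assumes the caller supplies four mask bytes from a good random source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one final (FIN=1) masked client frame.
   opcode: 0x1 for text, 0x2 for binary.
   Returns the total frame size, or 0 if `out` (capacity `cap`) is too small. */
size_t ws_encode_client_frame(uint8_t opcode, const uint8_t *payload, size_t len,
                              const uint8_t mask[4], uint8_t *out, size_t cap)
{
    size_t hdr = 2, i;
    assert(out != NULL && mask != NULL && (payload != NULL || len == 0));

    /* Payload lengths above 125 need an extended length field. */
    if (len > 125) hdr += (len <= 0xFFFF) ? 2 : 8;
    if (cap < hdr + 4 || cap - hdr - 4 < len) return 0;

    out[0] = (uint8_t)(0x80 | (opcode & 0x0F));      /* FIN bit + opcode */
    if (len <= 125) {
        out[1] = (uint8_t)(0x80 | len);               /* MASK bit + 7-bit length */
    } else if (len <= 0xFFFF) {
        out[1] = 0x80 | 126;                          /* 16-bit length follows */
        out[2] = (uint8_t)(len >> 8);
        out[3] = (uint8_t)(len & 0xFF);
    } else {
        out[1] = 0x80 | 127;                          /* 64-bit length follows */
        for (i = 0; i < 8; i++)
            out[2 + i] = (uint8_t)((uint64_t)len >> (8 * (7 - i)));
    }

    /* Client frames carry the masking key, then the XOR-masked payload. */
    for (i = 0; i < 4; i++) out[hdr + i] = mask[i];
    for (i = 0; i < len; i++) out[hdr + 4 + i] = payload[i] ^ mask[i % 4];
    return hdr + 4 + len;
}
```

With the mask bytes 37 fa 21 3d, encoding the text “Hello” should produce the eleven bytes 81 85 37 fa 21 3d 7f 9f 4d 51 58, matching the masked example in the RFC. With a CONNECT_ONLY handle, the resulting buffer would presumably be handed to curl_easy_send; decoding incoming frames, control frames (ping, pong, close) and fragmentation are all left out here.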

So say we use libcurl for our WebSocketing needs. cURL’s normal way of doing “asynchronous” work is through its ‘multi’ interface. This is easy enough to use for concurrent HTTP requests, but for WebSocket, its use is somewhat involved. This, too, may be the subject of a later post.

Others?

That covers all the main desktop operating systems, as well as mobile for Apple platforms. I have not investigated how Android expects this to be done, and I would not be surprised if consoles had no platform-native implementation. I may investigate these later.
