How does Glamorous Toolkit’s PythonBridge Work?

Glamorous Toolkit offers Python interoperability functionality. It allows mixing Python and Smalltalk more-or-less seamlessly in a Lepiter playground. How does that work? Brace yourselves, it’s boutta get ugly.

The main entry point, from the Smalltalk side, into the integration is PBApplication. There are multiple references to PBApplication start as the way to get it going. PBApplication is a part of PythonBridge, forked by Feenk. PythonBridge is based on the general “LanguageLink” abstraction from PharoLink. In any case, through various abstractions, it uses pipenv to create a virtual environment in the PythonBridgeRuntime subdirectory of the Glamorous Toolkit installation directory. Into this virtual environment, the gtoolkit_bridge package is installed (source in PyPI directory of Feenk’s PythonBridge fork).

The LanguageLink abstraction and gtoolkit_bridge have a protocol for exchanging messages. It can go over one of two transports:

  1. (Used by default.) MsgPack over a socket; the Python process is the one listening, while Pharo connects.
  2. JSON messages over HTTP; both the Python process and Pharo listen on separate ports and connect to one another.

Each message is expected to be a dictionary with several keys:

  • type: Type of the message.
  • __sync: Used only for MsgPack transport; if present, this is a synchronous message. The response carries the same __sync value as the request, which is how the two are paired up. There is no direct indication in the message whether it is supposed to be a request or a response. (For HTTP transport, responses are provided as HTTP responses.)
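
To make the __sync pairing concrete, here is a minimal sketch using plain Python dictionaries. The helper names (make_sync_request, make_response, is_response_to) are made up for illustration, and both the use of a UUID as the __sync value and the echoing of type in the response are my assumptions, not something I have confirmed in gtoolkit_bridge.

```python
import uuid

def make_sync_request(msg_type, **fields):
    """Build a synchronous request. The UUID is a hypothetical choice of __sync value."""
    return {"type": msg_type, "__sync": uuid.uuid4().hex, **fields}

def make_response(request, **fields):
    """Build a response that carries the request's __sync value back.
    Echoing the type as well is an assumption."""
    return {"type": request["type"], "__sync": request["__sync"], **fields}

def is_response_to(request, message):
    """Pair a message with a request purely by its __sync value."""
    return "__sync" in message and message["__sync"] == request["__sync"]

request = make_sync_request("IS_ALIVE")
response = make_response(request)
# An asynchronous message has no __sync key at all.
unrelated = {"type": "ENQUEUE", "commandId": "1234", "statements": "", "bindings": {}}
```

Since nothing in the message says "I am a response", the receiver has to remember which __sync values it is still waiting on.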

LanguageLink provides these message types (see class hierarchy rooted at LanguageLinkAbstractMessage):

  • ENQUEUE: Asynchronous, Pharo-to-Python. Pharo is telling Python it wants Python to execute something.
  • IS_ALIVE: Synchronous, Pharo-to-Python. Inquiry to see if Python is alive yet or is still alive.
  • STUB: Synchronous, Pharo-to-Python; not supported by gtoolkit_bridge.
  • EVAL: Asynchronous, Python-to-Pharo (called “update promise” on Pharo side). Provides results back from Python to Pharo.
  • CALLBACK: Synchronous, Python-to-Pharo. These are presumably for Python code calling back into Pharo; I have not explored this functionality yet.
  • ERR: Synchronous, Python-to-Pharo. Error report from a failed execution from ENQUEUE.
  • RSTUB: Synchronous, Python-to-Pharo; not supported by gtoolkit_bridge.

Arguably, the most important command is ENQUEUE. This is how Pharo gets what it wants to run over to Python. Notably, it has no response. It requires these keys:

  • commandId: Identifier sent back in ERR if something goes wrong.
  • statements: String containing code to run.
  • bindings: Dictionary, with values serialized, to be merged with current variables of session before execution (which are initialized with python_bridge’s own globals). Note, there’s a trick here relating to the registry – I’ll get to that in a minute.
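
As a rough illustration of what handling an ENQUEUE amounts to, here is a sketch that merges the bindings into a session dictionary and then runs the statements. handle_enqueue is a hypothetical name, the bindings are assumed to be already deserialized, and the real bridge does more than this (queueing, error reporting, the registry trick mentioned above).

```python
def handle_enqueue(message, session_globals):
    """Merge bindings into the session's variables, then run the statements."""
    # Assumption: binding values have already been deserialized.
    session_globals.update(message["bindings"])
    exec(message["statements"], session_globals)

session = {}  # stands in for the session's current variables
enqueue = {
    "type": "ENQUEUE",
    "commandId": "1234",
    "statements": "area = width * height",
    "bindings": {"width": 3, "height": 4},
}
handle_enqueue(enqueue, session)
```

After this runs, area lives in the session alongside width and height, so a later ENQUEUE can refer to any of them.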

ERR has these keys:

  • commandId: Identifier from ENQUEUE for the command that failed.
  • errMsg: Error message string.
  • trace: Stack trace string.

The response to ERR is expected to contain an action key, determining what should be done with the remaining commands (IGNORE to keep running as if nothing happened, DROP_QUEUE to drop all later commands, or REPLACE_COMMAND with other keys from ENQUEUE to replace the erroneous command with a new one).
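
Here is a toy model of how the three actions could play out on a command queue. Everything in it (run_queue, the on_error callback standing in for Pharo's reply) is hypothetical; it only mirrors the behaviour described above, not the actual gtoolkit_bridge code.

```python
from collections import deque
import traceback

def run_queue(commands, on_error):
    """Run ENQUEUE-style commands in order; ask on_error what to do on failure."""
    env = {}
    queue = deque(commands)
    while queue:
        command = queue.popleft()
        try:
            exec(command["statements"], env)
        except Exception as exc:
            err = {
                "type": "ERR",
                "commandId": command["commandId"],
                "errMsg": str(exc),
                "trace": traceback.format_exc(),
            }
            reply = on_error(err)  # stands in for the synchronous reply from Pharo
            if reply["action"] == "DROP_QUEUE":
                queue.clear()
            elif reply["action"] == "REPLACE_COMMAND":
                # Put the replacement at the front so it runs in place of the failed command.
                replacement = {k: v for k, v in reply.items() if k != "action"}
                queue.appendleft({**command, **replacement})
            # IGNORE: nothing to do, carry on with the next command.
    return env

commands = [
    {"commandId": "1", "statements": "x = 1 / 0"},
    {"commandId": "2", "statements": "y = 2"},
]
ignored = run_queue(commands, lambda err: {"action": "IGNORE"})
dropped = run_queue(commands, lambda err: {"action": "DROP_QUEUE"})
replaced = run_queue(
    commands, lambda err: {"action": "REPLACE_COMMAND", "statements": "x = 0"}
)
```

With IGNORE the second command still runs, with DROP_QUEUE it never does, and with REPLACE_COMMAND both x and y end up defined.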

But where’s the result?

gtoolkit_bridge doesn’t have any built-in response to say the execution has completed. Instead, PythonBridge on the Pharo side appends a little bit to the script to explicitly ask it to send something back (this logic is in LanguageLinkCommandFactory#instructionsWithNotifyAtEnd, with overrides in subclasses as appropriate). For example, this snippet from the documentation:

PBApplication do: [ :application | 
    application newCommandStringFactory
        script: 'pi = 333/106';
        resultExpression: 'pi';
        sendAndWait ]

Will result in this code being sent to ENQUEUE (assuming the command ID is “1234”):

pi = 333/106
notify(pi, "1234")

notify is in scope because it was imported by python_bridge, and the environment in which user code runs was initialized with python_bridge’s globals. notify sends an EVAL back to Pharo with the result. Speaking of which, the keys of that EVAL are:

  • value: The first parameter, serialized.
  • id: The second parameter.
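
The whole round trip can be mimicked in a few lines. This is a sketch, not the real thing: with_notify_at_end is a hypothetical stand-in for what the Pharo side does, and this notify just appends to a list instead of serializing the value and sending it over the transport.

```python
sent = []  # stands in for the transport back to Pharo

def notify(value, command_id):
    """Record an EVAL message; a real bridge would serialize the value and send it."""
    sent.append({"type": "EVAL", "value": value, "id": command_id})

def with_notify_at_end(script, result_expression, command_id):
    """Append a notify call to the script, as the Pharo side is described as doing."""
    return f"{script}\nnotify({result_expression}, {command_id!r})"

code = with_notify_at_end("pi = 333/106", "pi", "1234")
# notify is placed in the globals the user code runs with, which is why it is in scope.
exec(code, {"notify": notify})
```

After the exec, sent holds a single EVAL whose value is the float 333/106 and whose id is "1234".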

Complex values

…and I ain’t talkin’ 1j.

Values that can be trivially serialized in MsgPack or JSON will be sent over literally. Any other kind of value gets turned into a proxy. Proxies are stored in a dictionary on the Python side, keyed by their address as reported by id. When serialized, they appear as a dictionary with these keys:

  • __pyid__: Hex-encoded address (id) of object.
  • __pyclass__: Fully-qualified name of type of object.
  • __superclasses__: Fully-qualified names of superclasses of type of object (following first base of each type, so not compatible with multiple inheritance).

When deserializing, __pyid__ and __pyclass__ are expected to be present (though __pyclass__ is ignored), and __superclasses__ must not be present (nor any other keys).
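
A minimal sketch of that scheme, under a few assumptions of mine: the function names are invented, the hex encoding has no 0x prefix, and only a handful of "trivial" types are passed through literally.

```python
registry = {}  # id(obj) -> obj; also keeps proxied objects alive

def class_name(cls):
    return f"{cls.__module__}.{cls.__qualname__}"

def superclasses(cls):
    """Follow only the first base of each type, so multiple inheritance is lost."""
    names = []
    while cls.__bases__:
        cls = cls.__bases__[0]
        names.append(class_name(cls))
    return names

def serialize(value):
    if value is None or isinstance(value, (bool, int, float, str)):
        return value
    if isinstance(value, list):
        return [serialize(item) for item in value]
    if isinstance(value, dict):
        return {key: serialize(item) for key, item in value.items()}
    # Anything else becomes a proxy, remembered by its address.
    registry[id(value)] = value
    return {
        "__pyid__": format(id(value), "x"),
        "__pyclass__": class_name(type(value)),
        "__superclasses__": superclasses(type(value)),
    }

def deserialize(value):
    # Exactly __pyid__ and __pyclass__, no other keys; __pyclass__ is ignored.
    if isinstance(value, dict) and set(value) == {"__pyid__", "__pyclass__"}:
        return registry[int(value["__pyid__"], 16)]
    return value

class Shape:
    pass

class Circle(Shape):
    pass

circle = Circle()
wire = serialize(circle)
back = deserialize({"__pyid__": wire["__pyid__"], "__pyclass__": wire["__pyclass__"]})
```

Here back is the very same object as circle, looked up through the registry rather than reconstructed.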

On the Pharo side, proxies are represented by PBProxyObject, created by PBDeserializer#buildProxyFor:

While this works well enough, it’s not particularly hygienic – for example, if you were processing a JSON file in Python from an untrusted service, and one of the pieces of data in the untrusted JSON file had {"__pyid__": 0, "__pyclass__": 0}, it would blow up the language link if you attempted to use it from PythonBridge.
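
The underlying problem is in-band signalling: a proxy and an ordinary dictionary share the same wire shape. The sketch below uses a hypothetical deserializer (not PBDeserializer, and I have not traced the exact failure on the Pharo side) just to show that plain data with the right keys is indistinguishable from a proxy reference.

```python
registry = {}  # address -> live object, as a proxy table would hold

def looks_like_proxy(value):
    """The only test available is the set of keys, which plain data can satisfy too."""
    return isinstance(value, dict) and set(value) == {"__pyid__", "__pyclass__"}

def deserialize(value):
    if looks_like_proxy(value):
        return registry[value["__pyid__"]]  # fails for an id nobody registered
    return value

# Ordinary data from an untrusted source, never meant to be a proxy.
untrusted = {"__pyid__": 0, "__pyclass__": 0}

try:
    deserialize(untrusted)
    outcome = "ok"
except KeyError:
    outcome = "lookup failed"
```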

Lepiter bindings

That’s the basics of the interop, but Lepiter has this cool integration where any variables you define on the Python side automatically become available in the Pharo snippets of your playground. How does that work? This is handled by GtPythonCoderModel#bindAndExecute:inContext:.

It first parses the Python code that is about to be sent and runs it through GtPythonVarNodeVisitor, which does a crude analysis of the variable names that appear on the left-hand side of assignments, as parameter names, and as variable references.

The Python code is modified in two ways: a statement that sets self is prepended, and the last statement, if it can produce a value, is prefixed with snippetResult =.
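
The second rewrite is easy to imitate with Python's own ast module. To be clear, Glamorous Toolkit does this on the Pharo side with its own parser; the function below is only my illustration of the idea, and it skips the self prefix entirely.

```python
import ast

def capture_last_expression(source, name="snippetResult"):
    """If the last statement is a bare expression, turn it into `name = <expr>`."""
    tree = ast.parse(source)
    if tree.body and isinstance(tree.body[-1], ast.Expr):
        last = tree.body[-1]
        assign = ast.Assign(
            targets=[ast.Name(id=name, ctx=ast.Store())],
            value=last.value,
        )
        tree.body[-1] = ast.copy_location(assign, last)
        ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # requires Python 3.9 or later

rewritten = capture_last_expression("x = 2\nx * 21")
env = {}
exec(rewritten, env)

# A snippet that already ends in an assignment is left alone.
unchanged = capture_last_expression("y = 1")
```

After running the rewritten snippet, env["snippetResult"] holds the value of the final expression.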

Lepiter has an index of all variables known to it, and any “accessed before assigned” variable names are provided as bindings to PythonBridge. The resultExpression: is a dictionary containing all assigned variable names.
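
For a feel of what such a crude analysis looks like, here is a sketch using Python's ast module. It is an approximation of the idea, not a port of GtPythonVarNodeVisitor: it ignores scopes and parameters, skips builtins, and the shape of the generated result expression is my guess.

```python
import ast
import builtins

def analyze(source):
    """Return (assigned, needed): names stored, and names loaded before any store."""
    assigned, needed = [], []

    class Visitor(ast.NodeVisitor):
        def visit_Assign(self, node):
            self.visit(node.value)  # the right-hand side is evaluated first
            for target in node.targets:
                self.visit(target)

        def visit_Name(self, node):
            if isinstance(node.ctx, ast.Store):
                if node.id not in assigned:
                    assigned.append(node.id)
            elif (
                node.id not in assigned
                and node.id not in needed
                and not hasattr(builtins, node.id)
            ):
                needed.append(node.id)

    Visitor().visit(ast.parse(source))
    return assigned, needed

source = "total = price * quantity\nlabel = str(total)"
assigned, needed = analyze(source)

# Hypothetical shape for the result expression: a dict literal of assigned names.
result_expression = "{" + ", ".join(f"{name!r}: {name}" for name in assigned) + "}"
```

In this example price and quantity would have to come in as bindings, while total and label are what flows back out.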
