Writing compilers is hard. I don't think that anybody disputes this. However, I've grown frustrated with the lack of compiler performance and robustness in the Monte toolchain. Monte will have a developer preview release in a few weeks and I need to get some stuff concerning compilers out of my head and onto the page.
Monte, the Mess
Right now, Monte is in the doldrums. We have deliberately wound down effort on features and exploration in order to produce a developer preview meant to pique interest and generate awareness of Monte, E, object capabilities, etc. As a result, it's worth taking stock of what we've built so far.
Monte's reference implementation is embodied by the Typhon VM, a JIT written in RPython which implements the runtime and interpreter, but does not do any parsing or expansion. Typhon is satisfactory, performing much like early CPython, and outperforming E-on-Java by a bit. However, its JIT does not provide any speed boost compared to interpretation; our execution model is far too sloppy. Additionally, the JIT is fragile and crash-prone as of writing, and we have it disabled by default.
Our current method of execution is to load Kernel-Monte, compile it to an in-memory bytecode resembling, but slightly different from, Smallcaps; and then provide user-built objects which interoperate with runtime objects and internally run this quasi-Smallcaps bytecode.
Performance is behind CPython by a serious but not insurmountable margin. This is unacceptable. One of Monte's goals is to execute faster than Python by virtue of being less dynamic than Python. It has long been a truism of compilers that lower expressiveness correlates with greater optimization opportunities, and clearly we are missing out.
Monte, the Metalanguage
A non-trivial portion of the ideology of Monte, which I did not realize when I first embarked on this journey, is that Monte is really an object calculus of some sort; it hides beneath it a simple core language (Kernel-Monte) that turns out to be a very simple universal computer based on message-passing. Almost everything that makes Monte distinct as a language is built on top of this core, from promises and vats, through modules and quasiliterals, to the entirety of the safe scope. The only gnarl that I have found while working with this core, in honesty, is the semantics of mutable names (var x := f()), which I am still on the fence about, but overall is not a tough complication. (Specifically, I don't know whether mutable slots should be created by a virtual machine instruction, or a primitive _makeVarSlot object.)
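To make the two options for mutable slots concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the toy `run` loop, the opcode names `MAKE_VAR_SLOT` and `CALL_PRIMITIVE`, and the `VarSlot` class are hypothetical and are not Typhon's actual bytecode or classes. Only the name `_makeVarSlot` comes from the text above.

```python
class VarSlot:
    """Hypothetical mutable slot backing a `var` name."""

    def __init__(self, value):
        self._value = value

    def get(self):
        return self._value

    def put(self, value):
        self._value = value


def run(program, primitives):
    """Toy stack machine; returns the frame mapping names to slots."""
    stack, frame = [], {}
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "MAKE_VAR_SLOT":
            # Option 1: slot creation is a dedicated VM instruction.
            stack.append(VarSlot(stack.pop()))
        elif op == "CALL_PRIMITIVE":
            # Option 2: slot creation is an ordinary call to a primitive object.
            stack.append(primitives[arg](stack.pop()))
        elif op == "BIND":
            frame[arg] = stack.pop()
        else:
            raise ValueError("unknown opcode: %r" % (op,))
    return frame


via_instruction = run(
    [("PUSH", 42), ("MAKE_VAR_SLOT", None), ("BIND", "x")], {})
via_primitive = run(
    [("PUSH", 42), ("CALL_PRIMITIVE", "_makeVarSlot"), ("BIND", "x")],
    {"_makeVarSlot": VarSlot})
```

Both programs leave the same kind of slot in the frame. In this sketch the difference is only where the knowledge lives: in the instruction set, or in the set of primitives that the compiler has to recognize.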
Unfortunately, Monte's metalanguage doesn't exactly correspond to anything in the literature. Worse, it somewhat resembles many different things in the literature simultaneously, making the choice of techniques much harder. Computer science, as a discipline, has developed an amazing corpus of compiler techniques, but they do require one to already have committed to a particular semantics, and choosing the semantic and evaluation model for Monte has been a challenge.
I'm only one person. Whatever I end up using has to be comprehensible to me, because otherwise I can't build it. This is unfortunate, as I'm something of a dunce, so I would prefer it if my semantics were simple. Additionally, typing is hard and it would be nice to find something easy to implement.
As a brief aside, I want to emphasize that I am not going to talk about parsing today. Monte's parsing story is very simple and solid, and the canonical expansion from Full-Monte into Kernel-Monte is also relatively well-understood. I want to talk about the stuff that makes compilers hard to scale; I want to talk about optimizations and modelling of semantics.
When I say "semantics of Monte" today, I am referring to the bundle of concepts that represent Monte's evaluation at its lowest level. Everything I'm talking about today starts at Kernel-Monte and (hopefully) only goes downward in expressiveness.
Monte, the Schemer
Strange as it might seem to children like myself, Monte is actually descended from Scheme via E, and this manifests in E's actor-like concurrency and also in the E documentation, which discusses E as a variant of lambda calculus.
What Maps Well
After slot expansion, (set!) bears clear similarity to the behavior of mutable names with VarSlot.
The general design of lexically-scoped closures on objects, and thus the optimization patterns, appear very similar between Monte and Scheme. For example, this commit was directly inspired by this Scheme compiler optimization, posted to Lambda the Ultimate a week prior.
List patterns are present in some Schemes, like Racket, and Monte's list patterns are still present in Kernel-Monte; this is one of the few explicitly type-checked situations in the kernel. (I think that the only other one is the if expression's test… We don't currently require bindings to be :Binding.)
What Maps Poorly
Exceptions are the obvious problem. (call/cc) provides undelimited continuations, but ejectors are explicitly delimited continuations. Something like Oleg's shift/reset semantics or Racket exceptions provides sufficient structure to recover the semantics, but the difference is clear. Oleg only outlines how things work; he does not offer hints on optimization. There is a standard suite of optimizations on continuations when using CPS (continuation-passing style); however, CPS massively complicates implementation.
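To illustrate what "delimited" means here, the following is a minimal Python sketch of an ejector modelled as an escape-only continuation that is valid only while its block is running. The names `escape`, `Ejector`, and `_Ejection` are assumptions made for this example, and implementing the escape with a host-language exception is a choice of the sketch, not a description of how Typhon does it.

```python
class _Ejection(Exception):
    """Carries a value out to the escape block that owns the ejector."""

    def __init__(self, ejector, value):
        Exception.__init__(self)
        self.ejector = ejector
        self.value = value


class Ejector:
    def __init__(self):
        self.live = True

    def __call__(self, value=None):
        if not self.live:
            # The delimiting block has already exited.
            raise RuntimeError("ejector used outside its extent")
        raise _Ejection(self, value)


def escape(body):
    """Run body(ejector); calling the ejector exits this block with a value."""
    ejector = Ejector()
    try:
        return body(ejector)
    except _Ejection as ejection:
        if ejection.ejector is ejector:
            return ejection.value
        raise  # belongs to an enclosing escape block
    finally:
        ejector.live = False


# The outer ejector skips the rest of the inner block entirely.
result = escape(lambda outer: escape(lambda inner: outer(1)) + 100)
```

Unlike a continuation captured with (call/cc), the ejector cannot be re-entered later: once `escape` returns, calling it raises an error in this sketch.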
In particular, when using CPS, all method signatures are complicated by the transformation, which means that tracebacks and other debugging information have to be munged more. We also lose our "no stale stack frames" policy, which informally states that we have neither coroutines nor generators. The CPS transformation generally generates a form of code which should be run as a coroutine, with a live (delimited) continuation passed in from the runtime. This is not impossible, but it is a drastic shift away from what I've studied and built so far.
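The signature change is easy to see on a tiny example. The Python sketch below shows the same computation in direct style and hand-converted to CPS; the function names are made up for the illustration and nothing here is generated by a real Monte compiler.

```python
# Direct style: results come back through ordinary returns.
def square(x):
    return x * x


def sum_squares(x, y):
    return square(x) + square(y)


# CPS: every function takes an extra continuation k and "returns" by
# calling it, so every signature and every call site changes shape.
def square_cps(x, k):
    return k(x * x)


def sum_squares_cps(x, y, k):
    return square_cps(
        x, lambda xx: square_cps(
            y, lambda yy: k(xx + yy)))


direct = sum_squares(3, 4)
cps = sum_squares_cps(3, 4, lambda value: value)
```

Any traceback taken inside `sum_squares_cps` passes through the nested lambdas rather than through frames that match the source, which is the kind of munging mentioned above.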
Since Kernel-Monte is an expression language, a desugaring from def to some sort of intermediate let is highly desirable. However, I have attempted to build this algorithm thrice and been stymied every time. The corner cases are constantly recurring; even the canonical expansion is sufficient to send me into fits with some of its pathological generated code. I've concluded that this transformation, while something I dearly want, requires much more thought.
What Isn't Clear
A-normal form sounds so enticing, but I can't articulate why.
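For reference, here is a small Python sketch of what A-normal form does to a nested call: every intermediate result gets a name, so arguments are always atoms and evaluation order is spelled out. The tuple encoding of expressions and the `to_anf` helper are assumptions invented for this example.

```python
import itertools


def to_anf(expr):
    """Flatten nested calls into a list of named steps plus a result atom.

    Expressions are ints, variable names (str), or
    ("call", function_name, [argument_expressions]).
    """
    bindings = []
    counter = itertools.count()

    def atomize(e):
        if not isinstance(e, tuple):
            return e  # already an atom
        _, fn, args = e
        # Arguments are atomized left to right, fixing evaluation order.
        atoms = [atomize(arg) for arg in args]
        name = "t%d" % next(counter)
        bindings.append((name, ("call", fn, atoms)))
        return name

    return bindings, atomize(expr)


# f(g(x), h(1))
steps, result = to_anf(
    ("call", "f", [("call", "g", ["x"]), ("call", "h", [1])]))
```

Here `steps` names the results of `g(x)` and `h(1)` before the call to `f`, which then takes only those names as arguments.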
Monte, the Talker
What Maps Well
We can suppose that every object literal is recurrent in a scope; it has some maker, even if that maker is the top-level eval(). In that sense, the script of an object literal, combined with the closure of the object literal, is just like a description of a class in a class-based language with no inheritance. We can even add on Monte-style extends object composition by permitting subclasses; there is no this or self, but the subclasses could override superclass methods and we could use the standard method cache technique to make that fast.
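A rough Python sketch of that correspondence, under stated assumptions: `Script`, `Obj`, and `make_counter` are hypothetical names for this example, the closure is modelled as a plain dict, and the extends/method-cache part is left out.

```python
class Script:
    """Method table shared by every object made from one object literal."""

    def __init__(self, name, methods):
        self.name = name
        self.methods = methods


class Obj:
    """One evaluation of the literal: the shared script plus its own closure."""

    def __init__(self, script, closure):
        self.script = script
        self.closure = closure

    def call(self, verb, *args):
        method = self.script.methods.get(verb)
        if method is None:
            raise AttributeError(
                "%s does not respond to %s" % (self.script.name, verb))
        return method(self.closure, *args)


def _counter_next(closure):
    closure["count"] += 1
    return closure["count"]


# Built once, like a class; the maker only allocates a fresh closure.
COUNTER_SCRIPT = Script("counter", {"next": _counter_next})


def make_counter(start):
    return Obj(COUNTER_SCRIPT, {"count": start})


first = make_counter(0)
second = make_counter(10)
```

The two counters share one script but keep separate state, which is the sense in which the script behaves like a class without inheritance.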
We have two more layers of boxing than most other object-based languages, but that doesn't seem to really impede Monte's "pass-by-object" semantics, which are otherwise identical to those of pretty much every other language in the family. Our JIT has definitely proven capable of seeing through FinalSlot and Binding just like it can see through Int.
What Maps Poorly
Our family tree really should have a strict line in the sand for scoping rules, because half of the family doesn't have static lexical scopes. Much of what has gone into making Python fast, especially in the fast implementations like PyPy and ZipPy, doesn't apply at all to Monte because Monte does not have dynamic scopes, and so Monte does not need to recover static scoping information in the JIT.
Our static scope, honestly, works against us somewhat. I can't help but feel that most of the quirky design in Ruby and Python bytecode is due to not being able to erase away lots of scope semantics; contrapositively, Monte is actually kind of hard to compile into lower forms precisely because the static scoping information makes manipulating terms harder. (This might just be me whining; I mean "hard" in the "lots of typing and thinking" sense.)
We really do need a "deslotification" system of some sort. I've thought about this, and come up with a couple of conceptual systems that generate type information for slots and erase bindings and slots during compilation when they can prove that the bindings and slots are not needed. Unfortunately, I haven't actually implemented anything; this is another situation where things are hard to implement. Once again, this is relatively untrodden territory, other than the word "deslotification" itself, which comes from this E page. Interestingly, I independently came up with some of the schemes on that page, which suggests that I'm on the right track, but I also learned that this stuff was never really implemented, so maybe it's a dead end.
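As one hypothetical shape such a system could take, the Python sketch below picks the cheapest storage for each name from the ways the name is used in its scope. The operation labels, the `plan_storage` helper, and the four storage classes are all assumptions for illustration; the sketch ignores guards and is not one of the schemes from the E page.

```python
def plan_storage(uses):
    """Choose the cheapest representation that supports every observed use.

    uses maps each name to the set of operations seen in its scope:
    "read", "assign", "slot" (the slot itself is taken), or
    "binding" (the whole binding is taken).
    """
    plan = {}
    for name, ops in uses.items():
        if "binding" in ops:
            plan[name] = "binding"        # nothing can be erased
        elif "slot" in ops:
            plan[name] = "slot"           # binding erased, slot kept
        elif "assign" in ops:
            plan[name] = "mutable local"  # slot erased, storage stays mutable
        else:
            plan[name] = "bare value"     # both layers erased
    return plan


plan = plan_storage({
    "x": {"read"},
    "y": {"read", "assign"},
    "z": {"read", "slot"},
    "w": {"read", "binding"},
})
```

The hard part that this sketch skips is proving the `uses` sets in the first place, especially once names escape into closures.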
What Isn't Clear
Bytecode seems like a good thing. It also seems like a serious albatross, once we start picking on-disk bytecode formats. I'm not sure whether the Smallcaps construction really is the best way of modelling the actions that Monte objects take.
There are a couple of options available to us that are relatively orthogonal to what I've talked about so far.
LLVM is the elephant in the room. It is an aggressively-optimizing, competent code generator for anything that can be put into a standard low-level-typed SSA form. For Monte, LLVM would enable a compilation strategy much like Objective-C (or, I imagine, like Swift): Arrange all objects into a generated class hierarchy, prove and erase all types to get as many unboxed objects as possible, and then emit LLVM, producing a binary that runs at a modest speed.
The main trick to LLVM that I see is that it appears to dictate a semantic model, but that is only because we are looking at LLVM through its intended lens of compiling C++, from which Objective-C appears the closest relative to Monte. However, there exist LLVM-targeting compilers which emit code that looks quite alien; the example that comes to my mind is GHC's LLVM backend, which generates the same graph-reducing machine as GHC's native backend. There's no reason that we could not pursue a similar path after figuring out our semantics.
Another growing elephant is Truffle. Truffle is much like RPython, providing pretty much the same stuff, but with two important differences. First, Truffle itself is not translated in the same way as RPython; there's a complex interaction between Truffle, Graal, and the JVM which produces the desired JIT effects. RPython's complexity is mostly borne by the compiler author; the fanciest switch on the panel of a translated RPython program is the one that controls the JIT's parameters. Truffle lets you pick between multiple different JITs at runtime! This is partially due to choices made in the JVM ecosystem that make this preferable.
The second difference is worth emphasizing, just because it matters deeply to me, and I figure that it surely must resonate with other folks. Truffle is not a meta-tracing JIT like RPython, but a partially evaluating JIT. This is both a solid theoretical foundation, and a recipe for proven-correct aggressive optimizations. In benchmarks, Truffle does not disappoint. The only downside to Truffle is having to write Java instead of RPython, in roughly the normal Java-to-Python proportions.
We could write pretty much anything in Truffle that we could in RPython; thus, sticking with RPython for the accumulated knowledge and experience that we have on that platform makes sense for now. A Truffle port could be done at some point, perhaps by the community.
Monte, the Frustration
I hate via patterns. But only as a compiler author. As a writer of Monte code, via is delightful. When compiling via patterns, though, one has to extract the guts of the pattern, which turns out to be a seriously tricky task in the corner cases. It's the kind of thing that even production-quality Haskell compiler libraries flinch at handling. (As a corollary, if I understood the Haskell bound package, I would be writing one compiler, in Haskell, and nothing else.)
DeepFrozen proof obligations really should be discharged at compile time whenever possible. They aren't really that expensive, but they definitely impose some running overhead. Similarly, a specializer that could discharge or desugar things like (0..!MAXSIZE) would be nice; that single expression was 20% of the runtime of the richards benchmark recently.
To be more blunt, I like partial evaluation. I would love to have a practical partial evaluator for Monte. I also don't feel that Monte's current semantics are very friendly towards partial evaluation. I really do want to lower everything into some simpler form before doing any specialization.
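For readers who have not met partial evaluation, here is a deliberately tiny Python sketch over an arithmetic expression language. It is nowhere near Monte's semantics; the tuple encoding and the `peval` function are assumptions for the example. It only shows the core move: evaluate what is known now, and leave a residual program for the rest.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}


def peval(expr, env):
    """Specialize expr against the known variables in env.

    Expressions are ints, variable names (str), or (op, left, right).
    Returns an int when everything is known, else a residual expression.
    """
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env.get(expr, expr)  # unknown names stay symbolic
    op, left, right = expr
    l, r = peval(left, env), peval(right, env)
    if isinstance(l, int) and isinstance(r, int):
        return OPS[op](l, r)        # fully known: compute it now
    return (op, l, r)               # otherwise emit residual code


# n * 2 + k * x, with n and k known ahead of time and x unknown.
residual = peval(("+", ("*", "n", 2), ("*", "k", "x")), {"n": 3, "k": 4})
```

The residual still has to be evaluated once `x` is known, but the `n * 2` work has already been done.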
In conclusion, I need a vacation, I think. If only there were a Python convention coming up…
Last modified on 2016-05-02 21:02:00