Monte: Typhon Profiling

Work continues on Typhon. I've recently yearned for a way to study the Monte-level call stacks for profiling feedback. After a bit of work, I think that I've built some things that will help me.

My initial desire was to get the venerable perf to work with Typhon. perf's output is easy to understand, with a little practice, and describes performance problems pretty well.

I'm going to combine this with Brendan Gregg's very cool flame graph system for visually summarizing call stacks, in order to show off how the profiling information is being interpreted. I like flame graphs and they were definitely a goal of this effort.

Maybe perf doesn't need any help. I was able to use it to isolate some Typhon problems last week. I'll use my Monte webserver for all of these tests, putting it under some stress and then looking at the traces and flame graphs.

Now seems like a good time to mention that my dorky little webserver is not production-ready; it is literally just good enough to respond to Firefox, siege, and httperf with a 200 OK and a couple bytes of greeting. This is definitely a microbenchmark.

With that said, let's look at what perf and flame graphs say about webserver performance:

An unhelpful HTTP server profile

You can zoom in on this by clicking. Not that it'll help much. This flame graph has two big problems:

  1. Most of the time is spent in the mysterious "[unknown]" frames. I bet that those are just caused by the JIT's code generation, but perf doesn't know that they're meaningful or how to label them.
  2. The combination of JIT and builtin objects with builtin methods result in totally misleading call stacks, because most object calls don't result in new methods being added to the stack.

I decided to tackle the first problem first, because it seemed easier. Digging a bit, I found a way to generate information on JIT-created code objects and get that information to perf via a temporary file.

The technique is only documented via tribal knowledge and arcane blog entries. (I suppose that, in this regard, I am not helping.) It is described both in this kernel patch implementing the feature, and also in this V8 patch. The Typhon JIT hooks show off my implementation of it.

So, does it work? What does it look like?

I didn't upload a picture of this run, because it doesn't look different from the earlier graph! The big [unknown] frames aren't improved at all. Sufficient digging will reveal the specific newly-annotated frames being nearly never called. Clearly this was not a winning approach.

At this point, I decided to completely change my tack. I wrote a completely new call stack tracer inside Typhon. I wanted to do a sampling profiler, but sampling is hard in RPython. The vmprof project might fix that someday. For now, I'll have to do a precise profiler.

Unlabeled HTTP server profile with correct atoms

I omitted the coding montage. Every time a call is made from within SmallCaps, the profiler takes a time measurement before and after the call. This is pretty great! But can we get more useful names?

Names in Monte are different from names in, say, Python or Java. Python and Java both have class names. Since Monte does not have classes, Monte doesn't have a class name. A compromise which we accept here is to use the "display name" of the class, which will be the pattern used to bind a user-level object literal, and will be the name of the class for all of the runtime object classes. This is acceptable.

HTTP server profile with correct atoms and useful display names

Note how the graphs are differently shaped; all of the frames are being split out properly and the graph is more detailed as a result. The JIT is still active during this entire venture, and it'd be cool to see what the JIT is doing. We can use RPython's rpython.rlib.jit.we_are_jitted() function to mark methods as being JIT'd, and we can ask the flame graph generator to colorize them.

HTTP server profile with JIT COLORS HOLY FUCK I CANNOT BELIEVE IT WORKS

Oh man! This is looking pretty cool. Let's colorize the frames that are able to sit directly below JIT entry points. I do this with a heuristic (regular expression).

THE COLORS NEVER STOP, CAN'T STOP, WON'T STOP

This isn't even close to the kind of precision and detail from the amazing Java-on-Illumos profiles on Gregg's site, but it's more than enough to help my profiling efforts.

~ C.

Last modified on 2015-02-28 16:13:00

Valid CSS!