iovm2

PREFACE

  iovm2 is an attempt to build an optimizing compiler for the Io programming language (iolanguage.com).

  The current Io VM is an interpreter written by Steve Dekorte. The idea of iovm2 is to extend the current VM with
  bytecode generation for hot paths (somewhat similar to trace recording; see the TraceMonkey project).
  The generated bytecode is interpreted by an additional interpreter and transformed on the fly by a number of optimizing filters.
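
  To make the "optimizing filters" idea concrete, here is a minimal C sketch of how such a chain could be
  wired together. Every name in it (BytecodeStream, Filter, runFilters) is hypothetical and not part of the
  existing iovm; it only illustrates the intended shape.

    /* Hypothetical sketch: a filter takes a bytecode stream and returns it,
       possibly rewritten; vm2 runs a stream through the whole chain. */
    typedef struct BytecodeStream BytecodeStream;           /* opaque here */
    typedef BytecodeStream *(*FilterFunc)(BytecodeStream *bc);

    typedef struct Filter {
        const char    *name;
        FilterFunc     apply;
        struct Filter *next;
    } Filter;

    BytecodeStream *runFilters(Filter *chain, BytecodeStream *bc)
    {
        for (Filter *f = chain; f != NULL; f = f->next)
            bc = f->apply(bc);           /* each filter may be a no-op */
        return bc;
    }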

  The aim is to build a reasonably fast VM which is easy to analyse and extend.
  I'm not a hardcore C hacker at all, so I really need this software to be understandable by mere mortals.
  I want to keep as much of the VM as possible in Io itself, such as built-in bytecode sequences and optimization filters.
  The new VM should be optimized for the metaprogramming techniques that make Io outstanding: source code should still
  be inspectable and modifiable at runtime, and arguments should be lazy.

  iovm2 is not about rewriting the core library and does not change the syntax.
  I intend to reuse the current parser, libcoroutine (?), libgarbagecollector and libbasekit.

  I've never implemented a programming language before, but I like Io very much, so I'd like
  to use it in projects where performance matters.


DRAFT OVERVIEW
  
  1. The Dekorte VM (vm1) loads and parses source code and creates message chains.
  2. vm1 records call statistics for every message.
  3. When evaluation of a particular message is considered a hotspot, vm1 starts
     recording a bytecode stream (see the counter sketch after this list).
  4. Where a bytecode stream is available, vm1 drops execution to the vm2 interpreter.
  5. The vm2 interpreter walks the bytecode stream and applies optimizing filters.
  6. Each filter is built in such a way that it passes the bytecode through unchanged
     when it has not been properly prepared by a previous filter.
     This avoids blocking the execution flow with aggressive stop-the-world
     multipass optimization, while still optimizing code a little each time it is executed.
     Hot paths are walked many times and therefore receive maximum optimization,
     while we don't spend much time optimizing cold paths and rarely executed
     code. (A concrete example of such a filter is sketched below, after the next paragraph.)
  7. One of the filters could implement machine code generation (behaving as a JIT).
     However, this is outside the scope of my qualification work, so I leave it to professionals.
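
  The counter sketch referenced in step 3, in C. Everything here (the Message fields, vm1_eval,
  vm1_evalAndRecord, vm2_interpret, the threshold value) is a hypothetical illustration of steps 2-4,
  not existing iovm code.

    /* Hypothetical sketch of the hotspot trigger: each message carries a
       counter; once it crosses a threshold, vm1 records a bytecode stream,
       and whenever a stream exists, execution drops to vm2. */
    enum { HOTSPOT_THRESHOLD = 1000 };          /* heuristic, to be tuned */

    struct Bytecode;

    typedef struct Message {
        long             evalCount;    /* step 2: per-message call statistics */
        struct Bytecode *bytecode;     /* steps 3-4: recorded stream, if any  */
        /* ... name, arguments, next message, etc. ... */
    } Message;

    /* Declared only to keep the sketch short; defined elsewhere. */
    void *vm1_eval(Message *m, void *locals);
    void *vm1_evalAndRecord(Message *m, void *locals);   /* fills m->bytecode */
    void *vm2_interpret(struct Bytecode *bc, void *locals);

    void *eval(Message *m, void *locals)
    {
        if (m->bytecode)                            /* step 4: bytecode available */
            return vm2_interpret(m->bytecode, locals);

        m->evalCount++;                             /* step 2: record statistics  */
        if (m->evalCount >= HOTSPOT_THRESHOLD)      /* step 3: hotspot detected   */
            return vm1_evalAndRecord(m, locals);

        return vm1_eval(m, locals);                 /* cold path: plain vm1 eval  */
    }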
  
  Emitting and interpreting bytecode is slower than interpreting raw message chains.
  However, once recorded, the bytecode can be heavily optimized: unnecessary instructions can be
  removed, some objects can be created on the stack, some slots can be allocated statically and
  referenced by offset rather than by name, and so on.
  This technique might give us a 20-200x performance boost (I hope so).
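
  As one concrete example of both the pass-through behaviour from step 6 and the offset optimization above,
  here is a hypothetical filter in C. The instruction set, the layoutPinned flag and pinnedOffsetFor are all
  made up for illustration; nothing here is real iovm code.

    /* Hypothetical filter: rewrite slot lookups by name into loads by offset,
       but only when an earlier filter has already pinned the slot layout;
       otherwise pass the bytecode stream through unchanged. */
    enum Op { OP_GETSLOT_BYNAME, OP_GETSLOT_BYOFFSET /* , ... */ };

    typedef struct {
        enum Op op;
        int     nameIndex;     /* index into a name table                  */
        int     offset;        /* filled in once the layout is known       */
    } Instr;

    typedef struct {
        int    layoutPinned;   /* set by an earlier layout-analysis filter */
        int    count;
        Instr *instrs;
    } Stream;

    /* Hypothetical helper: offset of a name in the pinned layout, or -1. */
    int pinnedOffsetFor(Stream *bc, int nameIndex);

    Stream *offsetFilter(Stream *bc)
    {
        if (!bc->layoutPinned)
            return bc;                     /* not prepared yet: no-op       */

        for (int i = 0; i < bc->count; i++) {
            Instr *in = &bc->instrs[i];
            if (in->op != OP_GETSLOT_BYNAME)
                continue;
            int off = pinnedOffsetFor(bc, in->nameIndex);
            if (off >= 0) {                /* rewrite only what is provable */
                in->op     = OP_GETSLOT_BYOFFSET;
                in->offset = off;
            }
        }
        return bc;
    }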


TODO

- state of the vm2 interpreter: registers and the stack (pretty much the same as vm1's).
- primitive bytecode sequences (slot lookup, message perform, if, while, loop, return, break, continue, clone, arithmetic).
- inlining heuristics: when to start and when to stop bytecode recording.
- inline cache invalidation techniques: valid marker, complete guards, callbacks for invalidating call sites (see the sketch after this list).
- set of optimization rules for various patterns.
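
The sketch referenced in the inline-cache item, in C. The Obj and InlineCache structs, the slotVersion
counter and slowLookup are all hypothetical; they only illustrate the "valid marker" idea, where a cache
entry is trusted only while its recorded version matches the receiver's current one.

  /* Hypothetical sketch: a per-callsite inline cache guarded by a validity
     marker (a version counter bumped on every slot change). */
  typedef struct {
      long slotVersion;    /* bumped whenever a slot is added, removed or changed */
      /* ... slot table, protos, etc. ... */
  } Obj;

  typedef struct {
      Obj  *receiver;      /* receiver seen when the cache was filled */
      void *slotValue;     /* resolved slot value                     */
      long  version;       /* receiver->slotVersion at fill time      */
  } InlineCache;

  /* Full lookup lives elsewhere in the VM; declared here for the sketch. */
  void *slowLookup(Obj *receiver, const char *name);

  void *cachedLookup(InlineCache *ic, Obj *receiver, const char *name)
  {
      if (ic->receiver == receiver && ic->version == receiver->slotVersion)
          return ic->slotValue;                    /* fast path: cache still valid */

      void *value = slowLookup(receiver, name);    /* slow path: refill the cache  */
      ic->receiver  = receiver;
      ic->slotValue = value;
      ic->version   = receiver->slotVersion;
      return value;
  }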


Author:  Oleg Andreev <oleganza@gmail.com>
Date:    February 3, 2009
License: WTFPL