shitjs/ShitScript

Language design

vyorkin opened this issue · 17 comments

moved from shitjs/meta/issues/2

Initial thoughts

informal description

  1. written in JavaScript
  2. no statements, only expressions
  3. no var (variables are always declared in a global scope by default)
  4. no return – ShitScript returns the last evaluated expression
  5. no floating point numbers (only integers)
  6. no unary operators (?)
  7. no classes, no arrays, no objects (except for console and window)

shitty ideas

  • allow using -, ?, ! in function names (no-camel-case)
  • super weird type coercions (or some non-obvious implicit coercions just result in the string 'shit')
  • ; -> )))
  • === -> ====, !== -> !===
  • (...) -> [...]
  • function -> shit / fuck
  • try -> why-the-fuck-not
  • catch -> fucked-up
  • finally -> dont-fucking-care
  • say please to enable lexical scoping
  • you can't use a couple of numbers (e.g. 4 and 2) for no reason
  • x / 0 = Math.random()
  • if -> o-rly?
  • then -> ya-rly
  • else -> no-way
  • o-rly?-ya-rly-no-way for one-liners (without brackets)
  • . -> -> (works only for console and window)
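
A rough sketch of how a few of the rules from the two lists above might behave at runtime. The tuple-based AST (`['num', n]`, `['add', a, b]`, `['div', a, b]`, `['seq', ...exprs]`) and the `evaluate` function are made up for illustration; nothing here comes from an actual ShitScript implementation. It covers integers only, `x / 0 = Math.random()`, and "the last evaluated expression is the result".

```javascript
// Hypothetical AST: ['num', n] | ['add', a, b] | ['div', a, b] | ['seq', ...exprs]
function evaluate(node) {
  const [kind, ...args] = node;
  switch (kind) {
    case 'num':
      // Only integer literals exist in the language.
      if (!Number.isInteger(args[0])) throw new Error('shit: integers only');
      return args[0];
    case 'add':
      return evaluate(args[0]) + evaluate(args[1]);
    case 'div': {
      const left = evaluate(args[0]);
      const right = evaluate(args[1]);
      // Rule from the list: x / 0 = Math.random()
      if (right === 0) return Math.random();
      // Truncation is an assumption; the thread does not say how
      // integer division should round.
      return Math.trunc(left / right);
    }
    case 'seq': {
      // No `return`: a body evaluates to its last expression.
      let last;
      for (const expr of args) last = evaluate(expr);
      return last;
    }
    default:
      throw new Error(`unknown node kind: ${kind}`);
  }
}
```

For example, `evaluate(['seq', ['num', 1], ['div', ['num', 7], ['num', 2]]])` evaluates both expressions and yields the value of the second one.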

an example program:

fuck wat[] {
  calculate!!![2, 0])))
}

shit calculate!!![y, x] please {
  z = 5)))
  why-the-fuck-not {
    o-rly? z % 2 ==== 0 {
      y / x)))
    } no-way {
      x / y)))
    }
  } fucked-up[e] {
    console.lol[e])))
  } dont-fucking-care {
    0)))
  }
}

P.S.: Not sure about using the words fuck and shit everywhere (they may be considered offensive)

What ideas do you have, comrades, for the parser/tokens/etc.? Are we going to use some existing tools for defining our alphabet, lexical/semantic rules, and so on?

E.g. we can use Jison for the parser.

looks interesting, @ghaiklor, haven't seen it before, will definitely play with it tonight.
recently I had some experience with pegjs and I believe I know how to write an LL(k) parser from scratch (I'm reading the book Language Implementation Patterns by Terence Parr), but yes, I think it's better/easier to use existing tools (DSL + parser generator) for describing our shitty formal grammar, and Jison looks promising at first sight.

function -> shit

I like this particularly because we can have some sort of a higher order shit

ok, sorry for not doing anything for quite a while, I'll get back to it very soon, I hope!

@vyorkin I played a little bit with LLVM... What if we take LLVM as the compiler backend and write an LLVM frontend for our ShitScript?

@ghaiklor good idea (I've just watched this talk https://www.youtube.com/watch?v=PauCAyVg348), I need to build something very simple first (sorry, I still don't have enough time)

we definitely should target LLVM so we'll be able to use emscripten to target wasm

oops

@vyorkin here is my playground for llvm, but nothing special - https://github.com/ghaiklor/llvm-kaleidoscope

how about using Rust + llvm-rs + lalrpop to build this? I'm going to start working on it this weekend, the time has come :)
my plan:
– build a very basic formal grammar
– generate a parser with lalrpop (it'll suffice for now, though we'll need to write a custom lexer & parser later for performance reasons)
– write some tests to verify the resulting AST
– implement a visitor that will walk the AST and generate some LLVM IR
– provide a very basic REPL (to ease testing & playing with it) that will accept options like:

    -a, --ast      Parse and output AST
    -i, --llvm-ir  Build and output LLVM IR

we could use the docopt or clap crate for parsing CLI args

I'm still learning & playing with the llvm-rs crate (the Compile module is complex, with a lot of macros & metaprogramming stuff), but there aren't many alternatives: I've looked at them all and llvm-rs seems to be the most mature, though it's not under active development

@vyorkin I've started R&D on parsers written in JavaScript. Found some good possible solutions we can use.

Lexical analysis - Jalex. You can describe rules via regular expressions and it will call a callback when a match is found. So we'll be able to describe the lexical rules as regular expressions and implement all the actions needed to return a stream of tokens.
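
To make the "regular expression + callback" idea concrete, here is a minimal hand-rolled sketch in plain JavaScript. It deliberately does not use Jalex's actual API (which isn't shown in this thread); the `rules` table, the token shapes and the `tokenize` function are all assumptions, and only a subset of the proposed syntax is covered.

```javascript
// Each rule pairs a regular expression with a callback that builds a token.
// A callback returning null means "skip this match" (used for whitespace).
const rules = [
  [/^\s+/, () => null],
  [/^\)\)\)/, () => ({ type: 'END' })], // `;` is spelled `)))`
  [/^(?:====|!===)/, (text) => ({ type: 'OP', value: text })],
  [/^[-+*\/%=]/, (text) => ({ type: 'OP', value: text })],
  [/^[\[\]{},]/, (text) => ({ type: 'PUNCT', value: text })],
  [/^\d+/, (text) => ({ type: 'INT', value: Number(text) })],
  // Names may contain `-` and end in `?` or `!` (e.g. `o-rly?`, `calculate!!!`).
  // Side effect: `a-b` lexes as one name, so subtraction needs spaces.
  [/^[A-Za-z_][A-Za-z0-9_]*(?:-[A-Za-z_][A-Za-z0-9_]*)*[?!]*/,
    (text) => ({ type: 'NAME', value: text })],
];

function tokenize(source) {
  const tokens = [];
  let rest = source;
  while (rest.length > 0) {
    let matched = false;
    // First matching rule wins, so longer operators are listed first.
    for (const [pattern, action] of rules) {
      const match = pattern.exec(rest);
      if (match) {
        const token = action(match[0]);
        if (token) tokens.push(token);
        rest = rest.slice(match[0].length);
        matched = true;
        break;
      }
    }
    if (!matched) throw new Error(`unexpected character: ${rest[0]}`);
  }
  return tokens;
}
```

Calling `tokenize('calculate!!![2, 0])))')` returns a flat token array that a parser could consume; anything the rules don't cover (such as `->`) raises an error.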

Parsing (syntactic analysis) - Jison. It has its own simple built-in lexical analyzer, though I'm thinking of using Jalex, since we will definitely write our own scanner in the future.

Why did I choose them? They are compatible with the lex/yacc format, so you can describe definitions and translation rules the plain old way, as it was done in yacc.

For the grammar, we can try to find an existing JavaScript grammar and just modify it to fit our needs.

I'm still thinking about other lexical analyzers, but for parsing I didn't find much, so it seems like Jison is our only option there.

@ghaiklor do you know any good LLVM bindings for nodejs? I've found only these 2:

@vyorkin I'm wondering why you're sticking to LLVM 😸
IMO, LLVM is over-engineering for our case. It's hard to support and has a steep learning curve. I understand it will simplify the code-generation phases for us, but not by much. Even if you implement it with LLVM, you still need to implement:

  1. Parser. Could be acorn/esprima/whatever gives us a parse tree, but I'm going to use some kind of parser generator like flex/bison (maybe JavaScript ports).
  2. Semantic actions that call the LLVM IR builder. For that phase we need to implement our own semantic pass or somehow inject our own actions into the tools above. Or we need a visitor that walks the parse tree and calls the LLVM IR builder. IMO, the best place to call the IR builder is in the semantic actions of our grammar, so we can build the LLVM IR during parsing, which saves us one more pass over the parse tree.

So the steps with LLVM come down to this: define a scanner with rules that returns tokens with inherited and synthesized attributes, then pass these tokens into a parser that has our grammar with semantic actions. While parsing the tokens, the parser calls our semantic actions, where we call the LLVM IR builder. And don't forget about the code-generation phase, which we also need to implement with LLVM.
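
A small sketch of what "semantic actions that call an IR builder during parsing" could look like, using a hand-written recursive-descent parser instead of a generated one. The `createBuilder` object is a hypothetical stand-in that just collects LLVM-IR-style text lines; it is not the real LLVM IRBuilder API, and the two-rule grammar and token format are assumptions made for the example.

```javascript
// Hypothetical stand-in for an IR builder; NOT the real LLVM IRBuilder API.
function createBuilder() {
  const lines = [];
  let counter = 0;
  return {
    lines,
    // Emit one instruction and return the name of the value it defines.
    emit(op, left, right) {
      const name = `%t${counter++}`;
      lines.push(`${name} = ${op} i32 ${left}, ${right}`);
      return name;
    },
  };
}

const OPCODES = { '+': 'add', '-': 'sub', '*': 'mul', '/': 'sdiv' };

// Grammar: expr -> term (('+' | '-') term)*
//          term -> INT (('*' | '/') INT)*
// Tokens are plain JS values: numbers for integers, strings for operators.
function parseExpression(tokens, builder) {
  let pos = 0;

  function parseOperand() {
    const token = tokens[pos++];
    if (!Number.isInteger(token)) throw new Error(`expected integer, got ${token}`);
    return String(token); // synthesized attribute: the operand text
  }

  function parseTerm() {
    let value = parseOperand();
    while (tokens[pos] === '*' || tokens[pos] === '/') {
      const op = tokens[pos++];
      // Semantic action: runs as soon as the production is recognized.
      value = builder.emit(OPCODES[op], value, parseOperand());
    }
    return value;
  }

  function parseExpr() {
    let value = parseTerm();
    while (tokens[pos] === '+' || tokens[pos] === '-') {
      const op = tokens[pos++];
      value = builder.emit(OPCODES[op], value, parseTerm());
    }
    return value;
  }

  const result = parseExpr();
  if (pos !== tokens.length) throw new Error(`unexpected token: ${tokens[pos]}`);
  return result;
}
```

No parse tree is ever built here: each action emits its instruction immediately and passes the resulting value name up to the enclosing rule.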

Anyway, we won't get a magical solution for ShitScript if we stick with LLVM.


My initial idea is to examine existing lexer and parser generators, so we can write our own grammar from scratch and use the generators to create the parsers. After that, I'm looking for a way to create our own code generator. I'm still thinking about it, but once we have a grammar and a parse tree, it's not a big deal to generate code in SSA form. Aaaand once we have SSA form, it's not a big deal to generate assembly code from it. To be honest, I'm even thinking about generating machine code from JavaScript, but those are just thoughts.
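
As an illustration of the "parse tree to SSA form" step, here is a minimal sketch for straight-line arithmetic only. The tree shape (a number, or `{ op, left, right }`) and the `toSSA` function are assumptions for the example; with no control flow there are no phi nodes, so "SSA" here just means every temporary is assigned exactly once.

```javascript
// Hypothetical tree shape: a number literal, or { op, left, right }.
function toSSA(tree) {
  const instructions = [];
  let counter = 0;

  function lower(node) {
    if (typeof node === 'number') return String(node);
    // Post-order walk: lower both operands first, then define a fresh name.
    const left = lower(node.left);
    const right = lower(node.right);
    const name = `t${counter++}`;
    instructions.push(`${name} = ${left} ${node.op} ${right}`);
    return name;
  }

  const result = lower(tree);
  return { instructions, result };
}
```

For the tree of `1 + 2 * 3`, this yields one instruction per operator, with the multiplication defined before the addition that uses it.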

What do you all think? @vyorkin @chicoxyzzy maybe and @bniwredyc

Wow, thanks! I'll give a detailed answer later today. Here is my latest unfinished playground in Rust, which I started after working through the LLVM Kaleidoscope tutorial series (same thing as you did, but I still haven't finished it :)). I've stopped here (LLVM IR Builder / Emitter visitor).

UPD:
I'm not sure about LLVM, but it's very appealing: we get various backends (e.g. emscripten can be used to target WASM) and optimizations (traditional SSA-based, CFG-based, interprocedural analysis & transformations) for free, plus JIT and a lot of other stuff. In addition, it's very valuable experience that can be useful in the future for building something real. But the learning curve is steep and I'm not sure it's worth the time (and I've already spent too much).

@vyorkin

but it's very appealing: we get various backends (e.g. emscripten can be used to target WASM) and optimizations (traditional SSA-based, CFG-based, interprocedural analysis & transformations) for free, JIT and a lot of other stuff

Agreed, though, you still need to implement the correct way of applying these optimizations.

We are creating a ShitScript here, don't forget that. And the question is: is it worth investing so much time in LLVM to build a ShitScript? 😸
Maybe a language with stupid code generation and no optimizations is exactly the point of calling it ShitScript, you know...

@vyorkin also, I've just found LLVM compiled to JavaScript itself - https://github.com/kripken/llvm.js
Based on the demo, it looks like we'll be able to assemble and disassemble LLVM bitcode from JavaScript.

E.g.

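// `input` is LLVM IR text; llvmAs and llvmDis are the helpers used in the llvm.js demo
function process(input) {
  try {
    // Assemble the IR, then disassemble the result back to text
    return llvmDis(llvmAs(input));
  } catch (e) {
    // String errors are reported to the user; anything else is rethrown
    if (typeof e === 'string') {
      return 'Error in compilation: ' + e;
    } else {
      throw e;
    }
  }
}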

Worth noting that it's just a playground and, as the author mentioned:

This demo was done as a fun hacking project over a holiday vacation, so there are some caveats: The generated code is not optimized at all, so benchmarking is pointless; if you want to benchmark, run emscripten normally with -O2. Compilation speed has also not been optimized at all. Also, this demo has hardly been tested and glues together several codebases in ways they were not originally intended, there might be things that do not work.

Sorry I'm too drunk for this kind of shit RN