Language design
vyorkin opened this issue · 17 comments
moved from shitjs/meta/issues/2
Initial thoughts
informal description
- written in JavaScript
- no statements, only expressions
- no `var` (variables are always declared in a global scope by default)
- no `return` – ShitScript returns the last evaluated expression
- no floating point numbers (only integers)
- no unary operators (?)
- no classes, no `array`'s, no `object`'s (except for `console` and `window`)
shitty ideas
- allow using `-`, `?`, `!` in function names (`no-camel-case`)
- super weird type coercions (or just some non-obvious implicit coercions result in string `'shit'`)
- `;` -> `)))`
- `===` -> `====`, `!==` -> `!===`
- `(...)` -> `[...]`
- `function` -> `shit` / `fuck`
- `try` -> `why-the-fuck-not`
- `catch` -> `fucked-up`
- `finally` -> `dont-fucking-care`
- say `please` to enable lexical scoping
- you can't use a couple of numbers (e.g. `4` and `2`) for no reason
- `x / 0` = `Math.random()`
- `if` -> `o-rly?`
- `then` -> `ya-rly`
- `else` -> `no-way`
- `o-rly?` - `ya-rly` - `no-way` for one-liners (without brackets)
- `.` -> `->` (works only for `console` and `window`)
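A minimal sketch of how a few of the arithmetic ideas above could behave at runtime. The helper names `literal` and `divide` are hypothetical, and truncating division is an assumption made here to honor the "only integers" rule; nothing in the thread fixes these details.

```javascript
// Numbers that cannot be used "for no reason" (taken from the idea list).
const BANNED_NUMBERS = new Set([4, 2]);

// Hypothetical helper: validates a numeric literal before it is used.
function literal(n) {
  if (!Number.isInteger(n)) {
    throw new TypeError(`${n} is not an integer`);
  }
  if (BANNED_NUMBERS.has(n)) {
    throw new RangeError(`you can't use ${n} for no reason`);
  }
  return n;
}

// Hypothetical helper: x / 0 yields Math.random(); everything else is
// integer division (truncation toward zero is an assumption).
function divide(x, y) {
  if (y === 0) {
    return Math.random();
  }
  return Math.trunc(x / y);
}
```

Note that `Math.random()` returns a non-integer, so the `x / 0` rule quietly contradicts the integers-only rule, which seems in keeping with the spirit of the language.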
an example program:
fuck wat[] {
  calculate!!![2, 0])))
}
shit calculate!!![y, x] please {
  z = 5)))
  why-the-fuck-not {
    o-rly? z % 2 ==== 0 {
      y / x)))
    } no-way {
      x / y)))
    }
  } fucked-up[e] {
    console.lol[e])))
  } dont-fucking-care {
    0)))
  }
}
P.S.: Not sure about using the words `fuck` and `shit` everywhere (may be considered offensive)
What ideas do you have, comrades, for the parser/tokens/etc.? Are we going to use some existing tools for writing our alphabet, lexical/semantic rules and so on?
E.g. we can use Jison for the parser.
looks interesting, @ghaiklor, haven't seen it before, will definitely play with it tonight.
recently I had some experience with pegjs and I believe I know how to write an LL(k) parser from scratch (I'm reading the book Language Implementation Patterns by Terence Parr), but yes, I think it's better/easier to use existing tools (DSL + parser generator) for describing our shitty formal grammar, and Jison looks promising at first sight.
function -> shit
I like this particularly because we can have some sort of a higher order shit
ok, sorry for not doing anything for quite a while, I'll get back to it very soon, I hope!
@vyorkin played a little bit with LLVM... What if we use LLVM as the compiler and write an LLVM frontend for our ShitScript?
@ghaiklor good idea (I've just watched this talk https://www.youtube.com/watch?v=PauCAyVg348), I need to build smth very simple first (sorry still don't have enough time)
we definitely should target LLVM so we'll be able to use emscripten to target wasm
oops
@vyorkin here is my playground for llvm, but nothing special - https://github.com/ghaiklor/llvm-kaleidoscope
how about using Rust + llvm-rs + lalrpop to build this? I'm going to start working on it this weekend, the time has come :)
my plan:
– build a very basic formal grammar
– generate a parser with lalrpop (it'll suffice for now, though we'll need to write a custom lexer & parser later for performance reasons)
– write some tests to verify the resulting AST
– implement a visitor that will walk the AST and generate some LLVM IR
– provide a very basic REPL (to ease testing & playing with it) that will accept options like:
-a, --ast Parse and output AST
-i, --llvm-ir Build and output LLVM IR
we could use docopt or clap crates for CLI args parsing
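The "visitor that walks the AST and generates LLVM IR" step of the plan could look roughly like the sketch below. It is written in JavaScript purely for illustration (the plan itself targets Rust), it emits textual IR by hand instead of going through llvm-rs or any LLVM binding, and the AST node shapes plus the `IrEmitter` and `emitFunction` names are all assumptions.

```javascript
// Assumed AST node shapes (hypothetical):
//   { type: 'Int', value: 7 }
//   { type: 'Binary', op: '+', left: <node>, right: <node> }
const OPCODES = { '+': 'add', '-': 'sub', '*': 'mul', '/': 'sdiv', '%': 'srem' };

class IrEmitter {
  constructor() {
    this.lines = [];   // emitted instructions, in order
    this.counter = 0;  // used to create fresh temporary names
  }

  // Visits a node, emits instructions for it, and returns the operand
  // (a literal or a temporary name) that holds its value.
  visit(node) {
    switch (node.type) {
      case 'Int':
        return String(node.value);
      case 'Binary': {
        const left = this.visit(node.left);
        const right = this.visit(node.right);
        const opcode = OPCODES[node.op];
        if (opcode === undefined) {
          throw new Error(`Unsupported operator: ${node.op}`);
        }
        const name = `%t${this.counter++}`;
        this.lines.push(`  ${name} = ${opcode} i32 ${left}, ${right}`);
        return name;
      }
      default:
        throw new Error(`Unknown node type: ${node.type}`);
    }
  }
}

// Wraps the emitted instructions in a function that returns the value of
// the last evaluated expression.
function emitFunction(name, body) {
  const emitter = new IrEmitter();
  const result = emitter.visit(body);
  return [
    `define i32 @${name}() {`,
    'entry:',
    ...emitter.lines,
    `  ret i32 ${result}`,
    '}',
  ].join('\n');
}
```

A REPL flag such as `--llvm-ir` would then only need to print the string returned by `emitFunction`, while `--ast` would print the node tree before it reaches the visitor.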
I'm still learning & playing with the llvm-rs crate (the `Compile` module is complex, with a lot of macros & metaprogramming stuff), but there aren't many alternatives. I've seen them all and `llvm-rs` seems to be the most mature, but it's not under active development
@vyorkin I've started R&D on parsers written in JavaScript and found some good possible solutions we can use.
Lexical analysis - Jalex. You describe rules via regular expressions and it calls a callback when a match is found. So we will be able to describe the lexical rules as regular expressions and implement all the actions needed to return a stream of tokens.
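To make the "regular expression plus callback" idea concrete, here is a small hand-rolled sketch. It does not use Jalex's actual API; the `rules` table, the `tokenize` function and the token shapes are hypothetical, and the keyword list is taken from the ideas at the top of the thread.

```javascript
const KEYWORDS = new Set([
  'shit', 'fuck', 'please', 'why-the-fuck-not', 'fucked-up',
  'dont-fucking-care', 'o-rly?', 'ya-rly', 'no-way',
]);

// Each rule pairs a sticky regex with a callback that turns the matched
// text into a token (or returns null to skip it). Order matters.
const rules = [
  { pattern: /\s+/y, action: () => null },
  { pattern: /\d+/y, action: (text) => ({ type: 'number', value: Number(text) }) },
  { pattern: /\)\)\)/y, action: () => ({ type: 'terminator', value: ')))' }) },
  {
    // Names may contain '-' between parts and end with '?' or '!'.
    pattern: /[a-z][a-z0-9]*(?:-[a-z0-9]+)*[?!]*/yi,
    action: (text) => ({ type: KEYWORDS.has(text) ? 'keyword' : 'identifier', value: text }),
  },
  { pattern: /====|!===|[=%\/*+-]/y, action: (text) => ({ type: 'operator', value: text }) },
  { pattern: /[\[\]{},.]/y, action: (text) => ({ type: 'punctuation', value: text }) },
];

function tokenize(source) {
  const tokens = [];
  let pos = 0;
  while (pos < source.length) {
    let matched = false;
    for (const { pattern, action } of rules) {
      pattern.lastIndex = pos; // sticky regexes match only at lastIndex
      const match = pattern.exec(source);
      if (match === null) continue;
      const token = action(match[0]);
      if (token !== null) tokens.push(token);
      pos = pattern.lastIndex;
      matched = true;
      break;
    }
    if (!matched) {
      throw new SyntaxError(`Unexpected character '${source[pos]}' at ${pos}`);
    }
  }
  return tokens;
}
```

For example, `tokenize('o-rly? z % 2 ==== 0 {')` yields a keyword, an identifier, an operator, a number, the `====` operator, a number and a punctuation token. A real grammar would still have to decide how `-` inside names interacts with subtraction, which this sketch sidesteps.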
Semantic analysis - Jison. It has its own simple built-in lexical analyzer, though I'm thinking of using Jalex, since we will definitely write our own scanner in the future.
Why did I choose them? They are compatible with the lex/yacc format, so you can describe definitions and translation rules in the plain old way, as it was done in yacc.
For a grammar, we can try to find an already implemented grammar for JavaScript and just modify it to fit our needs.
I'm still thinking about other lexical analyzers, but for semantic analysis I didn't find much, so it seems like Jison is our only option there.
@ghaiklor do you know any good LLVM bindings for nodejs? I've found only these 2:
- https://github.com/dirk/llvm2
- https://github.com/kevinmehall/node-llvm
both seem outdated :(
LLVM is hard, but I've already wasted so much time learning it, I think it's too late to give up on it :)
@vyorkin I'm wondering why you stick to LLVM 😸
IMO, LLVM is over-engineering for our case. It's hard to support and it has a steep learning curve. I understand it will simplify the code-generation phases for us, but not by much. Even if you are going to implement it with LLVM, you still need to implement:
- Parser. Could be acorn/esprima/whatever gives us a parse tree, but I'm going to use some kind of parser generator like flex/bison (maybe JavaScript ports).
- Semantic actions which will call the LLVM IR builder. For that phase we need to implement our own semantic parser or somehow inject our own actions into the tools above. Or we need a tool that acts as a visitor for the parse tree and calls the LLVM IR builder. IMO, the best place to call the IR builder is in the semantic actions of our grammar. That way we will be able to build the LLVM IR during parsing, which saves us another pass over the parse tree.
So the steps with LLVM will be close to this: define a scanner with rules which returns tokens with inherited and synthesized attributes, then pass these tokens into a parser which has our grammar with semantic actions. While parsing our tokens, the parser will call our semantic actions, where we call the LLVM IR builder. And do not forget about the code-generation phase, which we also need to implement with LLVM.
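The "call the IR builder from semantic actions while parsing" approach described here can be sketched without any parser generator. The recursive-descent parser below handles only integer arithmetic, and the `Builder` class is a hypothetical stand-in that records textual instructions; it is not the LLVM IR builder API.

```javascript
// Hypothetical stand-in for an IR builder: records instructions as text and
// returns the name of the temporary that holds each result.
class Builder {
  constructor() {
    this.lines = [];
    this.next = 0;
  }

  binary(opcode, left, right) {
    const name = `%t${this.next++}`;
    this.lines.push(`${name} = ${opcode} i32 ${left}, ${right}`);
    return name;
  }
}

function compile(source) {
  // Toy scanner: keeps integers and operators; anything else is ignored.
  const tokens = source.match(/\d+|[-+*\/]/g) || [];
  const builder = new Builder();
  let pos = 0;

  function number() {
    const token = tokens[pos++];
    if (!/^\d+$/.test(token)) {
      throw new SyntaxError(`Expected a number, got '${token}'`);
    }
    return token;
  }

  // term := NUMBER (('*' | '/') NUMBER)*
  function term() {
    let value = number();
    while (tokens[pos] === '*' || tokens[pos] === '/') {
      const opcode = tokens[pos++] === '*' ? 'mul' : 'sdiv';
      value = builder.binary(opcode, value, number()); // semantic action
    }
    return value;
  }

  // expression := term (('+' | '-') term)*
  function expression() {
    let value = term();
    while (tokens[pos] === '+' || tokens[pos] === '-') {
      const opcode = tokens[pos++] === '+' ? 'add' : 'sub';
      value = builder.binary(opcode, value, term()); // semantic action
    }
    return value;
  }

  const result = expression();
  if (pos !== tokens.length) {
    throw new SyntaxError(`Unexpected token '${tokens[pos]}'`);
  }
  return { result, lines: builder.lines };
}
```

No parse tree is ever built: each grammar rule emits its instruction as soon as it has both operands, which is the saving mentioned above. The trade-off is that any later analysis has to work on the emitted instructions rather than on a tree.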
Anyway, we won't get a magical solution for ShitScript if we stick to LLVM.
My initial idea is to examine existing generators for lexical and semantic parsers, so we can build our own grammars from scratch and use generators to create the parsers. After that, I'm looking for a way to create our own code generator. Still thinking about it, but if we have a grammar and a parse tree, it's not a big issue to generate code in SSA form. Aaaand, once we have SSA form, it's not a big issue to generate assembly code from it. To be honest, I'm even thinking about generating machine code from JavaScript, but those are just thoughts.
What do you all think? @vyorkin @chicoxyzzy and maybe @bniwredyc
Wow, thanks! I'll give a detailed answer later today. Here is my latest unfinished playground in Rust, which I started working on after going through the LLVM Kaleidoscope tutorial series (same thing as you did, but I still haven't finished it :)). I've stopped here (LLVM IR Builder / Emitter visitor).
UPD:
I'm not sure about LLVM, but it's very appealing: we get various backends (e.g. emscripten can be used to target WASM) and optimizations (traditional SSA-based, CFG-based, interprocedural analysis & transformations) for free, JIT and a lot of other stuff. In addition, this is very valuable experience that can be useful in the future to build something real. But the learning curve is steep and I'm not sure if it's worth the time (and I've already spent too much).
but it's very appealing: we get various backends (e.g. emscripten can be used to target WASM) and optimizations (traditional SSA-based, CFG-based, interprocedural analysis & transformations) for free, JIT and a lot of other stuff
Agreed, though, you still need to implement the correct way of applying these optimizations.
We are creating ShitScript here, don't forget about it. And the question here is: is it worth investing so much time in LLVM to build ShitScript? 😸
Maybe a language with just dumb code generation and no optimization is exactly the point of why it's called ShitScript, you know...
@vyorkin also, I've just found LLVM compiled to JavaScript itself - https://github.com/kripken/llvm.js
Based on the demo, it looks like we will be able to compile LLVM bytecode via JavaScript.
I.e.
// Here input is LLVM IR
function process(input) {
  try {
    return llvmDis(llvmAs(input));
  } catch (e) {
    if (typeof e == 'string') {
      return 'Error in compilation: ' + e;
    } else {
      throw e;
    }
  }
}
Worth noting that it's just a playground and, as the author mentioned:
This demo was done as a fun hacking project over a holiday vacation, so there are some caveats: The generated code is not optimized at all, so benchmarking is pointless; if you want to benchmark, run emscripten normally with -O2. Compilation speed has also not been optimized at all. Also, this demo has hardly been tested and glues together several codebases in ways they were not originally intended, there might be things that do not work.
Sorry I'm too drunk for this kind of shit RN