My fork of lit
Opened this issue · 11 comments
I've forked lit, though not through github, for reasons that follow:
-
I've greatly relaxed the parser. I understand that code style is a matter of perspective, but I disagree on forcing a specific style - in this case, that braces are on the same line (i.e.,
if(...) { ...
). now, braces can be on their own line. -
i've rewritten large parts to make use of sds for strings, which makes string operations faster, but most importantly, easier to use.
-
implemented
==
explicitly for most primitive types - previously, this was implemented in the VM directly, which caused some odd behaviour when comparing non-numeric values. this isn't complete, but the current impl should give an idea what I mean. -
reduced headers into a single header. this will seem extreme, but I hope to put lit into a state where it can be included in a projet by plopping the source into a tree with minimal issues. adding to that, fixing a great deal of compiler warnings.
-
rewrote the bits that are compiler-dependant, so that lit, for example, can now be compiled with visualstudio, tinycc, etc. stuff like dynamic C arrays are a GNU extension, and I removed them, or replaced them with explicit allocations.
Since I've currently cared more about features, the testing framework is uh, kinda sorta not there anymore. But if you can ./run
files like mandel.lit
, or anything in ./examples/
, then that's as good as sanity testing goes right now. This is something i'm definitely working on, ideally without 3rdparty deps.
All in all, I've ironed out lit to the point where there are zero issues in valgrind, etc. No mem leaks. Memory usage is approx. the same as upstream. It compiles cleanly.
But most of these changes aren't compatible with your upstream, so I'd like to know what you think! Maybe we'll come to some understanding.
My dude, @apfeltee, sorry for seeing this almost a month after you've opened the issue.. My life has been... Interesting, to say at the very least :)
Incredible work. Huge props for figuring out the internal workings of the whole language. I would really love to merge some of your changes into the main repo, but I have a lot of questions to discuss before that:
- What forced you to reorganize the project so massively? You also started from a fresh git repo, by the looks of it, a weird choice, but ok.
- Why do you think
;
are needed? I really dislike, how JavaScript allows you to both use and ignore them. Imho, the parser had 0 issues parsing the code without a single semicolon, and I dislike the look it, makes it look so much less script like... - You've made a lot, a lot of code style changes. I understand, that we each have our views on how the code should look, but my man, you literally brought everything down and rebuilt from scratch, ahahhahah.
- Are lines like
(void)vm;
just for avoidingargument not used
warning? I feel like a compiler flag suits this better, idk. Or at least something likeNOT_USED(vm)
macro. Because this just looks odd to my eye. But again, I understand, that we have different backgrounds and views. - SDS sounds like generally a good change.
- I do not remember specifically forcing braces to be on different lines, but okay, I'm good with the change.
- Reduced files to a single header... Okay, I understand how this is useful, but there are tools that allow you to compile your source code into a single header file, a lot of libraries do that. It is just much harder to work with code like that, even ignoring the RAM/CPU loads on my poor tiny laptop.
- Thanks for the compiler independent changes, good work!
As I've said, I really want to get your work into the main repo. But that would mean, probably redoing most of the work again, but this time on the main tree. Are you up to this task? I'm up for collaboration, I have to admit, this project has been paused for a bit due to my IRL stuff, but I really want to finish it and write my next game in it, that's the dream.
If you want to reach me faster, hit me up on discord egordorichev#5245
or on twitter @egordorichev
.
Cheers, mate!
What forced you to reorganize the project so massively? You also started from a fresh git repo, by the looks of it, a weird choice, but ok.
Because the changes were already not compatible with upstream by the time I could have forked. It just kinda happened.
Why do you think ; are needed? I really dislike, how JavaScript allows you to both use and ignore them. Imho, the parser had 0 issues parsing the code without a single semicolon, and I dislike the look it, makes it look so much less script like...
Mostly for oneliners a la ./run -e 'do_a_thing(); do_another_thing();'
. In no way do I suggest to force semicolons, just to make them optional.
In the case of javascript, it works ok. Again, the idea is to make it optional, and not to explicitly disallow it.
You've made a lot, a lot of code style changes. I understand, that we each have our views on how the code should look, but my man, you literally brought everything down and rebuilt from scratch, ahahhahah.
Well, it get better (or worse, depending on how you look at it): I had started to convert the code to C++ for some pretty good reasons:
- Modern C++ (>c++17) has things like threading, filesystem access, and even regular expressions baked in. Particularly the latter is what made me change to C++, as this prevents having to deal with libpcre or anything like that directly. There's like a handful of versions of PCRE, and they're all incompatible with eachother. I'd rather not do that manually.
- All the objects in Lit already use "inheritance" in one form or another; C++ merely makes this more obvious.
- Templates and function overloading makes most macros redundant, resulting in code that is easier to follow, and thus, easier to optimize.
But again, that version is so wholly different that I don't expect you to agree with it, but I think that, in order to make Lit more usable, and better optimized, C++ is pretty much the way to go.
Here's the link to that version: https://github.com/apfeltee/nlit - be warned though, all the code resides in one file for now. But it's obviously only for now.
Are lines like (void)vm; just for avoiding argument not used warning? I feel like a compiler flag suits this better, idk. Or at least something like NOT_USED(vm) macro. Because this just looks odd to my eye. But again, I understand, that we have different backgrounds and views.
That's because there's little sense in keeping the warnings if that particular code never changes. And since (void)
is well enough defined, there's no need for additional macros.
I do not remember specifically forcing braces to be on different lines, but okay, I'm good with the change.
The original parser required them on the same line as their associated keyword. But tbh, I don't recall specifics, I just relaxed that part of the parser.
Reduced files to a single header... Okay, I understand how this is useful, but there are tools that allow you to compile your source code into a single header file, a lot of libraries do that. It is just much harder to work with code like that, even ignoring the RAM/CPU loads on my poor tiny laptop.
At least with C, the difference is marginal at best. Unless your device is from the 1980s, this is unlikely to pose an issue.
tl;dr i'm probably going to focus on the C++ version for now. It would be easier to port (thanks to builtin standard libraries), won't need dirent.h
et al, among so many other things. But that's obviously a total break from the plain C version...
Because the changes were already not compatible with upstream by the time I could have forked. It just kinda happened.
Yeah, I also kinda wondered, why did you go with the stack-based vm, when the register-based vm was already in the master...
Mostly for oneliners
Well, it works already.. I mean, I get that there are some edgecases, but man I don't want to introduce ;
into the language...
C++ version
Okay woooooow.
I need to think a lot on this. I have to admit, C++ is not my favorite language. I'd much rather prefer to use C in most cases. But you make really really compelling arguments...
I will go think for a while, and come back to you with fresh thoughts on the topic.
One last question tho: what part of lit made you want to put in so much work into it?
Yeah, I also kinda wondered, why did you go with the stack-based vm, when the register-based vm was already in the master...
Do you mean computed gotos? Because I already modified those macros, so that would be chosen if the compiler supports it.
Well, it works already
Yeah, but why reinvent the wheel? It's neat that it is possible, but it's also neat to be able to use semicolons. Really, they're just meant to be optional.
I will go think for a while, and come back to you with fresh thoughts on the topic.
Please do! I might be willing to backport some stuff into the C ver, but ultimately, I see little hope for the C version.
A lot of more complicated stuff, such as threading, get incredibly messy incredibly quickly in C, and you would have to additionally maintain hundreds of lines of boilerplate. I used to do that, and it sucks.
One last question tho: what part of lit made you want to put in so much work into it?
It's similar enough to javascript, without being as huge and unwieldy like v8, quickjs, et cetera...
Not to mention, the ECMA specification is just plain dumb. It would require UTF16 for all strings, even internal. Couldn't make that stuff up if I tried.
Do you mean computed gotos? Because I already modified those macros, so that would be chosen if the compiler supports it.
No, computed gotos are present in both versions. I mean the whole way the vm operates in terms of how the arguments are passed around. The stack version had a stack, that OPs pushed arguments onto and popped from, now OPs work directly with registers. Also OPs used to be randomly sized, now they all are 4 bytes.
maintain hundreds of lines of boilerplate
I have to agree here. After a lot of thinking, I have to admit, with all my disgust towards C++, it will be a better option.
Tho, I'm still on edge of having the method code be defined directly in the class declaration... I need to do a lot of reading up on modern C++ tbh. And I feel like it would be better to start from scratch rather than doing a port over. By that I mean rewriting with a C++ oriented architecture in mind, instead of doing a Ctrl+F and replace.
And that means yet another rewrite... Dang it, this project is already 5+ years old with all the rewrites and so on and so forth, but maybe this will be the last one...
Maybe this is the push I need to get more heavy into programming, who knows, with all the bad weather coming, maybe its the spark I need. I will give it a shot. If you would like to join our forces, I welcome you, you've done an amazing job, and we both could really use each other's experience, I think. What's your timezone (I'm in GMT+3)? I would love to have a more real-time chat with you before I sit down and start piling stuff from 0 once again...
Cheers, Egor
now OPs work directly with registers. Also OPs used to be randomly sized, now they all are 4 bytes.
Ah, I just looked at the code, and I see what you mean. Might actually be pretty easy to port over.
Tho, I'm still on edge of having the method code be defined directly in the class declaration..,
That's just to order functions context-wise. I think it just makes more sense to define stuff where it is used.
By that I mean rewriting with a C++ oriented architecture in mind, instead of doing a Ctrl+F and replace.
That might not be necessary. Stuff like <TYPE>::make
(as opposed to using the default constructor) is regrettably how objects need to be created if memory allocation and garbage collection is handled by lowlevel malloc/calloc/realloc.
There are ways to implement custom allocators, but that would greatly increase complexity, and I see no good reason for that tbh.
I would love to have a more real-time chat with you before I sit down and start piling stuff from 0 once again...
I'm not actually sure if I will have that much free time, and at least for me, issues and discussions on github are plenty. Sorry about that...
Quick E: I looked at how the register-based vm is implemented, and it differs enough that it would be actually pretty difficult to port over. It would be basically a rewrite at that point - for now, the old vm works (-ish: md5.lit does not produce correct results, and i'm still not sure why).
Not sure what to do about this. Maybe use the C++ code as it is, and modify the relevant files (in this case, execvm.cpp
) to use a register-based VM?
I have to admit, the register-based vm was a huge rewrite of the whole code base as it is, and it pretty much means tearing out huge parts of the execution pipeline and rewriting them from 0.
So just modifying a port over will throw out >half of the code anyway, so there is no real reason to do it. I was hoping to get major performance increases from switching to register, but instead I've just gotten more robust bytecode standard, and that's about it. Wrapping my head around the implementation of the register bytecode emitter took a while.
but instead I've just gotten more robust bytecode standard, and that's about it.
That is a good thing to be sure, but it's not the most pressing issue atm. I think it might even make more sense to add JIT or AOT, which would implement a register anyway.
Buut... that's still in the future.
Personally I'm fine with the VM as it is, but there are clearly some problems that I can't figure out: the file md5.lit
is meant to test binary operators, and while it runs fine, it produces the wrong results.
There's also the problem that the compiler internally declares implicit variables (those not declared via var
) as globals.
The problem with that is, that it will implicitly override any local variables. They will behave much like a static declared variable in C.
I've tried to change that, but I'm not entirely sure where to start.
so tl;dr i'd focus on ironing out the runtime, just to make sure that any changes in the compiler/emitter aren't producing bogus instructions.
There's also the problem that the compiler internally declares implicit variables
You mean the ones used in for loops and stuff like that? It uses variable names that start with a space, so there is no way to access them/create something to replace it with, so it should be fine.
so tl;dr i'd focus on ironing out the runtime,
Well yes, but as there are two completely different runtimes at the moment, patching up one doesn't affect the other one. As much as I like how simple the stack based one is in terms of emitting code for it and stuff like that, I'm much more of a fan of the register one overall, since it is much more generalized and easy to extend later on.
You mean the ones used in for loops and stuff like that?
No, in general. Doing foo = "bar"
declares foo
as a global, accessible from the global scope (even if declared inside a scope, like functions, etc):
function blah()
{
foo = "bar"
}
blah()
println($"foo={foo}")
will print
foo=bar
even though foo
was declared inside blah
. This is true for any variable not declared with var
, and is quite a bit of a headache atm, as it introduces some pretty obvious issues.
Merely modifying the emitter to produce (GET|SET)LOCAL causes code to likewise behave strangely, but I haven't yet had time to properly investigate it.
Well yes, but as there are two completely different runtimes at the moment, patching up one doesn't affect the other one
I agree that a register-based VM would be better, but as you had already pointed out, it needs quite a major code overhaul. Hence why I'd put the idea in the backseat for now.
As I can't quite resolve the as-mentioned-before bug, there's currently no real room (or reason) to continue.
Maybe someone else will have a "eureka" moment about this.